MySQL Group Replication: about ack from majority
Mysql 15-May-2017

MySQL Group Replication: about ack from majority

The documentation states that “For a transaction to commit, the majority of the group have to agree on the order of a given transaction in the global sequence of transactions.

This means that as soon as the majority of nodes member of the group ack the writeset reception, certification can start. So, as a picture is worth a 1000 words, this is what it looks like if we take the illustrations from my previous post:

a group of 3 members

zoom in transaction deliver

the writer also acks

majority is reached, the system agreed on the order

ack of the remaining node will come too but the order has been already decided

 

certification can start

the process then continues as usual

So theoretically, having 2 nodes in one DC and 1 node in another DC shouldn’t be affected by the latency between both sites if writes are happening on the DC with the majority of nodes. But this is not the case.

As you can see on the video above, every 3rd write to the system is affected by the latency. That’s because the system has to wait for the noop (single skip message) from the “distant” node. As Alfranio explained it in his blog post about our homegrown paxos based consensus, XCom is a multi-leader or more precisely a multi-proposer solution. In this protocol, every member has an associated unique number and a reserved slot in the stream of totally ordered messages. There is no leader election and each member is a leader of its own slots in the stream of messages. Members can propose messages for their slots without having to wait for other members. But if they don’t have anything to say, they need to tell it too and this is where we are affected by the latency.

So in conclusion, having a distant node (or with higher latency) as member of a Group slows down the full workload. Not constantly but at least in correlation with its ratio on the cluster. 1/3 for a 3 nodes cluster for example.

At least for now, if you need to have a distant node and using it only for reads, I would advice to use asynchronous replication between your two sites or being prepare to pay the cost of the latency.