[Linux-ha-dev] Ask for suggestion of quorum server
zhenhltc at cn.ibm.com
Fri Sep 29 07:06:26 MDT 2006
Here are some answers, based on my understanding.
First of all, there is only one leader in each sub-cluster (or partition).
Only the leader can connect to the quorum server.
The leader broadcasts the quorum status to all other nodes in its partition.
Lars Marowsky-Bree wrote:
> On reading the document:
> - I don't understand what "takeover" or "giveup" mean.
Let's imagine that we have a cluster which has split into two partitions, A and B,
and let's say that node a is the leader of A and node b is the leader of B.
Both a and b connect to the quorum server.
The quorum server tells node a that A has quorum and tells node b that B does not.
Now something happens in partition A, and the leader changes from node a to node a'.
Node a has to disconnect from the quorum server, but we know node a' will connect to the
quorum server soon. "takeover" is the time allowed for leadership to pass from a to a'.
If a new node joins B so that the weight of B becomes larger than the weight of A,
the quorum will transfer to B. However, after we tell A that it no longer has quorum,
we need to wait some time to let A give up all the resources it is holding.
"giveup" is the time allowed for a partition to give up all its resources after it loses quorum.
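
To make the two timeouts concrete, here is a minimal sketch of how a quorum
server might apply them. It is only an illustration of the description above,
not the actual quorumd code: the class name, its fields and its methods are all
hypothetical, and times are plain numbers in seconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuorumLease:
    """Toy model of the two timeouts; every name here is hypothetical."""
    takeover: float                          # seconds the holder may be without a leader
    giveup: float                            # seconds the old holder gets to release resources
    holder: Optional[str] = None             # partition that currently has quorum
    leaderless_since: Optional[float] = None
    pending: Optional[str] = None            # partition waiting to receive quorum
    pending_since: Optional[float] = None

    def leader_disconnected(self, partition: str, now: float) -> None:
        if partition == self.holder:
            self.leaderless_since = now      # start the takeover window

    def leader_connected(self, partition: str, now: float) -> None:
        if partition == self.holder and self.leaderless_since is not None:
            if now - self.leaderless_since <= self.takeover:
                self.leaderless_since = None  # the new leader arrived in time

    def transfer(self, new_holder: str, now: float) -> None:
        # The old holder is told at once that it lost quorum; the new
        # holder is only granted quorum after the giveup period.
        self.holder = None
        self.leaderless_since = None
        self.pending, self.pending_since = new_holder, now

    def quorum_holder(self, now: float) -> Optional[str]:
        if self.pending is not None and now - self.pending_since >= self.giveup:
            self.holder, self.pending, self.pending_since = self.pending, None, None
        if self.leaderless_since is not None and now - self.leaderless_since > self.takeover:
            self.holder, self.leaderless_since = None, None  # takeover window expired
        return self.holder


lease = QuorumLease(takeover=5, giveup=10, holder="A")
lease.leader_disconnected("A", now=0)   # node a leaves
lease.leader_connected("A", now=4)      # node a' connects within the takeover window
lease.transfer("B", now=100)            # B became heavier than A
```

In this sketch nobody holds quorum between time 100 and time 110, which is the
window A is given to release its resources.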
> - "nodenum" seems to be a static number; how is this supposed to work
> with autojoin?
The current code calculates quorum by comparing the weights of
the partitions, so we don't need the cluster-wide "nodenum" and "weight".
They are there so that we can implement other algorithms later.
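
As a rough sketch of the weight comparison, assuming a partition's weight is
the sum of the weights of its member nodes: the function names and the example
node names and weights below are made up for illustration and are not taken
from the quorumd source. Ties are deliberately left undecided here.

```python
from typing import Dict, List, Optional


def partition_weight(members: List[str], node_weight: Dict[str, int]) -> int:
    """Sum the per-node weights of the nodes in one partition."""
    return sum(node_weight[n] for n in members)


def heavier_partition(partitions: Dict[str, List[str]],
                      node_weight: Dict[str, int]) -> Optional[str]:
    """Return the partition with the strictly largest weight, or None on a tie."""
    totals = {name: partition_weight(members, node_weight)
              for name, members in partitions.items()}
    best = max(totals.values())
    winners = [name for name, weight in totals.items() if weight == best]
    return winners[0] if len(winners) == 1 else None


# Hypothetical per-node weights.
node_weight = {"n1": 1, "n2": 1, "n3": 3}
winner = heavier_partition({"A": ["n1", "n2"], "B": ["n3"]}, node_weight)
```

Note that no cluster-wide node count or total weight appears anywhere in the
comparison; only the partitions that are actually connected are compared.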
> - What does "weight" mean? And how does this weight relate to the
> per-node weight configured further below?
The sum of the weights of all nodes in a cluster/partition is the weight of that cluster/partition.
> - Why do we have to set an environment variable to activate a hb
It's a little strange to me too.
The original implementation of quorum plugin selection works like this.
We can move it to ha.cf or somewhere else.
> - What happens when a site disconnects?
The disconnected partition will lose quorum.
The remaining partition will get quorum.
> - What happens in case of a N:N split, but both sides still connected?
Only one of them (the earlier one, in fact) will get the quorum.
> - How does recovery occur when a partition which was lost reconnects, but
> doesn't gain connectivity to the other side?
If the returning partition has a larger weight than the remaining partition, the quorum will be
transferred. Otherwise, the returning partition won't get the quorum from the quorum server.
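
One way to read this answer together with the N:N case is as a single rule:
the current holder keeps quorum unless a strictly heavier partition connects.
The sketch below replays a sequence of connections under that assumption. The
function name and the (name, weight) tuples are hypothetical, and the giveup
delay is ignored to keep it short.

```python
from typing import List, Optional, Tuple


def replay(connections: List[Tuple[str, int]]) -> Optional[str]:
    """Return which partition holds quorum after the given connections.

    Each entry is (partition name, partition weight), in the order the
    partition leaders connected to the quorum server.
    """
    holder: Optional[str] = None
    holder_weight = 0
    for name, weight in connections:
        # Transfer only when the newcomer is strictly heavier, so an
        # equal-weight partition that connects later never wins.
        if holder is None or weight > holder_weight:
            holder, holder_weight = name, weight
    return holder


equal_split = replay([("A", 2), ("B", 2)])    # the earlier partition keeps quorum
heavier_late = replay([("A", 2), ("B", 3)])   # the heavier latecomer takes over
```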
> - What if both sides lose connectivity to the quorum server?
If quorumd is the only way to determine quorum, then neither side will get quorum.
However, we can combine other quorum plugin(s) with quorumd.
Then quorumd would be the last tie-breaker.
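
A small sketch of what "last tie-breaker" could mean, assuming plugins are
consulted in order and each one may answer yes, no, or "cannot decide". The
Verdict type, the majority rule and the quorumd stand-in are all assumptions
made for this example; they are not the real plugin interface.

```python
from enum import Enum
from typing import Callable, Iterable


class Verdict(Enum):
    YES = "yes"
    NO = "no"
    TIE = "tie"   # this plugin cannot decide on its own


def majority(seen: int, total: int) -> Verdict:
    """Hypothetical first plugin: strict majority of the configured nodes."""
    if 2 * seen > total:
        return Verdict.YES
    if 2 * seen < total:
        return Verdict.NO
    return Verdict.TIE


def quorumd(reachable: bool, granted: bool) -> Verdict:
    """Hypothetical stand-in for asking the quorum server."""
    return Verdict.YES if reachable and granted else Verdict.NO


def decide(plugins: Iterable[Callable[[], Verdict]]) -> Verdict:
    """Ask each plugin in order; the first decisive answer wins."""
    for plugin in plugins:
        verdict = plugin()
        if verdict is not Verdict.TIE:
            return verdict
    return Verdict.NO   # nobody could decide: no quorum


# 3 of 5 nodes visible: decided locally, quorumd is never asked.
local = decide([lambda: majority(3, 5), lambda: quorumd(False, False)])
# 2 of 4 nodes visible: a tie, so quorumd breaks it.
broken_tie = decide([lambda: majority(2, 4), lambda: quorumd(True, True)])
```

With this ordering, losing the connection to the quorum server only matters
when the earlier plugins cannot decide.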
> Besides, in general I'm thinking that doing the external quorum server
> is a misjudgement of priorities, but, as it wasn't my call nor yours, I
> won't elaborate on that ;-) (And, with the features in the PE, I've my
> own featuritis to answer for, so I'm afraid I can't throw the first
> stone ... but maybe the second one! ;-)
Linux-HA, Linux Technology Center, China Systems & Technology Lab
China Development Labs, Beijing Tel: 86-10-82782244 Ext. 2845
Email: zhenhltc at cn.ibm.com