[Linux-HA] uname -n configuration requirement

Guochun Shi gshi at ncsa.uiuc.edu
Tue Mar 1 10:28:18 MST 2005


At 02:30 PM 3/1/2005 +0100, you wrote:
>On 2005-02-28T11:53:13, Guochun Shi <gshi at ncsa.uiuc.edu> wrote:
>
>> >Well, we can of course always just initialize it to 32, 64 or
>> >whatever, or does it cost more than just a couple of bytes? It
>> >might also be possible to increase it at run time.
>> Yes, I believe we can make this part work easily. The bigger
>> challenge is how to compute quorum.
>
>CCM needs to exchange information about nodes which have been added or
>removed as part of the consensus computation, too. Only if all nodes
>agree on the list of candidates can they proceed.
>
>So, they need to exchange a checksum of said list first (because in 99%
>of all cases, the lists will have been exchanged already and be synced).
>
>If this checksum does NOT match, each node must enumerate the nodes it
>knows about to the transition leader. Alas, all of them, including the
>removed ones. The transition leader then needs to merge them all and
>rebroadcast the result to all cluster members.
>
>(This can probably be optimized by having the transition leader, or
>whatever the CCM calls it again, request the list for each distinct
>checksum only once.)

Sounds good to me.
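A minimal Python sketch of the exchange described above, for illustration only. It assumes each node keeps its configured node list as a map from node name to a "removed" flag, so removed nodes survive as tombstones. The names NodeList, lists_in_sync and leader_merge, the SHA-1 checksum, and the rule that a removal wins over an active entry are all assumptions, not the CCM's actual code or protocol.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class NodeList:
    """Hypothetical per-node view of the configured cluster nodes.

    Maps a node name (as reported by `uname -n`) to True if that node has
    been removed from the configuration (kept as a tombstone), else False.
    """
    entries: Dict[str, bool] = field(default_factory=dict)

    def put(self, name: str, removed: bool = False) -> None:
        # Assumed policy: once any node reports a removal, it wins.
        self.entries[name] = self.entries.get(name, False) or removed

    def checksum(self) -> str:
        # Hash the sorted entries so equal sets give equal checksums,
        # regardless of the order in which they were learned.
        h = hashlib.sha1()
        for name in sorted(self.entries):
            flag = b"1" if self.entries[name] else b"0"
            h.update(name.encode() + b"\0" + flag + b"\n")
        return h.hexdigest()


def lists_in_sync(checksums: Dict[str, str]) -> bool:
    """Common case: every member announced the same checksum."""
    return len(set(checksums.values())) <= 1


def leader_merge(checksums: Dict[str, str],
                 fetch_list: Callable[[str], NodeList]) -> Tuple[NodeList, int]:
    """Transition leader: fetch one full list per distinct checksum and
    merge them; fetch_list(member) stands in for the network request.
    Returns the merged list and the number of lists actually fetched."""
    merged = NodeList()
    fetched = set()
    for member, csum in sorted(checksums.items()):
        if csum in fetched:
            continue  # an identical list has already been merged
        fetched.add(csum)
        for name, removed in fetch_list(member).entries.items():
            merged.put(name, removed)
    return merged, len(fetched)


# Example: node3 has seen node4 removed and node5 added.
views = {
    "node1": NodeList({"node1": False, "node2": False, "node3": False, "node4": False}),
    "node2": NodeList({"node4": False, "node3": False, "node2": False, "node1": False}),
    "node3": NodeList({"node1": False, "node2": False, "node3": False,
                       "node4": True, "node5": False}),
}
sums = {member: view.checksum() for member, view in views.items()}
print(lists_in_sync(sums))  # False
merged, fetched = leader_merge(sums, views.__getitem__)
print(fetched)  # 2
print(sorted(merged.entries.items()))
# [('node1', False), ('node2', False), ('node3', False), ('node4', True), ('node5', False)]
```

Sorting before hashing keeps the in-sync case down to a single checksum comparison. The leader then only fetches the two distinct lists, not all three, before it would rebroadcast the merged result.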

>Of course, this algorithm is not perfect; if the admin removes or adds
>nodes during a cluster split, the quorum computation can be wrong, because
>the other partitions can't know. This, however, is an unsolvable problem
>and the admin ought to be whacked for it.
>
>We _could_ probably add some safeguards against this in the management
>tool, by suggesting the node list shouldn't be changed if not all nodes
>are up and joined.

I think requiring more than half of all nodes should be enough. We will not use the smaller,
non-quorum partitions anyway. Requiring all nodes would prevent a cluster from functioning if even one node goes down.
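The majority rule proposed here fits in a couple of lines. The following is a hedged sketch: has_quorum and may_change_node_list are hypothetical helpers, and the guard is only an assumption about how a management tool might apply the rule.

```python
def has_quorum(members_up: int, configured: int) -> bool:
    """Strict majority: more than half of all configured nodes are up."""
    return 2 * members_up > configured


def may_change_node_list(members_up: int, configured: int) -> bool:
    """Hypothetical management-tool guard: allow node-list edits only from
    the quorum partition instead of requiring every node to be up."""
    return has_quorum(members_up, configured)


# A 5-node cluster with one node down can still change its node list;
# a 2/2 split of a 4-node cluster cannot change it from either side.
print(may_change_node_list(4, 5))  # True
print(may_change_node_list(2, 4))  # False
```

Two disjoint partitions cannot both contain more than half of the configured nodes, so at most one partition can pass this guard at a time. Requiring every node (members_up == configured) would already block changes in a 5-node cluster with one node down.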

-Guochun



