AW: [LinuxFailSafe] startup of resource groups with one node down

Kashif Shaikh kshaikh@consensys.com
Tue, 07 May 2002 17:24:28 +0000


Padmanabhan Sreenivasan wrote:

>Martin Bene wrote:
>
>>>From: Padmanabhan Sreenivasan [mailto:paddy@sgi.com], 7 May 2002 01:20
>>>
>>>The FailSafe notion of a tiebreaker is different. The tiebreaker node
>>>gets the first chance to reset the other node in a two-node cluster
>>>in case of a network partition.
>>>If only one node in the cluster is operational, start HA services only on that
>>>node. When the other node is available, you can start HA services on that node. It
>>>should rejoin the cluster.
>>>
>>Thanks for that hint. I didn't realize that starting HA services for the cluster and starting them for just one node give quite different results if only one node is available at startup:
>>
>>Start HA services for the cluster:
>>        * membership only comes up if tiebreaker node is available,
>>
>
>This is not true. Membership can form even if the tiebreaker node is not available. In
>a 2-node cluster with a network partition, if the tiebreaker node is not able to
>reset the non-tiebreaker node and the non-tiebreaker node can successfully reset
>the tiebreaker node, a membership of one node (the non-tiebreaker node)
>can be formed.
>
Paddy, in a previous email you stated that if a network partition were to
form, the tiebreaker node would get the first chance to reset the other
node (in a 2-node config). So if the tiebreaker node has failed (say,
during a network partition), will the other node "time out" waiting for a
reset signal from the tiebreaker node?
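
My reading of the two-node case, written out as a toy model. This is only
a sketch of what has been described in this thread, not FailSafe code: the
function name and the two boolean flags are made up for illustration, and
the timeout itself is not modeled (the flags simply stand in for whether
each side's reset attempt succeeds).

```python
def two_node_survivor(tiebreaker, other, tiebreaker_can_reset, other_can_reset):
    """Toy model of a two-node network partition (hypothetical helper).

    Returns the node that ends up as the one-node membership,
    or None if neither reset attempt succeeds.
    """
    # The tiebreaker node gets the first chance to reset the other node.
    if tiebreaker_can_reset:
        return tiebreaker
    # Only if that attempt fails does the non-tiebreaker node get its turn.
    if other_can_reset:
        return other
    # Neither side could reset the other; this sketch leaves that case open.
    return None


# Tiebreaker N1 is healthy: it resets N2 and stays up.
print(two_node_survivor("N1", "N2", True, True))
# Tiebreaker N1 cannot reset N2, but N2 can reset N1: N2 stays up.
print(two_node_survivor("N1", "N2", False, True))
```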

If this is true, what will happen in the case of 4 nodes, with a network
partition splitting the cluster into 2 groups of 2 nodes:


N1-LEFT(UP)      :     N3-RIGHT(UP)
                 :  
N2-LEFT(DN)(TIE) :     N4-RIGHT(UP)

Since tiebreaker N2 couldn't send a reset signal to N3 & N4, does this
mean that N3 & N4 will time out waiting for the reset signal and instead
reset nodes N1 and N2?

Now if this is true, all we have left are nodes N3 & N4 (part of the new
membership).  There is no tiebreaker node anymore.  Suppose we again
have a network partition, this time between N3 and N4.  N3 thinks N4 is
down and N4 thinks N3 is down.  Who resets whom in such a case?  I've read
about the idea of the "lowest node id" in the LFS admin guide, but that is
only used when a tie-breaker node is not set (see section 8.).

Thanks in advance for any info on this,

Kashif


>
>Paddy
>