AW: [LinuxFailSafe] startup of resource groups with one node down
Kashif Shaikh
kshaikh@consensys.com
Tue, 07 May 2002 17:24:28 +0000
Padmanabhan Sreenivasan wrote:
>Martin Bene wrote:
>
>>>From: Padmanabhan Sreenivasan [mailto:paddy@sgi.com], 7 May 2002 01:20
>>>
>>>The FailSafe notion of a tie-breaker is different. The tiebreaker node
>>>gets the first chance to reset the other node in a two-node cluster
>>>in case of a network partition.
>>>If only one node in the cluster is operational, start HA services only
>>>on that node. When the other node becomes available, you can start HA
>>>services on that node. It should rejoin the cluster.
>>>
>>Thanks for that hint. I didn't realize that the results of starting HA
>>services for the cluster and starting them for just one node are quite
>>different if only one node is available at startup:
>>
>>start ha services for cluster:
>> * membership only comes up if tiebreaker node is available,
>>
>
>This is not true. Membership can form even if the tiebreaker node is not
>available. In a 2-node cluster with a network partition, if the
>tiebreaker node is not able to reset the non-tiebreaker node and the
>non-tiebreaker node can successfully reset the tiebreaker node, a
>membership of one node (the non-tiebreaker node) can be formed.
>
Paddy, in a previous email you stated that if a network partition forms,
the tiebreaker node gets the first chance to reset the other node (in a
2-node config). So if the tiebreaker node has failed (say, during a
network partition), will the other node time out waiting for a reset
signal from the tiebreaker node?
If this is true, what happens in the case of 4 nodes, with a network
partition splitting the cluster into 2 groups of 2 nodes:
N1-LEFT(UP)      : N3-RIGHT(UP)
                 :
N2-LEFT(DN)(TIE) : N4-RIGHT(UP)
Since tiebreaker N2 couldn't send a reset signal to N3 and N4, does this
mean that N3 and N4 will time out waiting for a reset signal and instead
reset nodes N1 and N2?
Now if this is true, all we have left are nodes N3 and N4 (the new
membership). There is no tiebreaker node anymore. Suppose we again have
a network partition, this time between N3 and N4. N3 thinks N4 is down
and N4 thinks N3 is down. Who resets whom in such a case? I've read about
the "lowest node id" idea in the LFS admin guide, but that is only used
when a tie-breaker node is not set (see section 8.).
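For reference, the two-node case as I understand it from Paddy's
description can be sketched as below. This is only an illustration of
the reset order, not FailSafe code: the function name, its parameters
and the try_reset callback are all hypothetical, and whether the second
node waits for a timeout before making its attempt is exactly what I am
asking about.

```python
from typing import Callable, Optional


def surviving_node(tiebreaker: str, other: str,
                   try_reset: Callable[[str, str], bool]) -> Optional[str]:
    """Return the node that forms the one-node membership, or None.

    try_reset(src, dst) is a hypothetical callback that returns True
    when src manages to reset dst.
    """
    # The tiebreaker gets the first chance to reset the other node.
    if try_reset(tiebreaker, other):
        return tiebreaker
    # Otherwise the non-tiebreaker node may reset the tiebreaker.
    if try_reset(other, tiebreaker):
        return other
    # Neither reset succeeded; this sketch leaves that case undefined.
    return None


# Example: the tiebreaker N2 is down, so it cannot reset anyone.
down = {"N2"}
print(surviving_node("N2", "N1", lambda src, dst: src not in down))  # N1
```

The 4-node case and the case where the configured tiebreaker is no
longer part of the membership are left out on purpose, since I don't
know what the rule is there.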
Thanks in advance for any info on this,
Kashif
>
>Paddy
>