[Linux-HA] Crazy because of SplitBrain!
Dominik Klein
dk at in-telegence.net
Wed Jul 30 05:33:49 MDT 2008
ZiLioN ZilLioN wrote:
>
>
>> Date: Wed, 30 Jul 2008 08:15:04 +0200
>> From: dk at in-telegence.net
>> To: linux-ha at lists.linux-ha.org
>> Subject: Re: [Linux-HA] Crazy because of SplitBrain!
>>
>>> When node A STONITHs node B, node B is rebooted. Once node B has rebooted, will it not start the resources again?
>> Don't start heartbeat at boot time. I don't know if that's the suggested
>> method, but that's the way I do it and that certainly works.
>
> OK, this method will certainly work.
>
>>> Can I specify with "constraints" that the node with the lower score is the node to kill, so that this node will die?
>> When the nodes don't see each other, they each only compute scores for
>> themselves.
>
> To end the topic:
>
> Suppose a scenario (in the same network) where node A has connectivity to the Internet and node B does not. Then only node A can offer the service.
>
> If node A can STONITH node B and node B can STONITH node A...
>
> If at this moment the nodes lose communication with each other and node B successfully STONITHs node A, then node A is dead and the service can no longer be offered to the Internet.
>
> Disaster!
Well how likely is that?
> This is the problem if you can't decide which node should die :(
> Isn't it possible that node A has the STONITH resource started (goal: shoot node B) because it has connectivity to the Internet, while node B has the STONITH resource stopped because it has no connectivity to the Internet? Then, when the nodes lose communication with each other, only node A will shoot node B.
That _is_ possible. Look into resource location constraints and pingd.
Prevent the stonith resource from running on a node with no gateway ping
connectivity.
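To illustrate the idea, here is a minimal Python sketch of the decision
such a location constraint would express. It is only a toy model, not
Heartbeat code: the function names and the dict of node attributes are
made up for illustration, and it assumes the "pingd" attribute holds a
number greater than zero while the gateway answers pings and is zero or
missing otherwise.

```python
# Toy model of a pingd-based location rule; NOT Heartbeat code.
# All names here are hypothetical and exist only for illustration.
MINUS_INFINITY = float("-inf")


def stonith_location_score(node_attrs):
    """Return the score for placing the stonith resource on a node.

    node_attrs is a hypothetical dict of node attributes. "pingd" is
    assumed to be > 0 while the gateway is reachable, and 0 or missing
    when it is not (or when pingd has not reported yet).
    """
    pingd = node_attrs.get("pingd")
    if pingd is None or int(pingd) <= 0:
        return MINUS_INFINITY  # no gateway ping: never run stonith here
    return 0  # gateway reachable: the rule has no objection


def may_run_stonith(node_attrs):
    """True if the rule allows the stonith resource on this node."""
    return stonith_location_score(node_attrs) != MINUS_INFINITY


node_a = {"pingd": 1}  # node A reaches the gateway
node_b = {"pingd": 0}  # node B does not

print(may_run_stonith(node_a))  # True
print(may_run_stonith(node_b))  # False
```

With such a rule, only the node that still has gateway connectivity keeps
a stonith resource it could use to shoot its peer.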
> Important question:
>
> Can the STONITH fencing method be used in a geographically distributed scenario? If both nodes lose communication, how do they use STONITH?
Assume the nodes are geographically separated and communicate over the
internet (which isn't exactly a great idea). This implies that the stonith
devices are used over the internet, too. If this communication breaks,
nodeA cannot execute STONITH for nodeB and vice versa. Since nodes
cannot shoot themselves, the STONITH commands would be queued until they
can be executed. So I guess when communication comes back, the first
node to shoot the other one wins.
Please correct me if I'm wrong here.
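A minimal Python sketch of the race described above, under the same
guess (requests are queued while the link is down and fire when it
returns). This is a toy model and not Heartbeat code; the Node class
and link_restored() are hypothetical names used only for illustration.

```python
# Toy model of queued STONITH requests; NOT Heartbeat code.
# Assumption: a request that cannot be executed is queued and is
# retried once the link to the stonith device works again.
class Node:
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.queued_targets = []  # requests waiting for a working link

    def request_stonith(self, target, link_up):
        """Shoot the target now, or queue the request if the link is down."""
        if link_up:
            self._shoot(target)
        else:
            self.queued_targets.append(target)

    def flush_queue(self):
        """Execute queued requests; a node that was shot cannot shoot."""
        while self.alive and self.queued_targets:
            self._shoot(self.queued_targets.pop(0))

    def _shoot(self, target):
        if self.alive and target.alive:
            target.alive = False


def link_restored(nodes_in_reaction_order):
    """Let each node flush its queue in the order it notices the link."""
    for node in nodes_in_reaction_order:
        node.flush_queue()
    return [n.name for n in nodes_in_reaction_order if n.alive]


node_a, node_b = Node("nodeA"), Node("nodeB")
node_a.request_stonith(node_b, link_up=False)  # queued, link is down
node_b.request_stonith(node_a, link_up=False)  # queued, link is down

# nodeB happens to react first when the link comes back.
print(link_restored([node_b, node_a]))  # ['nodeB']
```

Which node survives depends only on who reacts first, not on which node
is actually able to offer the service.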
Regards
Dominik