[LinuxFailSafe] FailSafe does not failover resource group

Kashif Shaikh kshaikh@consensys.com
25 Feb 2003 11:02:54 -0500


On Tue, 2003-02-25 at 02:29, Erehwin Ureta wrote:
> Need help.
> 
> I have managed to install FailSafe 1.0.4 on Red Hat 7.2.
> I configured two nodes with IP and Apache resources. They are part of one 
> resource group. I am testing the failover process by switching off the 
> machine that initially hosts the resource group but it does not seem to 
> failover to the other node. I can, however, do an "admin move" of the group. 
> "haStatus -a" reports that everything is working fine.

> 
> The only thing I could point to is that I am not using a system controller. 
> I don't think it is the cause, though. From what I understand (and from the 
> list archives), its use is to reset the other node to protect data in a 
> shared storage configuration. I don't use any shared storage so I do not 
> think I should use sysctl.

After you turn off the machine, FailSafe will have a tied
membership (i.e., exactly half of the nodes are up). If it can't resolve the
tie by resetting the other node, the surviving node will go into
a 'lonely' state, and any resource groups that were on the downed machine
will *not* fail over.
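To make the tie concrete, here is a rough sketch of the decision in Python. It is a simplified model, not FailSafe's actual membership code: the names `Membership`, `classify` and `can_take_over` are hypothetical, and it assumes the only way to break a tie is a reset that the surviving node believes succeeded.

```python
from enum import Enum


class Membership(Enum):
    QUORUM = "quorum"        # strictly more than half of the nodes are up
    TIED = "tied"            # exactly half of the nodes are up
    NO_QUORUM = "no quorum"  # fewer than half of the nodes are up


def classify(total_nodes: int, up_nodes: int) -> Membership:
    """Classify cluster membership from node counts (hypothetical model)."""
    if 2 * up_nodes > total_nodes:
        return Membership.QUORUM
    if 2 * up_nodes == total_nodes:
        return Membership.TIED
    return Membership.NO_QUORUM


def can_take_over(total_nodes: int, up_nodes: int, reset_succeeded: bool) -> bool:
    """Return True if the surviving side may take over the downed node's groups."""
    state = classify(total_nodes, up_nodes)
    if state is Membership.QUORUM:
        return True
    if state is Membership.TIED:
        # Assumption: a tie is only broken once the other node is known to be reset.
        return reset_succeeded
    return False


if __name__ == "__main__":
    # Two-node cluster, one node powered off: the survivor sees a tie.
    print(classify(2, 1).value)
    # No reset device: the tie is never resolved, so no failover.
    print(can_take_over(2, 1, reset_succeeded=False))
    # A reset that "succeeds" lets the survivor take the resource group.
    print(can_take_over(2, 1, reset_succeeded=True))
```

In a two-node cluster, powering off one node always yields a tie, so in this model failover hinges entirely on the reset. A device that reports success without actually resetting anything would satisfy `reset_succeeded` even during a partition, which is why both sides could end up believing they own the resources.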

Regardless of whether you have shared storage, the reset functionality
ensures that the downed node has really 'released' all of its resources. If
your cluster partitioned and you weren't using reset, both nodes could
end up holding the same IP address.

But if you must, you can fake reset functionality by using the NULL
stonith device -- just be wary of the effects of a cluster partition.

Kashif