[LinuxFailSafe] Linux Failsafe Problem on Poweroff

Kashif Shaikh kshaikh@consensys.com
31 Dec 2002 10:22:23 -0500


On Thu, 2002-12-19 at 07:51, Bráulio Gergull wrote:
> The problem happens when I simulate a poweroff condition. It seems that the 
> surviving node detects that the other node went down, however the resource 
> group is not transferred. According to the logs it seems that the database gets 
> locked and does not accept administrative commands and the broken node still 
> appears with state "UP" in the GUI.
How did you simulate a poweroff condition?  Did you just yank out the
network cables?  Also, what kind of reset network are you using? You're
not using ssh, are you (you shouldn't be)? Your machine configuration
would be helpful.

> Attached you'll find an excerpt of the log files.

From your log files, the tie breaker (db1) failed to reset the db2 node
and could not resolve the membership tie.  In other words, FailSafe doesn't
know whether db2 is really down or whether this is a cluster partition.
If the reset had succeeded, the RGs on db2 would have moved to db1.

In the interest of avoiding data corruption, RGs on db2 (if any) are NOT
failed over. You should read the FailSafe sysadmin manual and get your
reset network properly configured and operational.  Look at the source
for cmsd in FailSafe for more info.
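
To make the reasoning concrete, here is a minimal sketch of the decision
described above. It is NOT taken from the cmsd source; the names
(ResetResult, decide_failover) and the returned strings are hypothetical
and only illustrate why a failed reset blocks the failover.

```python
from enum import Enum


class ResetResult(Enum):
    """Outcome of the tie breaker's attempt to reset the peer (hypothetical)."""
    SUCCESS = "success"
    FAILED = "failed"


def decide_failover(peer_heartbeat_lost: bool, reset_result: ResetResult) -> str:
    """Illustrative decision only; not FailSafe's actual implementation.

    Returns one of:
      "no_action" - the peer is still answering, nothing to do
      "failover"  - the peer was reset, so it is safe to move its RGs
      "hold"      - the reset failed, so the peer may still be running
                    (possible cluster partition); RGs stay where they are
    """
    if not peer_heartbeat_lost:
        return "no_action"
    if reset_result is ResetResult.SUCCESS:
        # A confirmed reset guarantees the peer no longer touches shared data.
        return "failover"
    # Without a confirmed reset, moving RGs risks two nodes writing at once.
    return "hold"


if __name__ == "__main__":
    # The situation in your logs: db2 stopped responding and the reset failed.
    print(decide_failover(True, ResetResult.FAILED))
```

The "hold" branch corresponds to what you are seeing: db1 noticed that db2
went away, but because the reset did not succeed it cannot rule out a
partition, so it leaves the RGs alone.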

Kashif