[Linux-ha-dev] Re: "failover is too fast"

Alan Robertson alanr@suse.com
Tue, 14 Nov 2000 09:59:00 -0700


Dan Yocum wrote:
> 
> Alan,
> 
> It looks like a controlled failover (i.e., 'heartbeat stop') is mostly
> working in 0.4.8g - the second node doesn't start taking over the
> services until the last service is told to stop.  However, what I'm
> observing is that hb issues the '<service> stop' command, and
> immediately dies, itself, without waiting/verifying that the last
> service has actually died, or do I have this thing

Here's what I get with a "sleep" resource that sleeps 45 seconds before
finishing:

Nov 14 10:50:07 sgi1 heartbeat[460]: info: Heartbeat shutdown in progress.
Nov 14 10:50:07 sgi1 heartbeat[587]: info: Giving up all HA resources.
Nov 14 10:50:07 sgi1 heartbeat: info: Releasing resource group: sgi1 Sleep
Nov 14 10:50:07 sgi1 heartbeat: info: Running /etc/ha.d/resource.d/Sleep stop
Nov 14 10:50:07 sgi1 heartbeat: debug: Starting /etc/ha.d/resource.d/Sleep stop
Nov 14 10:50:07 sgi1 heartbeat: info: /etc/ha.d/resource.d/Sleep: Shutting down
            {and it sleeps for 45 seconds with no effects on sgi2}
Nov 14 10:50:52 sgi1 heartbeat: info: /etc/ha.d/resource.d/Sleep: Shutdown complete.
Nov 14 10:50:52 sgi1 heartbeat: debug: /etc/ha.d/resource.d/Sleep  stop done. RC=0
Nov 14 10:50:52 sgi1 heartbeat[587]: info: All HA resources relinquished.
Nov 14 10:50:52 sgi2 heartbeat: info: Running /etc/ha.d/rc.d/shutdone shutdone
            {this message mostly ignored by sgi2 - it marks sgi1 as in transition}
Nov 14 10:50:53 sgi1 heartbeat[460]: info: Heartbeat shutdown complete.

Nov 14 10:50:56 sgi2 heartbeat[813]: WARN: node sgi1: is dead 
          {only now does sgi2 notice anything about sgi1}

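So it looks like it works to me: heartbeat on sgi1 waits for the Sleep stop script to finish (RC=0) before reporting "All HA resources relinquished", and sgi2 doesn't declare sgi1 dead until after heartbeat on sgi1 has shut down.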
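The ordering in the log above can be modeled with a small Python sketch. This is not heartbeat's actual code; `release_resources` and the stand-in stop command are hypothetical. It only illustrates the blocking behaviour the log shows: each resource's stop command runs to completion and its return code is collected before anything is reported as relinquished.

```python
import subprocess
import sys
import time


def release_resources(stop_commands):
    """Run each resource's stop command in turn, blocking until it exits.

    stop_commands: list of (name, argv) pairs, e.g. ("Sleep", [path, "stop"]).
    Returns a list of (name, return_code) in the order the commands were run.
    Only after this returns would "All HA resources relinquished" be reported.
    """
    results = []
    for name, argv in stop_commands:
        # subprocess.run() does not return until the child process exits,
        # so a stop script that sleeps 45 s holds things up for 45 s.
        proc = subprocess.run(argv)
        results.append((name, proc.returncode))
    return results


if __name__ == "__main__":
    # Hypothetical stand-in for "/etc/ha.d/resource.d/Sleep stop": sleeps 1 s.
    fake_sleep_stop = [sys.executable, "-c", "import time; time.sleep(1)"]
    start = time.monotonic()
    rcs = release_resources([("Sleep", fake_sleep_stop)])
    elapsed = time.monotonic() - start
    print(f"{rcs} after {elapsed:.1f}s; all HA resources relinquished")
```

The key design point is that the stop command is run synchronously: nothing is announced to the other node until every stop script has exited, which matches the 45-second gap between "Shutting down" and "All HA resources relinquished" in the log.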

	-- Alan Robertson
	   alanr@suse.com