[Linux-ha-dev] heartbeat-1.2.2 failure to stop
tis at foobar.fi
Mon Jun 7 03:30:59 MDT 2004
I have had some problems with my HA firewall clusters because
heartbeat refuses to stop. Here is what I have found out so far.
After heartbeat has been running for some days (some hours is not
enough), /etc/init.d/heartbeat stop no longer stops heartbeat.
The heartbeat -k command run by the initscript keeps running forever,
unable to stop heartbeat.
The following can be found in ha.log:
Jun 1 13:32:26 sgw1 heartbeat: info: Heartbeat shutdown in
Jun 1 13:32:26 sgw1 heartbeat: info: Giving up all HA resources.
Jun 1 13:32:43 sgw1 heartbeat: info: All HA resources relinquished.
Jun 1 13:32:43 sgw1 heartbeat: info: MSG: Dumping message with 2
Jun 1 13:32:43 sgw1 heartbeat: info: MSG : [t=shutdone]
Jun 1 13:32:43 sgw1 heartbeat: info: MSG : [st=dead]
From ha_debug.log I could see that all resources went down
successfully. After the last resource comes the following line:
Jun 1 13:32:43 sgw1 heartbeat: ERROR: Cannot write message to
/var/lib/heartbeat/fifo [16174 vs 15698]: No such device or address
And that's it. The heartbeat master process will sit there until I
kill it.
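For reference, "No such device or address" is the message for errno
ENXIO. Although the log line says "write", ENXIO is what open() returns
for a FIFO opened write-only and non-blocking while no process has it
open for reading. Whether heartbeat opens /var/lib/heartbeat/fifo this
way is an assumption on my part, not something confirmed from the
source. A minimal Python sketch that reproduces the errno on a
temporary FIFO (the helper names are made up for illustration):

```python
import errno
import os
import tempfile


def open_fifo_for_write_nonblocking(path):
    """Try a non-blocking write-only open of a FIFO.

    Returns None if the open succeeds (the fd is closed again),
    otherwise the symbolic errno name, e.g. "ENXIO".
    """
    try:
        fd = os.open(path, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    return None


def demo():
    """Open a scratch FIFO for writing, first without and then with a reader."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "fifo")  # scratch FIFO, not heartbeat's
        os.mkfifo(path)

        # No process has the FIFO open for reading yet.
        no_reader = open_fifo_for_write_nonblocking(path)

        # Attach a reader; a non-blocking read-only open returns at once.
        reader_fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
        try:
            with_reader = open_fifo_for_write_nonblocking(path)
        finally:
            os.close(reader_fd)

        return no_reader, with_reader
```

On Linux, demo() should return ("ENXIO", None): the open fails only
while nothing is reading from the FIFO. If the same thing happens here,
it would mean that the process reading the fifo was already gone when
the final message was sent, but that is only a guess.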
If the stopped node is the active node, it is even worse: when I start
to take it down, it releases all resources but it does _not_ inform the
passive node and it does _not_ stop answering heartbeats.
So all resources go down and nobody takes them over.
I attached an strace of the failed heartbeat master process.
I ran BasicSanityCheck and it does not find any failures on my
system. All tests pass.
Tuomo Soini <tis at foobar.fi>
Linux and network services
+358 40 5240030
Foobar Oy <http://foobar.fi/>
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 4080 bytes
Desc: not available