no local heartbeat?..
alanr at unix.sh
Mon Feb 12 20:53:59 MST 2001
Juri Haberland wrote:
> Horvath Andras wrote:
> > Hi,
> > I'm new to this list so hello to you all. :-)
> > I use heartbeat for a 2-node cluster, connected via 2 serial and 1
> > dedicated ethernet links. All seems to work well, except that I
> > recently got this into the logfile (and heartbeat died, the other node
> > took the services):
> > Feb 2 16:45:52 bingyom heartbeat: WARN: node bingyom: is dead
> > Feb 2 16:45:54 bingyom heartbeat: ERROR: No local heartbeat. Forcing shutdown.
> > Feb 2 16:45:55 bingyom heartbeat: info: Heartbeat shutdown in progress.
> > When does a node pronounce _itself_ dead?
> > The machine works well, running for weeks now, w/ kernel 2.2.18pre23,
> > heartbeat 0.4.8-1.
> > When I restart heartbeat, it takes over its own resources and all is
> > back to normal..
> > I suppose it must be me :), but how?
> Not necessarily ;-)
> If you have a very short deadtime of, e.g., 5 seconds and you load a
> SCSI or FC driver module which starts initializing the SCSI/FC
> subsystem, the whole machine can be blocked for more than 5 seconds.
> In this situation it is likely that your heartbeat cannot "hear"
> itself.
> This can also happen under very high load.
> Try to increase the deadtime and decrease the keepalive time.
> A deadtime of 10 seconds with a keepalive of 1 second is a good start.
What he says is exactly right. If you're running a newer version of the
code, you can also set deadtime high and set warntime to whatever you
think you want your heartbeat failover time to be. The late-heartbeat
warnings will then show you how late your heartbeats actually come in
(due to system load, etc.).
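
As a rough sketch of the settings discussed above, the relevant lines in
ha.cf might look like the following. The numbers are illustrative
assumptions, not values from a tested cluster, and warntime is only
available in versions newer than the 0.4.8 mentioned in the question.

```
# ha.cf excerpt: illustrative values only, adjust for your own cluster

# Send a heartbeat every second.
keepalive 1

# Assumed "high" value: declare a node dead only after 30 seconds of silence.
deadtime 30

# Assumed target failover time: warn when a heartbeat is more than
# 10 seconds late.
warntime 10
```

If the logged warnings show that heartbeats never arrive anywhere near as
late as warntime, deadtime can probably be lowered toward that value.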
-- Alan Robertson
alanr at suse.com