[Linux-HA] heartbeat/ipfail wrongly claims ping_group to be dead
horms at verge.net.au
Tue Jun 15 19:29:05 MDT 2004
On Tue, Jun 15, 2004 at 08:26:53AM -0600, Alan Robertson wrote:
> Alan Robertson wrote:
> >Mueller Armin ICM IT SAS IFR 2 wrote:
> >>Hi Alan,
> >>the problem I described a few days ago seems to be reproducible.
> >>ping_group is dead exactly 3 days, 19 h and 8 min after starting
> >>heartbeat. Do you have any ideas to track this down?
> >>Now I'm leaving out the deadping parameter. Perhaps this makes
> >>the trouble...
> >Probably not, I'm afraid :-(
> >If I did my calculation correctly, the number of ticks is:
> >if hz is 100 on your machine.
> >and the number of ping packets is:
> >Maybe this number of packets in ping_group is somehow limited to
> >65536... Hmmm...
> OK... The code doesn't allow for sequence number wraparound.
> The data about exactly how long it went was key to figuring this out.
> I don't even know why this code is here, since it isn't in the other ping
> code... I think horms was the author...
> Horms?? What's up here?
From memory, it uses the sequence numbers to keep track of which nodes
in the ping group have failed to respond. I will fix it up to allow for
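
For illustration only, here is a minimal sketch of the kind of wraparound
problem discussed above. It is not the heartbeat code. The 16-bit counter
width is an assumption taken from the 65536 guess earlier in the thread, and
the helper names (elapsed_seconds, next_seq, seq_newer) are hypothetical.
It converts the reported uptime to seconds and ticks (with hz = 100, as
assumed above), then compares a plain ">" test with a modular comparison in
the style of serial number arithmetic.

```python
SEQ_BITS = 16            # assumption: counter width, not confirmed in this thread
SEQ_MOD = 1 << SEQ_BITS  # 65536 distinct sequence numbers


def elapsed_seconds(days, hours, minutes):
    """Convert an uptime given as days/hours/minutes into seconds."""
    return days * 86400 + hours * 3600 + minutes * 60


def next_seq(seq):
    """Advance a sequence number, wrapping back to 0 after SEQ_MOD - 1."""
    return (seq + 1) % SEQ_MOD


def seq_newer(a, b):
    """Return True if a is newer than b, tolerating wraparound.

    The difference is taken modulo SEQ_MOD; a counts as newer when it is
    ahead of b by less than half of the sequence space.
    """
    diff = (a - b) % SEQ_MOD
    return 0 < diff < SEQ_MOD // 2


if __name__ == "__main__":
    uptime = elapsed_seconds(3, 19, 8)   # the reported 3 days, 19 h, 8 min
    print("seconds:", uptime)
    print("ticks at hz=100:", uptime * 100)

    last = SEQ_MOD - 1                   # last value before the counter wraps
    cur = next_seq(last)                 # wraps to 0
    print("naive cur > last:", cur > last)
    print("wraparound-aware:", seq_newer(cur, last))
```

With a plain ">" comparison the first packet after the wrap looks older than
the one before it, so a reply could be treated as missing. The modular
comparison still sees it as newer. Whether this matches the actual failure
in the ping_group code is not established here.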