Heartbeat failure
David Lang
david.lang@digitalinsight.com
Thu, 24 Oct 2002 10:58:59 -0700 (PDT)
Ok, after reading the rest of the thread it sounds like what happens is
1 operating normally
2 problem causes split-brain
3 split-brain detected
4 both sides tell the other that the split-brain happened
5 both sides restart
with step 4 being the new change??
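
A rough sketch of steps 3 to 5 in Python, to make the ordering concrete. This is an illustration only, not heartbeat's actual code: the Node class, its method names, and the use of a simple generation counter are all assumptions, and the "wait a while" delay is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster node; not heartbeat's real data structure."""
    name: str
    generation: int = 1           # bumped on every restart (assumed)
    saw_split_brain: bool = False
    log: list = field(default_factory=list)

    def detect_split_brain(self, peer):
        # Step 3: this node notices the split-brain on its own.
        self.saw_split_brain = True
        self.log.append("detected")
        # Step 4: tell the peer BEFORE restarting, so both sides know.
        peer.receive_notice(self)

    def receive_notice(self, sender):
        # The peer learns about the condition even if it never saw it itself.
        if not self.saw_split_brain:
            self.saw_split_brain = True
            self.log.append(f"notified by {sender.name}")

    def restart_if_needed(self):
        # Step 5: restart only if this node knows about the split-brain.
        if self.saw_split_brain:
            self.generation += 1
            self.saw_split_brain = False
            self.log.append("restarted")
            return True
        return False


def resolve(detector, peer):
    """Run steps 3-5 and report whether each side restarted."""
    detector.detect_split_brain(peer)
    return detector.restart_if_needed(), peer.restart_if_needed()
```

Without the notice in step 4, only the detecting node would restart in this sketch, and the peer would just see a changed generation number without knowing why.
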
David Lang
On Thu, 24 Oct 2002, David Lang wrote:
> Date: Thu, 24 Oct 2002 10:50:50 -0700 (PDT)
> From: David Lang <david.lang@digitalinsight.com>
> To: Alan Robertson <alanr@unix.sh>
> Cc: Alex Kramarov <alex@incredimail.com>, linux-ha@muc.de
> Subject: Re: Heartbeat failure
>
> so does this mean that if I restart heartbeat on the backup some time
> later heartbeat on the primary will restart as well?
>
> David Lang
>
> On Thu, 24 Oct 2002, Alan Robertson wrote:
>
> > Date: Thu, 24 Oct 2002 11:09:12 -0600
> > From: Alan Robertson <alanr@unix.sh>
> > To: David Lang <dlang@diginsite.com>
> > Cc: Alex Kramarov <alex@incredimail.com>, linux-ha@muc.de
> > Subject: Re: Heartbeat failure
> >
> > David Lang wrote:
> > > Alan,
> > > you mentioned when I outlined things a couple days ago that the active
> > > machine would be able to tell that heartbeat on the other machine has
> > > restarted due to the change in generation and sequence numbers.
> > >
> > > That may be true, but the active machine won't do anything if it detects
> > > that the backup has restarted in this manner because it can't tell if the
> > > backup restarted because it got into a split-brain condition or because
> > > the backup was rebooted (and if the backup was just rebooted the active
> > > box shouldn't do anything)
> >
> >
> > I put a fix in some time ago where it waits a while before restarting the
> > cluster when it finds this condition. This makes sure BOTH SIDES see the
> > condition BEFORE the node restarts. Then BOTH sides will restart and
> > everything is handled just like it should be.
> >
> > When this condition is discovered, the trick is to make the other side also
> > realize it before you shut down. Sending them a couple of packets does that
> > quite nicely.
> >
> > -- Alan Robertson
> > alanr@unix.sh
> >
> >
>