[Linux-ha-dev] tracking resource groups in heartbeat

Horms horms@vergenet.net
Wed, 29 Mar 2000 11:08:38 -0800


On Wed, Mar 29, 2000 at 06:46:24AM -0700, Alan Robertson wrote:
> horms wrote:
> > 
> > On Tue, Mar 28, 2000 at 09:35:08PM -0700, Alan Robertson wrote:
> > > The conversation below is taken from email off the list.  It seemed
> > > generally interesting though...
> > >
> > > Horms wrote:
> > > > ... I have however noticed that as heartbeat keeps state of nodes, and
> > > > not resource allocations it is possible to get into a state where no
> > > > nodes/more than one node have a resource. In particular if there is a
> > > > communication medium failure, or if heartbeat is started up on more
> > > > than one node simultaneously. I have been thinking of some fairly
> > > > simple mechanisms to resolve this, vis a vis nodes requesting ownership
> > > > of a resource. I am wondering what your thoughts are. I am most
> > > > concerned about the (simple) two-node case, though something that
> > > > extends beyond that would be nice.
> > >
> > > The folks from Conectiva are doing something in a related area. In the
> > > current code, the assumption is that if the master for a resource is up,
> > > it has control of the resources it is listed as master for.  They break
> > > that assumption with a new feature (nice_failover?).  It would be good to
> > > add your thoughts and observations to that, and think about the right way
> > > of thinking about this stuff.  Once one has the right mental model, the
> > > code is easy :-)
> > 
> > It seems to me that the existing code will take control of a resource if
> > the master specified in haresources fails, but not necessarily give it up
> > when the master comes back up again.
> 
> Without nice_failback (which isn't in the current code), this should not
> happen.  When the master comes back up, it asks for the other node to
> give up its resources, and in the case of no response takes them anyway.

> > Again, in the case of a media failure,
> > or nodes coming up at the same time both nodes may take ownership of a
> > resource and neither will give it up.
> 
> I agree in the case of media failure.  Have you observed it in the case
> of both nodes coming up at the same time?

I thought I had, but I can't reproduce it using the latest code.

> The current bringup sequence is:  Start your own heartbeat.  Wait until
> you've heard someone else's heartbeat or about 10 seconds.  Begin the
> resource takeover sequence for those resources you master.  If you've
> heard someone else's heartbeat, then communications with the other end
> are working.  I think the problem right now is that there is no database
> indicating the state of either resources or nodes.  Everything depends
> on the resource scripts to indicate resource status.

Agreed.
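
To make sure we are reading the bringup sequence the same way, here is a
rough Python sketch of it. It is not the heartbeat code: the names
bringup, heard_peer and start_takeover are made up for illustration, and
the 10 second default is just the "about 10 seconds" mentioned above.

```python
import time


def bringup(heard_peer, start_takeover, wait_seconds=10.0, poll=0.1,
            clock=time.monotonic, sleep=time.sleep):
    """Hypothetical sketch of the bringup sequence described above.

    heard_peer:     callable returning True once a peer heartbeat was seen
    start_takeover: callable invoked with that flag when the wait is over
    """
    deadline = clock() + wait_seconds
    peer_seen = heard_peer()
    # Wait until we hear someone else's heartbeat or the timeout expires.
    while not peer_seen and clock() < deadline:
        sleep(poll)
        peer_seen = heard_peer()
    # Begin the takeover sequence for the resources this node masters.
    # If peer_seen is True, communication with the other end is working.
    start_takeover(peer_seen)
    return peer_seen
```

Note that nothing in this sequence records what was taken over, which is
the missing state database mentioned above.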

> Here's what I think the race condition might be:  Node A is master and
> is down.  Node B is also down, but is the slave.  Node B comes up, and
> just about the time that it times out on Node A being down, Node A
> begins to come up.  Node B times out on the resources A is primary on,
> and begins the process of taking them over.  Node A comes up, and seeing
> B's heartbeat, immediately requests its resources.  Node B has started
> the takeover scripts, but they aren't done, so it thinks it doesn't own
> them, so it doesn't give them up.  Node A then takes them over, while
> Node B's scripts are in the process of doing the same.

It should be easy enough to resolve this by a node having tighter
control over its resources. If takeover has commenced then the node
has the resource. Perhaps there needs to be a state for resource
takeover in progress or giveup in progress, which is somewhere between
having a resource and not having a resource.
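
A minimal sketch of what I mean, in Python for brevity. All of the names
here (ResState, Resource, giveup_pending and the method names) are
hypothetical and none of this exists in heartbeat. One assumption I have
added: a giveup request that arrives while takeover scripts are still
running is remembered and acted on once they finish.

```python
from enum import Enum


class ResState(Enum):
    NOT_HELD = "not held"
    ACQUIRING = "takeover in progress"
    HELD = "held"
    RELEASING = "giveup in progress"


class Resource:
    """Hypothetical per-resource state kept by a node."""

    def __init__(self, name):
        self.name = name
        self.state = ResState.NOT_HELD
        self.giveup_pending = False

    def owns(self):
        # The node owns the resource from the moment takeover begins
        # until giveup has completed.
        return self.state is not ResState.NOT_HELD

    def begin_takeover(self):
        if self.state is not ResState.NOT_HELD:
            return False
        self.state = ResState.ACQUIRING
        return True

    def takeover_done(self):
        if self.state is not ResState.ACQUIRING:
            return
        # If a peer asked for the resource while the scripts were
        # running, go straight on to releasing it.
        if self.giveup_pending:
            self.state = ResState.RELEASING
        else:
            self.state = ResState.HELD
        self.giveup_pending = False

    def request_giveup(self):
        """Handle a peer's request for this resource.

        Returns True if the peer must wait for giveup_done(), and
        False if we do not hold the resource at all.
        """
        if self.state is ResState.NOT_HELD:
            return False
        if self.state is ResState.ACQUIRING:
            self.giveup_pending = True
        elif self.state is ResState.HELD:
            self.state = ResState.RELEASING
        return True

    def giveup_done(self):
        if self.state is ResState.RELEASING:
            self.state = ResState.NOT_HELD
```

In the race described above, Node B would be in the "takeover in
progress" state when Node A's request arrives, so it would answer that it
owns the resource and Node A would wait rather than take it in parallel.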

> > I have attached a patch that I believe will fix this problem.  If
> > nice_failback is in operation then this patch will cause both nodes to drop
> > the resource, which is bad, but they would both keep it otherwise so it is
> > problematic in either case. Also if a resource has more than one master -
> > then this patch results in resources being dropped by all nodes or no nodes,
> > depending on your haresources file. This isn't very good either but if a
> > resource has a master and a slave then it works.
> 
> My guess is that we need to design a "good" bringup algorithm that has
> the right kinds of sequencing and status changes such that it doesn't
> have any race conditions.  This is moderately complex, but is probably
> the better approach.  I started to write one here, but found it too hard
> to write inline in email.

-- 
Horms