[Linux-ha-dev] tracking resource groups in heartbeat
Wed, 29 Mar 2000 11:08:38 -0800
On Wed, Mar 29, 2000 at 06:46:24AM -0700, Alan Robertson wrote:
> horms wrote:
> > On Tue, Mar 28, 2000 at 09:35:08PM -0700, Alan Robertson wrote:
> > > The conversation below is taken from email off the list. It seemed
> > > generally interesting though...
> > >
> > > Horms wrote:
> > > > ... I have however noticed that as heartbeat keeps state of nodes, and
> > > > not resource allocations it is possible to get into a state where no
> > > > nodes/more than one node have a resource. In particular if there is a
> > > > communication medium failure, or if heartbeat is started up on more
> > > > than one node simultaneously. I have been thinking of some fairly
> > > > simple mechanisms to resolve this, vis a vis nodes requesting ownership
> > > > of a resource. I am wondering what your thoughts are. I am most
> > > > concerned about the (simple) two-node case, though something that
> > > > extends beyond that would be nice.
> > >
> > > The folks from Conectiva are doing something in a related area. In the
> > > current code, the assumption is that if the master for a resource is up,
> > > it has control of the resources it is listed as master for. They break
> > > that assumption with a new feature (nice_failover?). It would be good to
> > > add your thoughts and observations to that, and think about the right way
> > > of thinking about this stuff. Once one has the right mental model, the
> > > code is easy :-)
> > It seems to me that the existing code will take control of a resource if
> > the master specified in haresources fails, but not necessarily give it up
> > when the master comes back up again.
> Without nice_failback (which isn't in the current code), this should not
> happen. When the master comes back up, it asks for the other node to
> give up its resources, and in the case of no response takes them anyway.
> > Again, in the case of a media failure,
> > or nodes coming up at the same time both nodes may take ownership of a
> > resource and neither will give it up.
> I agree in the case of media failure. Have you observed it in the case
> of both nodes coming up at the same time?
I thought I had, but I can't reproduce it using the latest code.
> The current bringup sequence is: Start your own heartbeat. Wait until
> you've heard someone else's heartbeat or about 10 seconds. Begin the
> resource takeover sequence for those resources you master. If you've
> heard someone else's heartbeat, then communications with the other end
> are working. I think the problem right now is that there is no database
> indicating the state of either resources or nodes. Everything depends
> on the resource scripts to indicate resource status.
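A rough sketch of the bringup sequence described above, in Python for readability. It is illustrative only and is not heartbeat's actual code: the function name, the callbacks (send_heartbeat, peer_heard, start_takeover) and the injectable clock are all assumptions; only the three steps and the roughly 10 second wait come from the description.

```python
import time


def bringup(send_heartbeat, peer_heard, start_takeover, mastered,
            wait_limit=10.0, poll=0.1,
            clock=time.monotonic, sleep=time.sleep):
    """Hypothetical model of the bringup sequence; all names are assumed."""
    # Step 1: start our own heartbeat.
    send_heartbeat()

    # Step 2: wait until a peer heartbeat is heard or about 10 seconds pass.
    deadline = clock() + wait_limit
    heard = peer_heard()
    while not heard and clock() < deadline:
        sleep(poll)
        heard = peer_heard()

    # Step 3: begin takeover of the resources this node masters.
    for resource in mastered:
        start_takeover(resource)

    # True means communication with the other end is known to work.
    return heard
```

Note that nothing in this sequence records who ends up holding a resource, which is the missing state database mentioned above.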
> Here's what I think the race condition might be: Node A is master and
> is down. Node B is also down, but is the slave. Node B comes up, and
> just about the time that it times out on Node A being down, Node A
> begins to come up. Node B times out on the resources A is primary on,
> and begins the process of taking them over. Node A comes up, and seeing
> B's heartbeat, immediately requests its resources. Node B has started
> the takeover scripts, but they aren't done, so it thinks it doesn't own
> them, so it doesn't give them up. Node A then takes them over, while
> Node B's scripts are in the process of doing the same.
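The suspected race can be replayed deterministically with a toy model. This is a sketch under stated assumptions, not heartbeat code: NaiveNode, its methods and the resource name "ip" are hypothetical, and the only behavior modeled is that a node answers ownership requests from a plain "owned" set that is updated when its takeover scripts finish.

```python
class NaiveNode:
    """Hypothetical node that only knows 'owned' or 'not owned'."""

    def __init__(self, name):
        self.name = name
        self.owned = set()
        self.pending = set()  # takeover scripts started but not yet finished

    def start_takeover(self, res):
        self.pending.add(res)

    def finish_takeover(self, res):
        self.pending.discard(res)
        self.owned.add(res)

    def handle_request(self, res):
        """Peer asks for res back; return True if we gave it up."""
        if res in self.owned:
            self.owned.discard(res)
            return True
        # A takeover in progress is invisible here: we "don't own" it.
        return False


def simulate_race(res="ip"):
    a, b = NaiveNode("A"), NaiveNode("B")
    b.start_takeover(res)    # B times out on A and starts its scripts
    b.handle_request(res)    # A comes up and asks; B believes it owns nothing
    a.start_takeover(res)    # A takes its resource anyway
    a.finish_takeover(res)
    b.finish_takeover(res)   # B's scripts complete afterwards
    return [n.name for n in (a, b) if res in n.owned]
```

In this model simulate_race() ends with both A and B listed as owners of the same resource.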
It should be easy enough to resolve this by a node having tighter
control over its resources. If takeover is commenced then it has
the resource. Perhaps there needs to be a state for resource
takeover in process or giveup in process which is somewhere between
having a resource and not having a resource.
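A minimal sketch of what such intermediate states might look like, again in Python and again purely illustrative. The state names, TrackedNode and replay_with_states are all hypothetical; the only idea taken from the paragraph above is that a node counts a resource as its own from the moment takeover commences, with in-process states between "have" and "don't have".

```python
from enum import Enum


class ResState(Enum):
    RELEASED = 1
    ACQUIRING = 2   # takeover scripts running
    HELD = 3
    RELEASING = 4   # giveup requested, or giveup scripts running


class TrackedNode:
    """Hypothetical node that tracks a per-resource state."""

    def __init__(self, name):
        self.name = name
        self.states = {}

    def state(self, res):
        return self.states.get(res, ResState.RELEASED)

    def claims(self, res):
        # Anything other than RELEASED counts as having the resource.
        return self.state(res) is not ResState.RELEASED

    def start_takeover(self, res):
        self.states[res] = ResState.ACQUIRING

    def finish_takeover(self, res):
        # If a giveup was requested mid-takeover, stay in RELEASING.
        if self.state(res) is ResState.ACQUIRING:
            self.states[res] = ResState.HELD

    def handle_request(self, res):
        """Return True if the peer may take res now, False if it must wait."""
        if not self.claims(res):
            return True
        self.states[res] = ResState.RELEASING
        return False

    def finish_giveup(self, res):
        if self.state(res) is ResState.RELEASING:
            self.states[res] = ResState.RELEASED


def replay_with_states(res="ip"):
    a, b = TrackedNode("A"), TrackedNode("B")
    b.start_takeover(res)
    may_take = b.handle_request(res)   # B is mid-takeover, so A must wait
    b.finish_takeover(res)             # B remains RELEASING
    b.finish_giveup(res)
    if b.handle_request(res):          # A asks again once B has let go
        a.start_takeover(res)
        a.finish_takeover(res)
    return may_take, [n.name for n in (a, b) if n.claims(res)]
```

Here replay_with_states() makes A wait on its first request and ends with only A claiming the resource. It assumes the requester retries rather than taking the resource on a refusal, and it does not address the media-failure case, where the request never arrives.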
> > I have attached a patch that I believe will fix this problem. If
> > nice_failover is in operation then this patch will cause both nodes to drop
> > the resource, which is bad, but they would both keep it otherwise so it is
> > problematic in either case. Also if a resource has more than one master -
> > then this patch results in resources being dropped by all nodes or no nodes,
> > depending on your haresources file. This isn't very good either but if a
> > resource has a master and a slave then it works.
> My guess is that we need to design a "good" bringup algorithm that has
> the right kinds of sequencing and status changes such that it doesn't
> have any race conditions. This is moderately complex, but is probably
> the better approach. I started to write one here, but found it too hard
> to write inline in email.