[Linux-ha-dev] two race conditions in heartbeat
yocum at linuxcare.com
Thu Sep 28 08:11:10 MDT 2000
I think both of these problems can be solved by making heartbeat (or at
least the ResourceManager part) block while each resource is being
acquired or released and, as Juri said, send the message to the
secondary node only when everything is done.
Yes, this makes heartbeat more of a resource manager, but I'm not sure
how else to do it.
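
To make the "block on each resource" idea concrete, here is a rough sketch in Python (not heartbeat's actual code). It assumes each resource is started by running a command and waiting for it to exit; the function name acquire_all and the command lists are hypothetical placeholders, not real heartbeat interfaces.

```python
import subprocess


def acquire_all(commands, run=subprocess.run):
    """Start resources one at a time, waiting for each to finish.

    `commands` is a list of argument lists (hypothetical placeholders).
    Returns (acquired, failed): the commands that succeeded, and the
    first command that exited nonzero, or None if all succeeded.
    """
    acquired = []
    for cmd in commands:
        # The call blocks until the script exits, so two scripts never
        # run at the same time and cannot pick the same alias.
        result = run(cmd)
        if result.returncode != 0:
            return acquired, cmd
        acquired.append(cmd)
    return acquired, None
```

Because the second script does not start until the first has returned, it sees the alias the first one already configured. The price is that acquisition time becomes the sum of all script run times.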
Juri Haberland wrote:
> Well, as the subject says, I found two (kind of) race conditions in
> heartbeat. The first can happen when you have multiple services for one
> node in your haresources file like the following:
> node1 10.0.5.184 service1
> node1 10.0.5.185 service2
> Sometimes heartbeat starts the IPaddr scripts so shortly after one
> another that both assume that a certain alias (e.g. eth0:1) is unused
> and therefore both use it, resulting in one overwriting the other!
> One idea that I had to prevent the scripts from doing so is to let them
> use a lock file. If this lock file exists the script should wait for a
> certain amount of time and look for the lock file again. After a number
> of tries it should exit with a nonzero value to indicate that it
> couldn't obtain the lock.
> Other suggestions are welcome.
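
The lock file scheme described above could look roughly like the following Python sketch. The names acquire_lock, release_lock and run_with_lock, the retry count and the delay are all assumptions made for illustration; the real IPaddr script is not written this way. The important detail is that the lock file is created with O_CREAT | O_EXCL, so checking for the file and creating it happen as one atomic step and two scripts cannot both succeed.

```python
import os
import time


def acquire_lock(path, retries=10, delay=0.5):
    """Try to create the lock file; return True if we now hold it."""
    for attempt in range(retries):
        try:
            # O_EXCL makes creation fail if the file already exists.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            if attempt < retries - 1:
                time.sleep(delay)  # wait, then look again
            continue
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))  # record the owner for debugging
        return True
    return False


def release_lock(path):
    """Remove the lock file; a missing file is not an error."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass


def run_with_lock(path, action, retries=10, delay=0.5):
    """Run action() under the lock and return a shell-style exit status."""
    if not acquire_lock(path, retries, delay):
        return 1  # nonzero: could not obtain the lock
    try:
        action()
        return 0
    finally:
        release_lock(path)
```

This sketch does not handle a stale lock left behind by a script that was killed; a real implementation would need some way to detect and clear one.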
> So the other race condition is in the handling of a failure when
> heartbeat is stopped on the active node. In this situation the stopping
> heartbeat seems to signal the other node that it is shutting down so the
> inactive node can take over immediately. The problem here is that
> heartbeat signals that it is stopping _before_ it has successfully shut
> down all services, which can lead to a situation where the (old) active
> node hasn't yet released all resources and the (new) active node has
> already taken the resources. This is especially bad with shared storage
> and can cause serious file system corruption!
> The solution would be to let heartbeat release all its resources and
> _then_ signal the other node that it is shutting down.
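
As a minimal sketch of that ordering (again in Python, and not heartbeat's real shutdown path): release and notify_peer are hypothetical callbacks, and the message text is made up. Two details are my own assumptions rather than anything Juri proposed: resources are released in reverse order of acquisition, and the peer is not notified at all if any release fails.

```python
def shutdown(resources, release, notify_peer):
    """Release every resource first, and only then tell the peer.

    `release(res)` is assumed to block until the resource is stopped
    and to return True on success. Returns the list of resources that
    could not be released (empty on a clean shutdown).
    """
    failed = []
    # Assumption: stop in reverse order of acquisition.
    for res in reversed(resources):
        if not release(res):
            failed.append(res)
    if failed:
        # Assumption: something is still held, so do not invite a
        # takeover that could touch shared storage.
        return failed
    notify_peer("shutdown-complete")
    return []
```

With this ordering the new active node cannot start taking resources while the old one still holds them, which is the case that risks file system corruption on shared storage.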
Dan Yocum, Sr. Linux Consultant
yocum at linuxcare.com, http://www.linuxcare.com
Linuxcare. Support for the revolution.