[Linux-ha-dev] Re: The issue of shutdown
lmb at suse.de
Wed Dec 1 07:02:09 MST 2004
On 2004-12-01T06:56:00, Alan Robertson <alanr at unix.sh> wrote:
(Shortening the Cc list as all guilty parties are subscribed to the
mailing list ;-)
> >The situation is to shut down the last node in the cluster.
> >1. heartbeat -k
> >2. heartbeat sends SIGTERM to CRM process group
> >3. CRM sends several stop operations to LRM to release all the resources
> >it is holding and exits immediately.
> OK. Then the CRM is broken. It should wait for all resources to be
> released before it exits. This is because it MUST take some kind of action
> if a resource fails to stop correctly. It is not enough to pray for
> correct stop status ;-).
Agreed, this bug is very important.
The last node is in a somewhat handicapped position: it likely can't
rely on another node to fence it, so it must either block or commit
suicide via a "reboot -nf".
"stop" should not fail, but it just might, and that would be an ugly
At the same time, the lrmd should continue to execute the 'stop'
operations which have already been submitted, and not silently abort
(Or generate 'fake' stop events for all remaining resources if it itself
is asked to shut down with resources still active, and wait until they
have completed, or commit suicide if they fail too.)
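Again only as a sketch, assuming a much simpler model than the real lrmd
(a queue of already submitted stops plus a set of active resources; all
names here are hypothetical), the LRM side could look roughly like this:

```python
from collections import deque
from typing import Callable, Deque, Dict, Set


def lrm_shutdown(
    pending_stops: Deque[str],
    active: Set[str],
    run_stop: Callable[[str], bool],
) -> Dict[str, bool]:
    """Drain queued stops, then stop whatever is still active.

    Returns a map of resource name -> whether its stop succeeded. The
    caller decides what to do about failures (block, or self-fence).
    """
    results: Dict[str, bool] = {}
    # 1. Execute the stop operations that were already submitted,
    #    instead of dropping them on shutdown.
    while pending_stops:
        rsc = pending_stops.popleft()
        results[rsc] = run_stop(rsc)
    # 2. Generate stops for resources that nobody asked us to stop.
    for rsc in sorted(active - set(results)):
        results[rsc] = run_stop(rsc)
    return results
```

For example, with "fs" already queued and "ip" and "db" still active, all
three get a stop attempt and the failures are visible in the result map.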
This should be done as a safety net in case the CRM fails, and I'd
consider this a bug in the LRM. However, I'm quite glad it was there,
because otherwise we might have missed the CRM bug and now we can fix
Lars Marowsky-Brée <lmb at suse.de>
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business