[Linux-ha-dev] Re: The issue of shutdown
Alan Robertson
alanr at unix.sh
Wed Dec 1 06:56:00 MST 2004
Huang Zhen wrote:
> Alan Robertson wrote:
>
>> You should probably be setting each resource agent you fork as a
>> process group anyway, because then if you have to kill them, you will
>> kill ALL of its child processes too...
>
> OK, I will make each RA run in its own process group.
>
>>
>> But, I still wonder what kind of testing you're doing such that you
>> shut down heartbeat while things are still being taken over. This is
>> not an ideal circumstance.
>>
>
> The situation is shutting down the last node in the cluster.
> 1. heartbeat -k
> 2. heartbeat sends SIGTERM to CRM process group
> 3. CRM sends several stop operations to LRM to release all the resources
> it holds, and then exits immediately.
OK. Then the CRM is broken. It should wait for all resources to be
released before it exits. This is because it MUST take some kind of action
if a resource fails to stop correctly. It is not enough to pray for
correct stop status ;-).
Simple software is good. This is one of those cases where it is too
simple. Unfortunately, shutdown is one of the more difficult parts of a
resource manager. It has been a continual problem in release 1, and not
just because the current architecture isn't very good.
> 4. Then heartbeat sends SIGTERM to the LRM process group, but at this point
> the LRM's child processes (the stop operations from step 3) are still running.
Yes, I see that.
> Any suggestion? I will be on IRC.
First, I think the CRM MUST monitor "stop" status and wait until everything
is correctly stopped. Exiting immediately is a bug. Besides being
incorrect, it has now forced you to work around this CRM bug twice, which
is an additional sign that something is wrong in the CRM.
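A minimal C sketch of the behaviour argued for above, not the CRM's actual code: start every stop operation, then wait for all of them before returning, and report how many failed so the caller can take some action instead of just exiting. The function name `stop_all_and_wait` and the `stop_codes` array (a stand-in for real resource-agent stop results) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sketch: fork one "stop" operation per resource, then wait
 * for ALL of them before returning. stop_codes[i] stands in for the exit
 * status a real resource agent's stop action would return (0 = stopped).
 * Returns the number of resources that did not stop cleanly. */
int stop_all_and_wait(const int *stop_codes, int n)
{
    assert(n >= 0 && (n == 0 || stop_codes != NULL));

    pid_t *pids = malloc((size_t)n * sizeof *pids);
    int failures = 0;

    if (pids == NULL)
        return n;  /* cannot track the operations: treat all as failed */

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(stop_codes[i]);  /* child: placeholder for running the RA */
        pids[i] = pid;             /* -1 if fork() failed */
        if (pid < 0)
            failures++;
    }

    /* Do NOT exit early: collect every stop result first. */
    for (int i = 0; i < n; i++) {
        int status;
        if (pids[i] < 0)
            continue;
        if (waitpid(pids[i], &status, 0) < 0 ||
            !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failures++;
    }

    free(pids);
    return failures;
}
```

A non-zero return is the point where the resource manager would have to react to a failed stop; what that reaction should be is outside this sketch.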
Secondly, there's no harm in making each of your processes its own process
group anyway. You can't really kill them correctly otherwise. You only need
to kill them when a monitor operation is cancelled or an operation exceeds
its timeout, but in those cases you do need to, and you can't kill
everything cleanly unless each resource operation is its own process group.
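A minimal POSIX sketch of the process-group approach, not the LRM's actual code (the names `start_op_in_own_group` and `kill_op_group` are hypothetical): each operation is forked as the leader of a new process group, so a cancel or timeout can signal the whole group with `kill(-pgid, ...)` and also take down any helper processes the operation spawned. Here the operation and its helper just call `pause()`, standing in for a hung resource agent.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sketch: start an "operation" as leader of its own process
 * group. It forks one helper (which inherits the group) and both block
 * forever, like a hung resource agent. Open fds are inherited by both. */
pid_t start_op_in_own_group(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        setpgid(0, 0);          /* child: become leader of a new group */
        if (fork() == 0) {      /* grandchild: same process group */
            pause();
            _exit(0);
        }
        pause();
        _exit(0);
    }
    if (pid > 0)
        setpgid(pid, pid);      /* parent too, so no race with a later kill() */
    return pid;
}

/* Cancel/timeout path: signal the WHOLE group, not just the leader,
 * then reap the leader. Returns 0 on success, -1 on error. */
int kill_op_group(pid_t leader)
{
    int status;

    /* kill(-1, sig) would hit every process we may signal and kill(0, sig)
     * our own group: debug builds abort, release builds refuse. */
    assert(leader > 1);
    if (leader <= 1)
        return -1;
    if (kill(-leader, SIGKILL) < 0)
        return -1;
    return waitpid(leader, &status, 0) == leader ? 0 : -1;
}
```

Calling `setpgid` in both parent and child is a common idiom that guarantees the group exists whichever process runs first. SIGKILL is used here for determinism; a real implementation might send SIGTERM first and escalate only if the group does not exit in time.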
But, the bug is primarily the CRM's, here.
--
Alan Robertson <alanr at unix.sh>
"Openness is the foundation and preservative of friendship... Let me claim
from you at all times your undisguised opinions." - William Wilberforce