[Linux-ha-dev] Re: Heartbeat - Dev: changeset 11628:e4a4c6fd5649
alanr at unix.sh
Tue Dec 4 06:39:41 MST 2007
Andrew Beekhof wrote:
> this commit is wrong - only the children indicated in the process
> definition are allowed to die
> please revert this change asap
Well... That's not what happens in reality, as far as I can tell.
When one of your processes dies, it creates a cascading chain of other
dying processes which are connected to it via IPC, which die when it
dies. As a result, when something important like the CIB dies,
virtually any/every one of your processes can die as a result. Which
one(s) die before the node suicides depends on the timing.
The key causative factors of this are:
    - Your processes don't suicide directly.
    - It appears that file descriptor notification
      pretty often happens before death-of-child notification.
So, a process (let's say the CIB) dies, and then one or more of
its many local peers (CRM, pengine, attrd, tengine,
etc.) discovers that it has disconnected. It in turn
dies, and depending on the relative timing of when
the log message gets sent out or the suicide occurs,
the log messages may be received by the remote logging
daemon - or not.
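
To make the timing point concrete, here is a minimal sketch in Python. It is not Heartbeat code; the function name observe_child_death and the use of a plain pipe as a stand-in for the IPC channel are assumptions made purely for illustration. It shows that "my peer's descriptor hit EOF" and "my child has exited" arrive through two separate channels, so whichever one gets noticed first depends on timing:

```python
import os
import select
import subprocess
import sys
import time


def observe_child_death():
    """Return the order in which 'eof' and 'exit' were noticed for one child.

    Hypothetical helper for illustration only; not part of Heartbeat.
    """
    # The child holds the write end of a pipe (standing in for an IPC
    # channel) and exits right away, which closes its end of the pipe.
    child = subprocess.Popen([sys.executable, "-c", "pass"],
                             stdout=subprocess.PIPE)
    fd = child.stdout.fileno()
    events = []
    while len(events) < 2:
        if "eof" not in events:
            # Channel 1: the descriptor becomes readable and read()
            # returns b"" (EOF) once the peer has gone away.
            readable, _, _ = select.select([fd], [], [], 0.01)
            if readable and os.read(fd, 4096) == b"":
                events.append("eof")
        else:
            time.sleep(0.01)  # avoid spinning while waiting for the exit
        # Channel 2: the parent learns of the death from the wait status.
        if "exit" not in events and child.poll() is not None:
            events.append("exit")
    child.stdout.close()
    return events


if __name__ == "__main__":
    print(observe_child_death())
```

Both events always show up, but nothing guarantees their order. A peer that reacts to the EOF by exiting can therefore be gone before anyone has recorded the death of the process that actually failed first.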
What have I missed here?
Alan Robertson <alanr at unix.sh>
"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William