[Linux-ha-dev] Re: Heartbeat - Dev: changeset 11628:e4a4c6fd5649

Alan Robertson alanr at unix.sh
Tue Dec 4 06:39:41 MST 2007


Andrew Beekhof wrote:
> this commit is wrong - only the children indicated in the process 
> definition are allowed to die
> please revert this change asap
> 
> http://hg.linux-ha.org/dev/rev/e4a4c6fd5649

Well... That's not what happens in reality, and as far as I can tell 
the behavior we actually see is expected.

When one of your processes dies, it sets off a cascade: the other 
processes connected to it via IPC die when it dies.  So when something 
important like the CIB dies, virtually any or every one of your 
processes can die as a result.  Which one(s) die before the node 
suicides depends on the timing.

The key causes of this are:
	Your processes don't suicide directly.
	It appears that file descriptor notification
		pretty often happens before death-of-child
		signals.

So, a process (let's say the CIB) dies, and then one or more of
	its many local peers (CRM, pengine, attrd, tengine,
	etc.) discovers that it has disconnected.  That peer in
	turn dies, and depending on whether its log message
	gets sent out before the suicide occurs, the message
	may be received by the remote logging daemon - or not.
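
To make the cascade concrete, here is a minimal sketch in Python.  It is
not Heartbeat code: the "cib" and "peer" roles, the run_cascade() helper,
and the exit code 100 are all made up for illustration, and a plain
socketpair stands in for the real IPC layer.  Only the "cib" child is
killed, yet the parent ends up reaping two dead children, because the
peer sees the disconnect on its file descriptor and exits in turn.

```python
import os
import signal
import socket

# Hypothetical exit code the peer uses when it notices the disconnect.
PEER_EXIT_ON_DISCONNECT = 100


def run_cascade():
    """Fork a 'cib' and a 'peer' joined by a socketpair, then kill only the cib.

    Returns a dict mapping role name to the raw wait status of that child.
    """
    cib_end, peer_end = socket.socketpair()

    cib_pid = os.fork()
    if cib_pid == 0:
        # Stand-in for the CIB: hold one end of the channel until killed.
        peer_end.close()
        signal.pause()
        os._exit(0)

    peer_pid = os.fork()
    if peer_pid == 0:
        # Stand-in for a local peer (say, the CRM): block on the channel.
        cib_end.close()
        data = peer_end.recv(1)
        # An empty read means the other side is gone, so follow it down.
        os._exit(PEER_EXIT_ON_DISCONNECT if data == b"" else 0)

    # Parent: drop both ends so only the children hold the channel.
    cib_end.close()
    peer_end.close()

    # Kill only the cib; nothing is ever sent to the peer.
    os.kill(cib_pid, signal.SIGKILL)

    statuses = {}
    _, statuses["cib"] = os.waitpid(cib_pid, 0)
    _, statuses["peer"] = os.waitpid(peer_pid, 0)
    return statuses


if __name__ == "__main__":
    st = run_cascade()
    print("cib killed by signal:", os.WTERMSIG(st["cib"]))
    print("peer exit code:", os.WEXITSTATUS(st["peer"]))
```

The sketch waits for each child by pid, so it deliberately says nothing
about which death the parent would notice first.  A parent that reacts
to whichever child dies first would see an order that depends on
timing, which is the point above.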

What have I missed here?

-- 
     Alan Robertson <alanr at unix.sh>

"Openness is the foundation and preservative of friendship...  Let me 
claim from you at all times your undisguised opinions." - William 
Wilberforce

