[Linux-ha-dev] Lrmd creeping processor utilization

Dejan Muhamedagic dejanmm at fastmail.fm
Thu Oct 11 15:45:07 MDT 2007


Hi,

On Thu, Oct 11, 2007 at 08:44:19PM +0100, Simon Talbot wrote:
> All,
> 
> I am currently doing some work with 2.1.2 in a fairly simple four node
> configuration, basically running two different ldirectord configurations
> on two nodes, with the third and fourth acting as spares which take over
> should any other node fail. The fourth node, whilst present in all the
> configurations, has never actually been seen by the cluster (i.e., no
> UUID exists for it, etc.)
> 
> The cluster works fine, fails over perfectly and generally behaves, but
> the lrmd process slowly increases in processor utilization: over a
> period of about 24 hours it rises from virtually 0 to around 15%, and
> after a few more days it is up at the 90-99% mark. If lrmd is
> restarted, it goes back to 0 and starts climbing again.

This has been fixed here:

http://hg.linux-ha.org/dev/rev/d7e41b482c62

There's also some discussion about it here:

http://old.linux-foundation.org/developer_bugzilla/show_bug.cgi?id=1697
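
If you want to confirm the climb before and after picking up the fix, the
CPU time of lrmd can be sampled from /proc. What follows is only a minimal
sketch, assuming a Linux /proc filesystem; the function names and the
60 second interval are made up for illustration and are not part of
heartbeat.

```python
import os
import time


def cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces, so cut at the last ')' before splitting the other fields.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is the state (field 3 in proc(5)); utime and stime are
    # fields 14 and 15, which are indexes 11 and 12 here.
    return int(fields[11]) + int(fields[12])


def cpu_percent(ticks_before, ticks_after, seconds, ticks_per_second):
    """Average CPU use, in percent of one core, between two samples."""
    return 100.0 * (ticks_after - ticks_before) / (ticks_per_second * seconds)


def watch(pid, interval=60.0):
    """Print one CPU figure per interval for the given pid (Linux only)."""
    hz = os.sysconf("SC_CLK_TCK")
    path = "/proc/%d/stat" % pid
    with open(path) as f:
        before = cpu_ticks(f.read())
    while True:
        time.sleep(interval)
        with open(path) as f:
            after = cpu_ticks(f.read())
        stamp = time.strftime("%b %d %H:%M:%S")
        print(stamp, "%.1f%%" % cpu_percent(before, after, interval, hz))
        before = after
```

Calling watch() with the lrmd pid (27507 in the log line quoted in this
thread) and leaving it running for a day or two should make the trend
obvious.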

> Having scoured the archives, I see some references to high CPU
> utilization being caused by nodes that have disappeared and split brains
> (thought to be caused by rexmit) but I can't see how this would affect
> my set-up as the fourth node in my configuration has never been seen.
> 
> When in the high utilization state, heartbeat also does not stop cleanly
> and a kill is sometimes necessary to fully shut it down.

There has also been some work done in this area.

> Additionally, many of the following messages start getting generated:
> 
> Oct 11 15:30:23 th-lb2 lrmd: [27507]: WARN: G_SIG_dispatch: Dispatch
> function for SIGCHLD took too long to execute: 60 ms (> 30 ms) (GSource:
> 0x805ec10)

This is more of an annoyance than a problem. It has
also been fixed:

http://hg.linux-ha.org/dev/rev/f5a50dd5a966

See:

http://old.linux-foundation.org/developer_bugzilla/show_bug.cgi?id=1715
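
To see how often these warnings show up and how long the dispatch
actually took, the log can be scanned with a small script. This is just a
sketch: the pattern is derived from the single message quoted above (with
the line wrapping undone), and the function name is invented for the
example.

```python
import re

# Matches the G_SIG_dispatch warning text as it appears in the quoted log
# line, capturing the signal name, the measured time and the threshold.
DISPATCH_WARNING = re.compile(
    r"Dispatch function for (?P<signal>\S+) took too long to execute: "
    r"(?P<took>\d+) ms \(> (?P<limit>\d+) ms\)"
)


def parse_dispatch_warning(line):
    """Return (signal, took_ms, limit_ms), or None if the line does not match."""
    match = DISPATCH_WARNING.search(line)
    if match is None:
        return None
    return match.group("signal"), int(match.group("took")), int(match.group("limit"))
```

Running every log line through parse_dispatch_warning() and keeping the
results that are not None gives the measured times, which can then be
compared between the old and the fixed lrmd.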

> When running normally, the system is not under high load, and I would
> expect normal processor utilization of the system to be at about 1 or 2%.
> 
> I have just started some further tests with the 4th node removed from
> the configuration to rule that out as a cause, but thought I
> would drop the list a note to see if this rang any bells for anyone. If
> no one has seen this behaviour, then I would also be appreciative of any
> pointers to good places in the code to start attacking with GDB.
> 
> I have kept this email short and not attached configs, logs etc., but
> they are available if anyone is interested.
> 
> Simon

Thanks,

Dejan

> Simon Talbot MEng, ACGI 
> (Chief Engineer) 
> Net Solutions Europe Limited
> Tel: 020 3161 6001
> Fax: 020 3161 6011
> 

