[Linux-ha-dev] Lrmd creeping processor utilization
Dejan Muhamedagic
dejanmm at fastmail.fm
Thu Oct 11 15:45:07 MDT 2007
Hi,
On Thu, Oct 11, 2007 at 08:44:19PM +0100, Simon Talbot wrote:
> All,
>
> I am currently doing some work with 2.1.2 in a fairly simple four node
> configuration, basically running two different ldirectord configurations
> on two nodes and the third and fourth acting as spares which take over
> should any other node fail. The fourth node, whilst present in all the
> configurations, has never actually been seen by the cluster (i.e., no
> UUID exists for it, etc.)
>
> The cluster works fine, fails over perfectly and generally behaves, but
> the lrmd process slowly increases in processor utilization: over a
> period of about 24 hours it rises from virtually 0 to around 15%, and
> after a few more days it is up at the 90-99% mark. If lrmd is restarted,
> it goes back to 0 and starts climbing again.
This has been fixed here:
http://hg.linux-ha.org/dev/rev/d7e41b482c62
There's also some discussion about it here:
http://old.linux-foundation.org/developer_bugzilla/show_bug.cgi?id=1697
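If you want to confirm that the creep is gone once the fix is applied, one
option is to sample lrmd's accumulated CPU time from /proc at intervals and
compare. The following is only a minimal sketch, assuming the Linux
/proc/<pid>/stat layout; the helper names are made up for illustration and
are not part of heartbeat, and the default of 100 ticks per second is an
assumption that should be checked on the real system.

```python
def cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces,
    # so cut at the last ')' before splitting the remaining fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]) + int(rest[12])


def cpu_percent(ticks_before, ticks_after, seconds, ticks_per_sec=100):
    """Average utilization (percent of one CPU) between two samples.

    ticks_per_sec=100 is an assumed default; the real value can be
    queried with os.sysconf("SC_CLK_TCK").
    """
    return 100.0 * (ticks_after - ticks_before) / (ticks_per_sec * seconds)


def read_ticks(pid):
    """Read the current CPU ticks of a live process (Linux only)."""
    with open("/proc/%d/stat" % pid) as f:
        return cpu_ticks(f.read())
```

Calling read_ticks() on the lrmd pid twice, some minutes apart, and feeding
both values to cpu_percent() gives the average load over that window; a
value that keeps growing from one day to the next would match the behaviour
described above.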
> Having scoured the archives, I see some references to high CPU
> utilization being caused by nodes that have disappeared and split brains
> (thought to be caused by rexmit) but I can't see how this would affect
> my set-up as the fourth node in my configuration has never been seen.
>
> When in the high utilization state, heartbeat also does not stop cleanly
> and a kill is sometimes necessary to fully shut it down.
There has also been some work done in this area.
> Additionally, many of the following messages start getting generated:
>
> Oct 11 15:30:23 th-lb2 lrmd: [27507]: WARN: G_SIG_dispatch: Dispatch
> function for SIGCHLD took too long to execute: 60 ms (> 30 ms) (GSource:
> 0x805ec10)
This is more of an annoyance than a problem. It has
also been fixed:
http://hg.linux-ha.org/dev/rev/f5a50dd5a966
See:
http://old.linux-foundation.org/developer_bugzilla/show_bug.cgi?id=1715
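For context, a warning of this kind is what you get when a main-loop
dispatch callback runs longer than a fixed threshold. Below is a minimal
sketch of such a check. It is not the actual G_SIG_dispatch code from
heartbeat; timed_dispatch and its arguments are hypothetical names, and the
30 ms limit is simply taken from the log line quoted above.

```python
import time


def timed_dispatch(handler, name, limit_ms=30, clock=time.monotonic):
    """Run handler() and return a warning string if it ran too long.

    Returns None when the handler finished within limit_ms.
    The clock argument exists only so the timing source can be swapped out.
    """
    start = clock()
    handler()
    elapsed_ms = int((clock() - start) * 1000)
    if elapsed_ms > limit_ms:
        return ("Dispatch function for %s took too long to execute: "
                "%d ms (> %d ms)" % (name, elapsed_ms, limit_ms))
    return None
```

The check only measures how long the callback took; it says nothing about
why, which is why the message on its own is more noise than diagnosis.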
> When running normally, the system is not under high load, and I would
> expect normal processor utilization of the system to be at about 1 or 2%.
>
> I have just started some further tests with the 4th node removed from
> the configuration to rule it out as a cause, but thought I
> would drop the list a note to see if this rang any bells for anyone. If
> no one has seen this behaviour, then I would also be appreciative of any
> pointers to good places in the code to start attacking with a GDB.
>
> I have kept this email short and not attached configs, logs etc., but
> they are available if anyone is interested.
>
> Simon
Thanks,
Dejan
> Simon Talbot MEng, ACGI
> (Chief Engineer)
> Net Solutions Europe Limited
> Tel: 020 3161 6001
> Fax: 020 3161 6011