[Linux-ha-dev] Info: CRM Memory Leaks
Andrew Beekhof
abeekhof at suse.de
Fri Feb 9 08:43:50 MST 2007
As many of you may have already read, there are/were a number of
significant memory leaks in the CRM.
First of all, don't panic.
The bad news is that they've been there for some time. However, that
also means that if you've not noticed any ill effects so far, then
you're unlikely to do so in the future.
The size of the leaks is proportional to:
* cluster size
* number of resources
* cluster activity (i.e. resources restarting, or nodes
joining/leaving the cluster)
If there is no cluster activity, then no further memory will be leaked.
== WORST CASE SCENARIO ==
The worst case scenario is that the crmd process uses up all the
available memory and is shot.
In such cases, the master heartbeat process will simply respawn us
and we'll re-join the cluster.
If you're _really_ unlucky, heartbeat will be a fraction too slow
doing this and (if STONITH is enabled), the node may be fenced (to
ensure the node is dead before starting its resources).
== WHAT NOW ==
Thanks to Valgrind, I've fixed all but the smallest leaks - so the
next version will be as leak-free as I can make it (excepting library
functions that I can't do anything about).
For those that want/need the leaks fixed NOW, I've attached a patch
against 2.0.8.
== NEVER AGAIN ==
The embarrassment of such a leak being present has prompted me to
make running the CRM under Valgrind exceedingly easy. Simply give
the --enable-valgrind option to configure and all 5 CRM processes
will complain bitterly if they're leaking memory (with a stack trace
of who allocated it!). Just remember to start a Valgrind listener on
localhost:1234.
When running tools like Valgrind, please remember that there are
results that I cannot do anything about. E.g.:
* library functions that leak every time they're called
* library functions that create "global" data and offer no way to
clean it up^
For this reason, I have created a Valgrind suppression file which I
can make available if people are interested.
^ Technically this isn't a leak, since it's a fixed size regardless of
the number of times a function is called, but it does make spotting
real leaks harder.
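
To make the difference between the two cases above concrete, here is a
minimal sketch in C. The function names, buffer sizes and counters are
all hypothetical and chosen purely for illustration; they are not taken
from heartbeat, the CRM or any real library.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for illustration only; these are not functions
 * from heartbeat, the CRM, or any real library. */

size_t per_call_bytes = 0;   /* bytes lost by leaks_every_call() so far */
size_t one_time_bytes = 0;   /* bytes held by allocates_once() so far */

/* Case 1: a real leak. Each call allocates a buffer and drops the only
 * pointer to it, so the lost memory grows with the number of calls. */
void leaks_every_call(void)
{
    char *buf = malloc(64);
    if (buf == NULL) {
        return;
    }
    memset(buf, 0, 64);
    per_call_bytes += 64;
    /* no free(buf): 64 more bytes are lost on every call */
}

/* Case 2: "global" data with no cleanup function. The buffer is
 * allocated on the first call only, so the amount held stays fixed
 * no matter how often the function is called. */
static char *global_cache = NULL;

void allocates_once(void)
{
    if (global_cache == NULL) {
        global_cache = malloc(256);
        if (global_cache != NULL) {
            one_time_bytes += 256;
        }
    }
}
```

Assuming every malloc succeeds, calling both functions 100 times leaves
per_call_bytes at 6400 while one_time_bytes stays at 256. Valgrind may
still report the second buffer at exit, which is the kind of noise a
suppression file is meant to hide.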
-------------- next part --------------
A non-text attachment was scrubbed...
Name: crm-leaks.patch
Type: application/octet-stream
Size: 16011 bytes
Desc: not available
Url : http://lists.community.tummy.com/pipermail/linux-ha-dev/attachments/20070209/d24e4d8a/crm-leaks-0001.obj