[Linux-ha-dev] [Patch] The patch which revises memory leak.
Alan Robertson
alanr at unix.sh
Tue May 8 08:06:55 MDT 2012
FYI: there is code in the heartbeat communication layer which is quite
happy to simulate lost packets.
I made it difficult to turn on accidentally. Read the code for details
if you're interested.
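Alan does not describe the mechanism here. The following is a purely hypothetical illustration of the general idea, not heartbeat's code: the struct, fields and functions are invented. A loss simulator usually pairs a deliberate opt-in with a drop probability, and the send or receive path silently discards a packet whenever the check fires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical packet-loss simulator (NOT heartbeat's implementation).
 * `enabled` must be set deliberately, so loss is never simulated by accident. */
struct loss_sim {
    bool     enabled;     /* explicit opt-in */
    double   drop_prob;   /* 0.0 = never drop, 1.0 = always drop */
    uint32_t rng_state;   /* PRNG state; must be nonzero */
};

/* xorshift32: tiny deterministic PRNG so test runs are reproducible. */
static uint32_t xorshift32(uint32_t *s)
{
    uint32_t x = *s;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *s = x;
    return x;
}

/* Returns true if the caller should discard this packet. */
static bool loss_sim_should_drop(struct loss_sim *ls)
{
    assert(ls->drop_prob >= 0.0 && ls->drop_prob <= 1.0);
    if (!ls->enabled)
        return false;
    /* Map the 32-bit PRNG output into [0, 1). */
    double r = xorshift32(&ls->rng_state) / 4294967296.0;
    return r < ls->drop_prob;
}
```

A real tool might seed the PRNG from the clock instead; a fixed seed is used here only so that runs can be repeated exactly.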
On 04/30/2012 10:21 PM, renayama19661014 at ybb.ne.jp wrote:
> Hi Lars,
>
> We confirmed that this problem occurs in Heartbeat's v1 mode.
> * The same problem occurs in v2 mode.
>
> We reproduced the problem with the following procedure:
>
> Step 1) Place a special device in the network that drops Heartbeat communication packets.
>
> Step 2) The nodes repeatedly retransmit messages to each other.
>
> Step 3) The memory usage of the master control process then grows little by little.
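Step 3 can be tracked more finely than with hourly ps runs. The following is a minimal Linux-only sketch, assuming /proc is mounted: it reads the VmRSS line (reported in kB) from /proc/<pid>/status of the master control process. rss_kib and sample_rss are hypothetical helpers, not part of Heartbeat.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns the resident set size of `pid` in KiB, or -1 on error.
 * Linux-specific: parses the "VmRSS:" line of /proc/<pid>/status. */
static long rss_kib(pid_t pid)
{
    char path[64];
    char line[256];
    long kib = -1;

    assert(pid > 0);
    snprintf(path, sizeof path, "/proc/%ld/status", (long)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        /* The line looks like "VmRSS:\t    7128 kB". */
        if (strncmp(line, "VmRSS:", 6) == 0) {
            if (sscanf(line + 6, "%ld", &kib) != 1)
                kib = -1;
            break;
        }
    }
    fclose(f);
    return kib;
}

/* Prints `count` RSS samples for `pid`, `interval` seconds apart. */
static void sample_rss(pid_t pid, unsigned interval, int count)
{
    for (int i = 0; i < count; i++) {
        printf("sample %d: %ld KiB\n", i, rss_kib(pid));
        fflush(stdout);
        if (i + 1 < count)
            sleep(interval);
    }
}
```

For example, sample_rss(32126, 3600, 5) would print five hourly samples for the node1 master control process (pid 32126).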
>
>
> -------- ps output for the master control process ----------
> * node1
> (start)
> 32126 ? SLs 0:00 0 182 53989 7128 0.0 heartbeat: master control process
> (One hour later)
> 32126 ? SLs 0:03 0 182 54729 7868 0.0 heartbeat: master control process
> (Two hours later)
> 32126 ? SLs 0:08 0 182 55317 8456 0.0 heartbeat: master control process
> (Four hours later)
> 32126 ? SLs 0:24 0 182 56673 9812 0.0 heartbeat: master control process
>
> * node2
> (start)
> 31928 ? SLs 0:00 0 182 53989 7128 0.0 heartbeat: master control process
> (One hour later)
> 31928 ? SLs 0:02 0 182 54481 7620 0.0 heartbeat: master control process
> (Two hours later)
> 31928 ? SLs 0:08 0 182 55353 8492 0.0 heartbeat: master control process
> (Four hours later)
> 31928 ? SLs 0:23 0 182 56689 9828 0.0 heartbeat: master control process
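The numbers above appear to follow the `ps v` column layout (PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND). That column order is an assumption, since the exact ps command line is not shown. Under that assumption, a small C sketch can pull the RSS field (KiB) out of each line and compute the average growth rate; ps_v_rss and rss_growth_per_hour are hypothetical helper names.

```c
#include <assert.h>
#include <stdio.h>

/* Extracts the RSS field (KiB) from one line of `ps v` output, assuming:
 *   PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND
 * Returns -1 if the line does not match that layout. */
static long ps_v_rss(const char *line)
{
    long pid, majfl, trs, drs, rss;
    char tty[32], state[16], cputime[16];
    double mem;

    if (sscanf(line, "%ld %31s %15s %15s %ld %ld %ld %ld %lf",
               &pid, tty, state, cputime,
               &majfl, &trs, &drs, &rss, &mem) != 9)
        return -1;
    return rss;
}

/* Average RSS growth in KiB per hour between two samples. */
static double rss_growth_per_hour(long rss_start, long rss_end, double hours)
{
    assert(hours > 0.0);
    return (double)(rss_end - rss_start) / hours;
}
```

Applied to the start and four-hour samples, this gives (9812 - 7128) / 4 = 671 KiB/hour on node1 and (9828 - 7128) / 4 = 675 KiB/hour on node2. The DRS column grows by the same 2684 and 2700 KiB, which is consistent with growth in the process's data memory rather than its code.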
>
>
> The amount of leaked memory seems to vary from node to node, depending on the number of retransmissions.
>
> After applying my patch, this memory growth no longer occurs.
>
> A similar fix also seems to be needed in send_reqnodes_msg(), although the leak there appears to be small.
>
> Best Regards,
> Hideo Yamauchi.
>
>
> --- On Sat, 2012/4/28, renayama19661014 at ybb.ne.jp <renayama19661014 at ybb.ne.jp> wrote:
>
>> Hi Lars,
>>
>> Thank you for your comments.
>>
>>> Have you actually been able to measure that memory leak you observed,
>>> and you can confirm this patch will fix it?
>>>
>>> Because I don't think this patch has any effect.
>> Yes.
>> I actually measured the leak.
>> I can show the results next week.
>> #Japan is on holiday until Tuesday.
>>
>>> send_rexmit_request() is only used as parameter to
>>> Gmain_timeout_add_full, and it returns FALSE always,
>>> which should cause the respective sourceid to be auto-removed.
>> It seems the GSource still needs to be released explicitly in some way.
>> lrmd appears to release its sources in a similar way.
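Lars's point holds for the main loop: a callback that returns FALSE gets its source removed. Whether the source's memory is also freed depends on reference counting. GLib documents that g_source_new() gives the creator one reference, g_source_attach() makes the context take another, and destroying a source (which is what returning FALSE does) drops only the context's reference, so the memory is freed only once the count reaches zero. The toy model below is plain C with hypothetical names; it is neither GLib nor Heartbeat code. It shows how a source can leave the loop yet never be freed if its creator keeps a reference. Whether Gmain_timeout_add_full behaves this way is not established in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy model of GLib-style source ownership (hypothetical names). */
struct toy_source {
    int  refcount;
    bool attached;
};

static int toy_freed = 0;   /* number of sources actually freed */

static struct toy_source *toy_new(void)
{
    struct toy_source *s = malloc(sizeof *s);
    if (s == NULL)
        abort();
    s->refcount = 1;        /* creator's reference, like g_source_new() */
    s->attached = false;
    return s;
}

static void toy_unref(struct toy_source *s)
{
    assert(s->refcount > 0);
    if (--s->refcount == 0) {
        free(s);
        toy_freed++;
    }
}

static void toy_attach(struct toy_source *s)
{
    s->refcount++;          /* context's reference, like g_source_attach() */
    s->attached = true;
}

/* If the callback returns false, detach the source and drop only the
 * context's reference (as destroying a GSource does). */
static void toy_dispatch(struct toy_source *s, bool (*cb)(void))
{
    if (!cb()) {
        s->attached = false;
        toy_unref(s);
    }
}

/* A one-shot timer callback, like send_rexmit_request() returning FALSE. */
static bool fire_once(void) { return false; }
```

If the creator calls toy_unref() right after toy_attach() (the pattern g_timeout_add_full() follows internally), the one-shot dispatch frees the source. If the creator keeps its reference and never releases it, the source is detached but stays allocated.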
>>
>> Best Regards,
>> Hideo Yamauchi.
>>
>>
>> --- On Fri, 2012/4/27, Lars Ellenberg <lars.ellenberg at linbit.com> wrote:
>>
>>> On Thu, Apr 26, 2012 at 10:56:30AM +0900, renayama19661014 at ybb.ne.jp wrote:
>>>> Hi All,
>>>>
>>>> We ran tests that simulate a remote cluster environment,
>>>> including tests with packet loss.
>>>>
>>>> Heartbeat's retransmission timer causes a memory leak.
>>>>
>>>> I am contributing a patch.
>>>> Please review its contents,
>>>> and please apply it to the Heartbeat repository.
>>> Have you actually been able to measure that memory leak you observed,
>>> and you can confirm this patch will fix it?
>>>
>>> Because I don't think this patch has any effect.
>>>
>>> send_rexmit_request() is only used as parameter to
>>> Gmain_timeout_add_full, and it returns FALSE always,
>>> which should cause the respective sourceid to be auto-removed.
>>>
>>>
>>>> diff -r 106ca984041b heartbeat/hb_rexmit.c
>>>> --- a/heartbeat/hb_rexmit.c Thu Apr 26 19:28:26 2012 +0900
>>>> +++ b/heartbeat/hb_rexmit.c Thu Apr 26 19:31:44 2012 +0900
>>>> @@ -164,6 +164,8 @@
>>>> seqno_t seq = (seqno_t) ri->seq;
>>>> struct node_info* node = ri->node;
>>>> struct ha_msg* hmsg;
>>>> + unsigned long sourceid;
>>>> + gpointer value;
>>>>
>>>> if (STRNCMP_CONST(node->status, UPSTATUS) != 0 &&
>>>> STRNCMP_CONST(node->status, ACTIVESTATUS) !=0) {
>>>> @@ -196,11 +198,17 @@
>>>>
>>>> node->track.last_rexmit_req = time_longclock();
>>>>
>>>> - if (!g_hash_table_remove(rexmit_hash_table, ri)){
>>>> - cl_log(LOG_ERR, "%s: entry not found in rexmit_hash_table"
>>>> - "for seq/node(%ld %s)",
>>>> - __FUNCTION__, ri->seq, ri->node->nodename);
>>>> - return FALSE;
>>>> + value = g_hash_table_lookup(rexmit_hash_table, ri);
>>>> + if ( value != NULL) {
>>>> + sourceid = (unsigned long) value;
>>>> + Gmain_timeout_remove(sourceid);
>>>> +
>>>> + if (!g_hash_table_remove(rexmit_hash_table, ri)){
>>>> + cl_log(LOG_ERR, "%s: entry not found in rexmit_hash_table"
>>>> + "for seq/node(%ld %s)",
>>>> + __FUNCTION__, ri->seq, ri->node->nodename);
>>>> + return FALSE;
>>>> + }
>>>> }
>>>>
>>>> schedule_rexmit_request(node, seq, max_rexmit_delay);
>>>
>>> --
>>> : Lars Ellenberg
>>> : LINBIT | Your Way to High Availability
>>> : DRBD/HA support and consulting http://www.linbit.com
>>>
>>> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
>>> _______________________________________________________
>>> Linux-HA-Dev: Linux-HA-Dev at lists.linux-ha.org
>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
>>> Home Page: http://linux-ha.org/
>>>
>>
--
Alan Robertson <alanr at unix.sh> - @OSSAlanR
"Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce