[Linux-ha-dev] Re: [RFC] heartbeat-2.1.4
Keisuke MORI
kskmori at intellilink.co.jp
Tue Apr 22 05:17:29 MDT 2008
Hi,
"Andrew Beekhof" <beekhof at gmail.com> writes:
> On Wed, Apr 16, 2008 at 1:31 PM, HIDEO YAMAUCHI
> <renayama19661014 at ybb.ne.jp> wrote:
>> Hi Andrew,
>>
>>
>> > I asked for the right function but the wrong frame number - I should
>> > have asked for frame 2. Sorry :(
>>
>> (gdb) frame 2
>> #2 0x0000000000416c74 in stop_recurring_action_by_rsc (key=0x755f60, value=0x755f40,
>> user_data=0x545a10) at lrm.c:1442
>> 1442 if(op->interval != 0 && safe_str_eq(op->rsc_id, rsc->id)) {
>> (gdb) print *rsc
>> Variable "rsc" is not available.
>> (gdb) print *op
>> No symbol "op" in current context.
>>
>> Did I make a mistake in my operation?
>
> Looks like gcc is being too clever for its own good (by optimizing
> away some of the variables) :-(
>
> Can you try the following patch please?
>
> diff -r be12cb83cd2d crmd/lrm.c
> --- a/crmd/lrm.c Wed Apr 16 10:46:59 2008 +0200
> +++ b/crmd/lrm.c Wed Apr 16 15:02:16 2008 +0200
> @@ -1451,7 +1451,7 @@ stop_recurring_action_by_rsc(gpointer ke
> {
> lrm_rsc_t *rsc = user_data;
> struct recurring_op_s *op = (struct recurring_op_s*)value;
> -
> + crm_info("op->rsc=%s (%p), rsc=%s (%p)", crm_str(op->rsc_id),
> op->rsc_id, crm_str(rsc->id), rsc->id);
> if(op->interval != 0 && safe_str_eq(op->rsc_id, rsc->id)) {
> cancel_op(rsc, key, op->call_id, FALSE);
> }
I think I have found the cause of this issue.
I have attached an additional log taken with your patch (slightly
modified) and the stack trace.
Here are my observations:
- An element of pending_ops is removed at lrm.c:L947.
- That removal happens inside the callback invoked by
g_hash_table_foreach() at L1475.
- This violates the usage rules for g_hash_table_foreach() given in
the glib manual.
- Therefore the iteration cannot proceed correctly and may
try to access an element that has already been removed.
http://hg.linux-ha.org/lha-2.1/annotate/333aef5bd4ed/crm/crmd/lrm.c
(...)
946 /* not doing this will block the node from shutting down */
947 g_hash_table_remove(pending_ops, key);
(...)
1475 g_hash_table_foreach(pending_ops, stop_recurring_action_by_rsc, rsc);
http://library.gnome.org/devel/glib/stable/glib-Hash-Tables.html#g-hash-table-foreach
(...)
The hash table may not be modified while iterating over it (you can't add/remove items).
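To make the idea concrete, here is a minimal sketch of the "collect first, remove afterwards" pattern. It is not the attached patch and does not use glib at all: struct op_table, op_table_foreach(), stop_recurring_actions() and the other names are hypothetical stand-ins for pending_ops and the crmd code, chosen only so the example compiles on its own. With glib itself, g_hash_table_foreach_remove(), whose callback returns TRUE for each entry to drop, might be another way to get the same effect.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_OPS 8

/* Hypothetical stand-in for one entry of pending_ops. */
struct pending_op {
    int in_use;
    int call_id;
    int interval;
    const char *rsc_id;
};

/* Hypothetical stand-in for the hash table itself. */
struct op_table {
    struct pending_op slots[MAX_OPS];
};

typedef void (*op_visitor)(struct pending_op *op, void *user_data);

int op_table_add(struct op_table *t, int call_id, int interval,
                 const char *rsc_id)
{
    for (size_t i = 0; i < MAX_OPS; i++) {
        if (!t->slots[i].in_use) {
            t->slots[i] = (struct pending_op){1, call_id, interval, rsc_id};
            return 0;
        }
    }
    return -1; /* table full */
}

void op_table_remove(struct op_table *t, int call_id)
{
    for (size_t i = 0; i < MAX_OPS; i++) {
        if (t->slots[i].in_use && t->slots[i].call_id == call_id) {
            t->slots[i].in_use = 0;
        }
    }
}

size_t op_table_count(const struct op_table *t)
{
    size_t n = 0;
    for (size_t i = 0; i < MAX_OPS; i++) {
        n += t->slots[i].in_use ? 1 : 0;
    }
    return n;
}

/* Like a foreach over a hash table: the visitor must not modify the table. */
void op_table_foreach(struct op_table *t, op_visitor fn, void *user_data)
{
    for (size_t i = 0; i < MAX_OPS; i++) {
        if (t->slots[i].in_use) {
            fn(&t->slots[i], user_data);
        }
    }
}

/* Scratch list filled during iteration and consumed afterwards. */
struct cancel_list {
    const char *rsc_id;
    int call_ids[MAX_OPS];
    size_t n;
};

/* Phase 1 callback: only records matching ops, never removes them. */
void collect_recurring(struct pending_op *op, void *user_data)
{
    struct cancel_list *c = user_data;
    if (op->interval != 0 && strcmp(op->rsc_id, c->rsc_id) == 0) {
        assert(c->n < MAX_OPS);
        c->call_ids[c->n++] = op->call_id;
    }
}

/* Returns the number of recurring ops removed for rsc_id. */
size_t stop_recurring_actions(struct op_table *t, const char *rsc_id)
{
    struct cancel_list c = { .rsc_id = rsc_id, .n = 0 };

    op_table_foreach(t, collect_recurring, &c); /* phase 1: read only */
    for (size_t i = 0; i < c.n; i++) {          /* phase 2: modify */
        op_table_remove(t, c.call_ids[i]);
    }
    return c.n;
}
```

The point of the split is that the table is never modified while the foreach is running, so the iteration cannot step onto an entry that was just removed.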
I have also attached my suggested patch. I cannot guarantee its
correctness; it is only meant to show the idea.
Thanks,
--
Keisuke MORI
NTT DATA Intellilink Corporation
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ms-additional-log-20080422.tar.gz
Type: application/octet-stream
Size: 14306 bytes
Desc: ms-additional-log-20080422.tar.gz
Url : http://lists.community.tummy.com/pipermail/linux-ha-dev/attachments/20080422/b1956f25/ms-additional-log-20080422.tar-0001.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ms-coredump.patch
Type: text/x-patch
Size: 2465 bytes
Desc: ms-coredump.patch
Url : http://lists.community.tummy.com/pipermail/linux-ha-dev/attachments/20080422/b1956f25/ms-coredump-0001.bin