[Linux-ha-dev] duplicate resource active in 2.1.4-RC
Lars Marowsky-Bree
lmb at suse.de
Wed Aug 13 06:03:35 MDT 2008
On 2008-08-13T17:11:54, Keisuke MORI <kskmori at intellilink.co.jp> wrote:
> Hi,
>
> I've run into unexpected behavior during our regression tests
> for the 2.1.4 release.
>
> When the stop of a resource with on_fail=block failed, it looked
> as if the resource was running on both nodes according to the
> log and crm_mon.
>
> This did not happen in 2.1.3, which worked as expected;
> the problem occurs in the current lha-2.1 (0d61ad37ee9a).
Yeah, this looks like a real bug.
Aug 13 15:59:15 sunnyvale pengine: [18478]: notice: unpack_config: On loss of CCM Quorum: Ignore
Aug 13 15:59:15 sunnyvale pengine: [18478]: info: unpack_nodes: Node cupertino is in standby-mode
Aug 13 15:59:15 sunnyvale pengine: [18478]: info: determine_online_status: Node sunnyvale is online
Aug 13 15:59:15 sunnyvale pengine: [18478]: info: determine_online_status: Node cupertino is standby
Aug 13 15:59:15 sunnyvale pengine: [18478]: info: common_apply_stickiness: Setting failure stickiness for group1-dummy1 on cupertino: -1000000
Aug 13 15:59:15 sunnyvale pengine: [18478]: info: unpack_rsc_op: Remapping group1-dummy1_stop_0 (rc=1) on cupertino to an ERROR (expected 0)
Aug 13 15:59:15 sunnyvale pengine: [18478]: WARN: unpack_rsc_op: Processing failed op group1-dummy1_stop_0 on cupertino: Error
Aug 13 15:59:15 sunnyvale pengine: [18478]: notice: group_print: Resource Group: non_clone_group1
Aug 13 15:59:15 sunnyvale pengine: [18478]: notice: native_print: group1-dummy1 (ocf::heartbeat:Dummy1): Started cupertino (unmanaged) FAILED
Aug 13 15:59:15 sunnyvale pengine: [18478]: notice: native_print: group1-dummy2 (ocf::heartbeat:Dummy2): Stopped
Aug 13 15:59:15 sunnyvale pengine: [18478]: WARN: custom_action: Action group1-dummy1_start_0 (unmanaged)
Aug 13 15:59:15 sunnyvale pengine: [18478]: notice: NoRoleChange: Leave resource group1-dummy1 (Started sunnyvale)
Aug 13 15:59:15 sunnyvale pengine: [18478]: WARN: custom_action: Action group1-dummy1_stop_0 (unmanaged)
Aug 13 15:59:15 sunnyvale pengine: [18478]: WARN: custom_action: Action group1-dummy1_start_0 (unmanaged)
Aug 13 15:59:15 sunnyvale pengine: [18478]: notice: StartRsc: sunnyvale Start group1-dummy1
Aug 13 15:59:15 sunnyvale pengine: [18478]: WARN: custom_action: Action group1-dummy1_start_0 (unmanaged)
Aug 13 15:59:15 sunnyvale pengine: [18478]: notice: RecurringOp: Start recurring monitor (5s) for group1-dummy1 on sunnyvale
Aug 13 15:59:15 sunnyvale pengine: [18478]: notice: StartRsc: sunnyvale Start group1-dummy2
Aug 13 15:59:15 sunnyvale pengine: [18478]: notice: RecurringOp: Start recurring monitor (5s) for group1-dummy2 on sunnyvale
Following the failure to stop, the resource is considered unmanaged and
failed (which is correct).
Aug 13 15:59:15 sunnyvale pengine: [18478]: notice: NoRoleChange: Leave resource group1-dummy1 (Started sunnyvale)
This is the crucial line: the resource has never been started on
sunnyvale, so this is where the bug begins.
Because the PE now believes the resource is started there, it goes on to
schedule a start and recurring monitors on sunnyvale, which is of course
incorrect.
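To make the inconsistency concrete, here is a minimal sketch of a hypothetical log checker (not part of Heartbeat). It scans pengine lines like those quoted above and flags any resource that the PE reports as started on more than one node. The regular expressions are assumptions based only on the log format shown in this report, and the sample lines are trimmed copies of the ones above.

```python
import re
from collections import defaultdict

# Trimmed copies of pengine lines quoted in the report above.
LOG_LINES = [
    "pengine: [18478]: notice: native_print: group1-dummy1 (ocf::heartbeat:Dummy1): Started cupertino (unmanaged) FAILED",
    "pengine: [18478]: notice: native_print: group1-dummy2 (ocf::heartbeat:Dummy2): Stopped",
    "pengine: [18478]: notice: NoRoleChange: Leave resource group1-dummy1 (Started sunnyvale)",
]

# Hypothetical patterns inferred from the log format shown; not an official interface.
NATIVE_PRINT = re.compile(r"native_print: (\S+) \([^)]*\): Started (\S+)")
NO_ROLE_CHANGE = re.compile(r"NoRoleChange: Leave resource (\S+) \(Started (\S+)\)")


def active_locations(lines):
    """Map each resource to the set of nodes the PE reports it as started on."""
    where = defaultdict(set)
    for line in lines:
        for pattern in (NATIVE_PRINT, NO_ROLE_CHANGE):
            m = pattern.search(line)
            if m:
                where[m.group(1)].add(m.group(2))
    return dict(where)


def duplicates(lines):
    """Return only the resources reported as active on more than one node."""
    return {rsc: nodes for rsc, nodes in active_locations(lines).items() if len(nodes) > 1}


if __name__ == "__main__":
    for rsc, nodes in duplicates(LOG_LINES).items():
        print(f"{rsc} reported active on: {', '.join(sorted(nodes))}")
```

Run against these three lines, it reports group1-dummy1 as active on both cupertino and sunnyvale, which is exactly the duplicate that the NoRoleChange line introduces.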
Regards,
Lars
--
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde