[Linux-HA] HA2 OCF CRM: Manage multiple DRBD Resources
lmb at suse.de
Wed Jul 4 11:08:47 MDT 2007
On 2007-07-04T15:04:36, Dominik Klein <dk at in-telegence.net> wrote:
> 1: crm_resource -r ms-r0 -v 'started' -p target_role
> 1: crm_resource -r fs0 -v 'started' -p target_role
Sure you didn't forget a --meta here?
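For reference, a minimal sketch of what the same two calls look like with --meta added. It assumes the installed crm_resource supports --meta, which is worth checking against crm_resource --help first. The helper name set_target_role is hypothetical, and it only prints the command so it can be reviewed before anything touches the cluster:

```shell
# Dry-run sketch: prints the command instead of executing it.
# set_target_role is a hypothetical helper, not part of heartbeat.
set_target_role() {
    # $1 = resource id, $2 = desired target_role value
    echo crm_resource --meta -r "$1" -p target_role -v "$2"
}

set_target_role ms-r0 started
set_target_role fs0 started
```

Once the printed commands look right, they can be pasted into a shell on the node.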
> <crm_mon shows r0 "started" for both nodes -> not good>
> 1: drbdadm state r0
> <OCF script needs to be changed to recognize this (maybe new drbd8)
> state after just the module being loaded>
Probably this is screwing up the initial start up probe we do. It
appears drbd8 doesn't quite work, which doesn't come as a surprise. You
will need to make a few more changes.
> So except for changing and copying the script, I started over from
> reboot up to target_role=started for fs0
> <now crm_mon show r0:0 on acd-xen03 is master>
> <fs0 is mounted on acd-xen03>
> <2 online nodes, *4* resources>
4 resources? Weird. It might be a bit late in the game to ask this, but
which heartbeat version, exactly, are you running?
Your drbd RA seems to still put the master_slave preference into the
configuration section instead of a transient node attribute, which
indicates you're not running our latest code?
> debug: unpack_rsc_order: r0_before_fs0: ms-r0.promote after fs0.start
> debug: unpack_rsc_order: r1_before_fs1: ms-r1.promote after fs1.start
> debug: cib_native_signoff: Signing out of the CIB Service
> <r1:2 looks suspicious - no idea where this comes from>
That, too, reminds me of a bug which has been fixed in the past ...
> The notify action times out (20s).
20s is low, you should increase it.
> Jul 4 14:45:59 ACD-xen03 drbd_master_slave: : DEBUG: DK
> before crm_master -v 75
> Jul 4 14:45:59 ACD-xen03 drbd_master_slave: : DEBUG: r1:
> Calling /usr/sbin/crm_master -v 75
> ########### notice: +20s
But true, this is weird, it shouldn't take so long.
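One way to narrow this down might be to time the call by hand, outside the RA. A rough sketch: the timed helper below is hypothetical and simply wraps any command with wall-clock timestamps at one-second resolution. The commented usage line assumes crm_master needs OCF_RESOURCE_INSTANCE set when it is not invoked by the RA, and r1:0 is only a placeholder instance name:

```shell
# Hypothetical helper: run a command, report elapsed seconds and exit code.
timed() {
    start=$(date +%s)
    "$@"
    rc=$?
    end=$(date +%s)
    echo "elapsed=$((end - start))s rc=$rc"
    return $rc
}

# Harmless demonstration:
timed true

# Assumed usage on a cluster node (not run here):
# OCF_RESOURCE_INSTANCE=r1:0 timed /usr/sbin/crm_master -v 75
```

If the manual call also takes around 20s, the delay is in crm_master or the CIB rather than in the RA script itself.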
> Please note that this behaviour is not dependant on my r0 or r1
> resource. If I start out with r0, r0 works and r1 faults. If I start the
> other way around with r1, then r0 will fault.
> Maybe you can still help me figure this out.
Hm, I don't have a good idea off the top of my head. I'd need to try and
reproduce on my own cluster.
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde