[Linux-HA] Is this the right config?
kbyrd-linuxha at memcpy.com
Sun Sep 30 18:37:10 MDT 2007
I'm running Heartbeat 2.1.2 on two nodes. I want heartbeat to manage 22
VMware VMs across the two nodes. In terms of heartbeat resources, each VM is:
- drbd ocf master-slave resource
- Filesystem ocf resource (XFS)
- VM ocf resource (my own ocf script)
I'm looking for advice on how to group these resources, since they all
depend on each other. I'm testing with a config very similar to the
DRBD/HowTov2 example at: http://www.linux-ha.org/DRBD/HowTov2.

I have a drbd master/slave resource (ms-drbd1), and then a group
(group_vm1). The group contains a filesystem resource (vm1-fs) and my VM
(vm1-vm) resource. I have an rsc_order constraint saying group_vm1 should
only run where ms-drbd1 has been promoted. I also have an rsc_colocation
constraint saying group_vm1 follows ms-drbd1. Finally, I have a location
constraint saying ms-drbd1 prefers node1.
When testing this with two VMs (adding ms-drbd2 and group_vm2, preferring
node2), things don't always work out as planned. Sometimes, with only one
node running, if I "/etc/init.d/heartbeat start" on node1, ms-drbd1 and
group_vm1 will try to migrate over to node1, fail, then return to node2.
It's not clear to me what's failing. I feel like sometimes I end up in a
state where the drbd resource starts, but the filesystem doesn't and
therefore the VM resource doesn't either. Maybe I need a delay between
resource starts?

Should I be grouping these differently? I'm going to be creating 22 of
these "group of three" resources, with three constraints for each. Is
there an easier set of XML to configure this? I want half to prefer one
node, and half to prefer the other.

Finally, if both nodes are up and group_vm1 fails to start on a node, will
it retry later? Actually, that's more important to me in the single-node
case, as there is no other place for the failed resource to live.
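One way to avoid hand-writing 22 near-identical stanzas might be to
generate them. Below is a minimal sketch in Python that prints the three
constraints per VM described above. It assumes the element and attribute
names as I understand them from the DRBD/HowTov2 example (rsc_order with
from/action/to/to_action, rsc_colocation with from/to/to_role/score,
rsc_location with a rule and expression); these should be checked against
the DTD shipped with the installed version. The node names "node1" and
"node2", the constraint ids, the score of 100, and the odd/even split are
all illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical uname values; replace with the real node names.
NODES = ["node1", "node2"]


def constraints_for(n, node):
    """Build the order, colocation and location constraints for VM number n."""
    ms, group = f"ms-drbd{n}", f"group_vm{n}"
    # Start the group only after the drbd resource has been promoted.
    order = ET.Element("rsc_order", {
        "id": f"order_vm{n}", "from": group, "action": "start",
        "to": ms, "to_action": "promote"})
    # Keep the group on the node where drbd is master.
    coloc = ET.Element("rsc_colocation", {
        "id": f"coloc_vm{n}", "from": group, "to": ms,
        "to_role": "master", "score": "INFINITY"})
    # Prefer the given node for the master role (score is an assumption).
    loc = ET.Element("rsc_location", {"id": f"loc_vm{n}", "rsc": ms})
    rule = ET.SubElement(loc, "rule", {
        "id": f"loc_vm{n}_rule", "role": "master", "score": "100"})
    ET.SubElement(rule, "expression", {
        "id": f"loc_vm{n}_expr", "attribute": "#uname",
        "operation": "eq", "value": node})
    return [order, coloc, loc]


def build_constraints(count=22):
    """Return a <constraints> element; odd VMs prefer node1, even VMs node2."""
    root = ET.Element("constraints")
    for n in range(1, count + 1):
        root.extend(constraints_for(n, NODES[(n - 1) % len(NODES)]))
    return root


if __name__ == "__main__":
    print(ET.tostring(build_constraints(), encoding="unicode"))
```

The sketch only covers the constraints; the master_slave and group
definitions for each VM could be generated the same way from one template.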