[Linux-HA] MMM conflict with Pacemaker
marcus at synchromedia.co.uk
Thu Feb 16 11:30:20 MST 2012
On 16 Feb 2012, at 18:00, Mark Grennan wrote:
> Yes HA systems are very confusing.
It's not so much that - it's more that heartbeat/crm/pacemaker/corosync is confusing, not least because it keeps changing its name. Constant changing of names, nomenclature and config settings guarantees that any articles written about it won't work for long.
> Pacemaker is the name of an older application. Corosync is its new name, but some of the files still maintain the old name.
Huh? So why does corosync need setting up to work with pacemaker if it is now pacemaker? Even your doc installs them (and heartbeat) from separate packages!
> One issue I can think of is that Pacemaker wants to bind the floating IP as eth#:#, while MMM wants to use a different method that can only be seen with the ip command. I think they are fighting over who owns the floating IP.
But pacemaker isn't even running on the machines the mmm float is on! It's somehow interfering with the monitoring node, not the float that it's managing. I don't have a problem with using the ip command - I was under the impression it's how things are supposed to be done now? I've seen mixtures of ifconfig-style network config coexisting quite happily with ip-style ones before.
My original config:
server3: mmm monitor
server4: mmm agent
server5: mmm agent
There is a floating IP on servers 1 and 2, and another one on servers 4 and 5.
What I want to change to:
server3: pacemaker + mmm monitor
server4: mmm agent
server5: mmm agent
Here there is a floating IP on 2 and 3, and another on 4 and 5. I don't see any reason they should conflict since there is no overlap of machines that floats are on. What seems to happen is that as soon as corosync is started, the mmm monitor can no longer see the network at all. I suspect this could be something to do with the suggested setting of using the network address for bindnetaddr in corosync.
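For reference, the "network address" that guides suggest for bindnetaddr is the host's IP with the netmask applied. A minimal sketch of that calculation follows; the function name and the sample addresses are illustrative assumptions, not values from either config. On a machine with several interfaces it can help to check which subnet the value actually selects.

```python
import ipaddress

def bindnetaddr_for(host_ip: str, netmask: str) -> str:
    """Return the network address of the subnet that host_ip/netmask sits in.

    Hypothetical helper: this is the value many guides suggest putting in
    bindnetaddr, rather than the host's own address.
    """
    iface = ipaddress.ip_interface(f"{host_ip}/{netmask}")
    return str(iface.network.network_address)

# Sample addresses are made up for illustration.
print(bindnetaddr_for("192.168.10.23", "255.255.255.0"))    # 192.168.10.0
print(bindnetaddr_for("10.0.5.77", "255.255.255.240"))      # 10.0.5.64
```

If the result matches the subnet of the interface that the mmm monitor also relies on, that would at least be consistent with the suspicion above, though it does not prove it.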
I'm still mystified by whether I should use ucast, mcast or bcast - previous setups I've done with crm have used ucast. I see in your example you're binding to a private IP for corosync, but I can't understand why you're using a public IP for mcast, or why it's even there at all.
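One possible source of the confusion: a multicast setting normally takes a group address from 224.0.0.0/4, which can look like a public IP but is not a host address at all. A small sketch for telling the two apart, assuming an address such as 226.94.1.1 (a value commonly seen in sample corosync configs, used here only as an illustration; the address in the guide may differ):

```python
import ipaddress

def classify(addr: str) -> str:
    """Roughly classify an IPv4 address for cluster-config purposes."""
    ip = ipaddress.ip_address(addr)
    if ip.is_multicast:          # anything in 224.0.0.0/4
        return "multicast group"
    if ip.is_private:            # e.g. RFC 1918 ranges
        return "private unicast"
    return "other unicast"

# Illustrative addresses only, not taken from any real config.
print(classify("226.94.1.1"))    # multicast group
print(classify("192.168.1.1"))   # private unicast
```

With ucast no group address would be needed at all, which may explain why earlier crm setups never had one.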
Your guide wasn't one of the ones I'd found, so thanks for the pointer. The most interesting one for me was this one, since it is closest to my own config and seems quite recent (i.e. it even mentions corosync): https://wiki.ubuntu.com/ClusterStack/LucidTesting
The official 'Clusters from Scratch' PDF skips over quite a few bits of vital info, so I found I couldn't really use it.
My mmm config was originally installed by Percona, and I've done several others since. mmm has always worked beautifully for me (even through multiple hardware and network failures), and the main complaint I've seen about it (1062 errors) is nothing to do with mmm. I fully understand that it has problems; however, it has the advantage of being very stable and trivially easy to understand and configure. While I keep reading good things about pacemaker, the practical aspects of getting it to work have always turned into a yak-shaving festival, so I've always been put off pursuing it for anything beyond management of a single IP. One critical aspect of an HA system is that it should be really easy to deal with when things go wrong; I'd put xtrabackup in this category - it's great (though I hope you have automated tests for your restores, as it went through a rough patch late last year when restores were broken!).
Synchromedia Limited: Creators of http://www.smartmessages.net/
UK info at hand CRM solutions
marcus at synchromedia.co.uk | http://www.synchromedia.co.uk/