[Linux-HA] Some issues with heartbeat
dejanmm at fastmail.fm
Fri Dec 14 06:27:48 MST 2007
On Fri, Dec 14, 2007 at 01:29:51PM +0100, Miguel Araujo wrote:
> Hello HA list!
> I have spent 2 or 3 days on the IRC channel asking questions and
> probably abusing your patience a little bit ;) which wasn't my intention
> at all. I got into the HA world about 6 days ago and I have been gobbling
> up documentation. First, I would like to say that I find the
> documentation on the HA site scattered and hard to follow. However, the
> wiki is easier to read and to search.
> That said, I would like to understand heartbeat in depth and, if
> possible, contribute to the documentation or to patching the software
> in the future. I came across HA because I'm working
> on a virtualization project. Basically what I have to do is set up 4
> machines with XEN3.1 using exported block devices from a SAN using
There's a brand new iSCSI ocf RA.
> That's already done, but now I was asked to make the service highly
> available. This is how I found heartbeat and started reading about this
> well-tested software, backed by your many years of experience in the
> field, that so many people rely on.
> What I want to do is to monitor domUs so that if they fail I can move
> them to other nodes of my 4 node cluster. On IRC they have already
> explained to me some of the issues I didn't understand before, such as
> that you cannot monitor the whole dom0; you monitor resources (domUs).
> After that I would like to build a stacked cluster that monitors the
> services the domUs run, but this may take ages.
Recently the Xen ocf RA saw some improvements. In particular,
there's a way to hook scripts to monitor resources within the
DomU which may allow you to keep the heartbeat running in the
Dom0. See recent discussion on Xen and the bugzilla entry.
Perhaps you could aid in testing it.
> I have a 2 node cluster with a SAN for testing and having fun, so don't
> worry about warranties. Dominik Klein kindly passed me his cib file for
> XEN, and I understand almost all of it. The problem now is that I
> would like to add fencing to the cluster. Here come the questions:
> 1.- Are fencing and STONITH different technologies that let you avoid
> split-brain? Or are they just two different concepts, both achieved the
> STONITH way?
Fencing is a term, STONITH a technology. STONITH makes fencing
> 2.- How does STONITH know how to differentiate between a communication
> network failure and a crash on the node?
It can't. BTW, STONITH is just a way to reset the node. Other
components (pengine in particular) decide when to reset the node.
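To illustrate why the two cases look the same from the surviving node, here is a toy model in Python. It is only a sketch of the reasoning, not how heartbeat or the pengine is implemented; the names Peer, heartbeat_seen and should_fence are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    """Hypothetical view of a remote node (illustration only)."""
    alive: bool    # is the node actually still running?
    link_up: bool  # can its heartbeats reach us?


def heartbeat_seen(peer: Peer) -> bool:
    # A heartbeat arrives only if the peer is alive AND the link works.
    # The observer sees just this single boolean, never the two causes.
    return peer.alive and peer.link_up


def should_fence(peer: Peer) -> bool:
    # Assumed policy for the sketch: a silent peer must be fenced,
    # because it may still be alive and writing to shared storage.
    return not heartbeat_seen(peer)


crashed = Peer(alive=False, link_up=True)
isolated = Peer(alive=True, link_up=False)
healthy = Peer(alive=True, link_up=True)

for name, peer in [("crashed", crashed), ("isolated", isolated), ("healthy", healthy)]:
    print(name, "heartbeat:", heartbeat_seen(peer), "fence:", should_fence(peer))
```

A crashed peer and an isolated peer produce the same observation (no heartbeat), so the decision has to be the same for both.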
> I mean if the node's network
> fails, how can the STONITH device kill it?
There are STONITH devices (mainly UPS) which are controlled over
a serial line. Another class is the lights-out style devices, such as
ilo (HP) or rsa (IBM).
> 3.- As my SAN is not like a ServeRAID (it's not a self-fencing
> resource, so to speak), I would like to run a different fencing script
> for every node of my cluster, as every node initially has different
> block devices mounted (one for every virtual machine). I know how to
> block nodes from accessing the SAN using its CLI. Can it be done? Could
> you pass me some example cib files?
CIB files won't help here. Managing access is not so easy to
implement. There's some code around for that, I believe. Don't
know about its state though. Look for "shared disk access" or
similar in the dev list archives.
> 4.- Would you mind listing some STONITH devices available in the market?
Take a look at the output of stonith -L. Those are supported.
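If you want that list in a script, a minimal Python sketch could look like the one below. It assumes the stonith command is in PATH and that -L prints one device type per line. The helper names parse_stonith_types and list_stonith_types are invented for this example and are not part of heartbeat.

```python
import subprocess


def parse_stonith_types(output: str) -> list[str]:
    """Return the distinct non-empty lines of the listing, stripped and sorted.

    Assumption: `stonith -L` prints one device type per line.
    """
    return sorted({line.strip() for line in output.splitlines() if line.strip()})


def list_stonith_types() -> list[str]:
    """Run `stonith -L` and return the device types it reports."""
    result = subprocess.run(
        ["stonith", "-L"], capture_output=True, text=True, check=True
    )
    return parse_stonith_types(result.stdout)


if __name__ == "__main__":
    try:
        for name in list_stonith_types():
            print(name)
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        # stonith is missing or exited with an error on this machine
        print(f"could not run stonith -L: {exc}")
```

The parsing is kept separate from the command call so it can be checked on a machine without heartbeat installed.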
> Finally, I want to thank you all for your time and effort. It's very
> likely you will receive more mails asking more of these questions.
> Thanks in advance.
> Miguel Araujo