[Linux-HA] Some issues with heartbeat
maraujo at Nosys.net
Fri Dec 14 05:54:44 MST 2007
Hello HA list!
I have spent 2 or 3 days on the IRC channel asking questions and
probably abusing your patience a little bit ;) which wasn't my intention
at all. I got into the HA world about 6 days ago and I have been gobbling
up documentation. First I would like to say that I find the
documentation on the HA site scattered and hard to follow. The wiki,
however, is easier to read and makes it easier to find what you need.
That said, I would like to understand heartbeat in depth and, in the
future, if possible, contribute to the documentation or patch the
software. I came across HA because I'm working on a virtualization
project. Basically what I have to do is set up 4 machines with XEN 3.1
using block devices exported from a SAN over iSCSI. That's already done,
but now I have been asked to make the service highly available. This is
how I found heartbeat and started reading about this tested software,
with your many years of experience in the field, that so many people
rely on.
What I want to do is monitor the domUs so that if they fail I can move
them to other nodes of my 4-node cluster. On IRC, people have already
explained to me some of the issues I didn't understand before, for
example that you cannot monitor the whole dom0; you monitor resources
(domUs). After that I would like to build a stacked cluster that
monitors the services the domUs run, but this may take ages.
I have a 2-node cluster with a SAN for testing and having fun, so don't
worry about warranties. Dominik Klein kindly passed me his cib file for
XEN, and I understand almost all of it. The problem now is that I
would like to add fencing to the cluster. Here come the questions:
1.- Are fencing and STONITH different technologies that let you avoid
split-brain? Or are they just two different concepts achieved by the
2.- How does STONITH differentiate between a communication network
failure and a crash of the node? I mean, if the node's network fails,
how can the STONITH device kill it?
3.- As my SAN is not like a ServeRAID (it does not self-fence resources,
so to speak), I would like to run a different fencing script for every
node of my cluster, since every node initially has different block
devices mounted (one for every virtual machine). I know how to block
nodes from accessing the SAN using its CLI. Can it be done? Could you
pass me some example cib files?
4.- Would you mind listing some STONITH devices available on the market?
Finally, I want to thank you all for your time and effort. It's very
likely you will receive more mails from me asking more of these
questions, thanks