[Linux-HA] Some issues with heartbeat
Miguel Araujo
maraujo at Nosys.net
Fri Dec 14 05:54:44 MST 2007
Hello HA list!
I have spent 2 or 3 days on the IRC channel asking some questions and
probably abusing your patience a little ;), which was not my intention
at all. I came to the HA world about 6 days ago and have been gobbling
up documentation. First, I would like to say that I find the
documentation on the HA site scattered and hard to follow. The wiki,
however, is easier to read, and it is easier to find what you are looking for there.
That said, I would like to understand heartbeat in depth and, if
possible, contribute to the documentation or patch the software in the
future. I came across HA because I'm working on a virtualization
project. Basically, what I have to do is set up 4 machines with Xen 3.1
using block devices exported from a SAN over iSCSI. That's already
done, but now I have been asked to make the service highly available.
This is how I found heartbeat and started reading about this
well-tested software, backed by your many years of experience in the
field, that so many people rely on.
What I want to do is monitor domUs so that if they fail I can move
them to other nodes of my 4-node cluster. On IRC, people have already
explained some things I didn't understand before, for example that you
cannot monitor the whole dom0; you monitor resources (domUs). After
that, I would like to build a stacked cluster that monitors the services
the domUs run, but this may take ages.
I have a 2-node cluster with a SAN for testing and having fun, so don't
worry about warranties. Dominik Klein kindly passed me his CIB file for
Xen, and I understand almost all of it. The problem now is that I
would like to add fencing to the cluster. Here come the questions:
1.- Are fencing and STONITH different technologies that let you avoid
split-brain? Or are they two different concepts, where fencing is
achieved by means of STONITH?
2.- How does STONITH differentiate between a communication network
failure and a node crash? I mean, if the node's network fails, how can
the STONITH device kill it?
3.- As my SAN is not like a ServeRAID (it does not do resource
self-fencing, so to speak), I would like to run a different fencing
script for each node of my cluster, since each node initially has
different block devices mounted (one for each virtual machine). I know
how to block nodes from accessing the SAN using its CLI. Can this be
done? Could you pass me some example CIB files?
4.- Would you mind listing some STONITH devices available on the market?
Finally, I want to thank you all for your time and effort. It's very
likely you will receive more mails from me asking more of these
questions; thanks in advance.
Regards,
Miguel Araujo