aon.912411198 at aon.at
Sun Feb 11 13:16:02 MST 2001
On Sun, Feb 11, 2001 at 10:22:36PM +0530, Rajkumar S wrote:
> On Sun, 11 Feb 2001, Juri Haberland wrote:
> > we fail over the failed group but keep the other group.
> > This is something that heartbeat possibly doesn't have the infrastructure
> > for yet
> Yes, so ultimately we should have a set of resources for the cluster, with
> all the node members having same idea of what is available and what is
Well, not all members have to know the state of all resources, because if
your cluster is big enough (>10 nodes), most of the network bandwidth
will be lost to monitoring.
For further reading about this, I would suggest the papers by Stephen
Tweedie, which can be obtained at the Linux-HA website.
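To put rough numbers on the bandwidth concern, here is a minimal sketch that counts status messages per monitoring interval. It assumes point-to-point (unicast) reports, one message per reporting pair per interval, and a manager that runs on one of the members; packet size and broadcast/multicast are ignored, and the function names are purely illustrative.

```python
def full_mesh_messages(nodes: int) -> int:
    """Status messages per interval if every node tells every other node its resource state."""
    return nodes * (nodes - 1)


def manager_messages(nodes: int) -> int:
    """Status messages per interval if each node reports only to a manager running on one member."""
    return nodes - 1


for n in (2, 5, 10, 20):
    print(n, full_mesh_messages(n), manager_messages(n))
# 2 2 1
# 5 20 4
# 10 90 9
# 20 380 19
```

Under these assumptions, full-mesh state sharing grows quadratically with the number of nodes, while reporting only to the cluster manager grows linearly.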
> This information can be sent over the heartbeat so that the transmission
> is reliable.
Yes and no. I feel a bit ambivalent about this. The problem is that the
information that something has failed needs to be transmitted, but its
receiver is the so-called cluster manager, which decides where the
resource will go. One thing that can be done locally before contacting
the cluster manager is to try to restart the resource (but this detail
is configuration-dependent).
The cluster manager is still the only component that can decide what
happens next, e.g. whether the resource is started on another cluster member.
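The division of labour described above (restart locally first, then let the cluster manager decide placement) might be sketched like this. Everything here is hypothetical: `ClusterManager`, `handle_failure`, the `max_local_restarts` setting, and the "first other member" placement policy are illustrative assumptions, not part of heartbeat.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ClusterManager:
    """Hypothetical cluster manager: the only component that decides placement."""
    nodes: List[str]
    placements: Dict[str, str] = field(default_factory=dict)

    def report_failure(self, resource: str, failed_node: str) -> str:
        # Placeholder policy: choose the first member other than the failed one.
        for node in self.nodes:
            if node != failed_node:
                self.placements[resource] = node
                return node
        raise RuntimeError(f"no surviving node for {resource}")


def handle_failure(resource: str, node: str,
                   restart: Callable[[str], bool],
                   manager: ClusterManager,
                   max_local_restarts: int = 1) -> str:
    """Try local restarts first, then escalate; returns the node the resource ends up on."""
    for _ in range(max_local_restarts):
        if restart(resource):
            return node  # recovered locally; the manager is never contacted
    return manager.report_failure(resource, node)


mgr = ClusterManager(nodes=["node1", "node2", "node3"])
print(handle_failure("httpd", "node1", lambda r: False, mgr))  # node2
print(handle_failure("ftpd", "node1", lambda r: True, mgr))    # node1
print(mgr.placements)                                          # {'httpd': 'node2'}
```

The local restart count is left as a parameter because, as noted above, whether and how often to restart locally is configuration-dependent.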
I have to admit that I haven't thought very thoroughly about this topic
yet, but one thing I keep in mind all the time is that you should never
pollute the network with too many packets that do not need to be
transmitted. So I want to keep the sending of "health" information
to a minimum.
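One way to keep "health" traffic to a minimum is to transmit a resource's state only when it changes, rather than repeating it every interval. The sketch below assumes that approach; `ChangeOnlyReporter` and its `send` callback are hypothetical names, and node liveness is assumed to be covered separately by the heartbeat itself.

```python
from typing import Callable, Dict, List, Tuple


class ChangeOnlyReporter:
    """Hypothetical health reporter that sends a message only when a resource's state changes."""

    def __init__(self, send: Callable[[str, str], None]):
        self._send = send
        self._last: Dict[str, str] = {}

    def observe(self, resource: str, state: str) -> bool:
        """Record a local observation; returns True if a message was sent."""
        if self._last.get(resource) == state:
            return False  # unchanged: nothing goes on the wire
        self._last[resource] = state
        self._send(resource, state)
        return True


sent: List[Tuple[str, str]] = []
reporter = ChangeOnlyReporter(lambda res, st: sent.append((res, st)))
for st in ["ok", "ok", "ok", "failed", "failed", "ok"]:
    reporter.observe("httpd", st)
print(sent)  # [('httpd', 'ok'), ('httpd', 'failed'), ('httpd', 'ok')]
```

Six observations produce three messages here. The trade-off is that a lost change message would go unnoticed, so this only makes sense over a reliable transport or with an occasional full resync.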