[Linux-HA] migration/fence after fail-count > X
sebastia at l00-bugdead-prods.de
Tue Nov 13 07:38:14 MST 2007
Andrew Beekhof <beekhof at gmail.com> wrote:
> On Nov 13, 2007, at 1:02 PM, Sebastian Reitenbach wrote:
> > Hi,
> > I read in the v2 FAQ the following:
> > What happens when monitor detects the resource down?
> > The node will try to restart the resource, but if this fails, it
> > will fail over to another node.
> > A feature that allows failover after N failures in a given period of
> > time is planned.
> > Is that feature still planned?
> that's how it works already - sort of.
> there is a layer of indirection with resource-failcount-stickiness,
> but basically once failcount hits a threshold - the resource moves.
> knowing what to set resource-failcount-stickiness to can be tricky.
> one of the easiest, i can turn my brain off, ways is:
> 1) to start the cluster and make sure everything is running
> 2) figure out the current score (see conversations regarding the
> getscores.sh script that has been posted here)
Ah, I need to look for that.
> 3) divide said score by X and add 1
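
The arithmetic in step 3 can be sketched as below. This is only an
illustration, not part of Heartbeat: the function names are hypothetical,
the score is assumed to be a positive integer taken from step 2, and it is
assumed that each failure subtracts the computed value from that score
(i.e. the stickiness is applied as a negative number).

```python
def failure_stickiness_magnitude(current_score: int, max_failures: int) -> int:
    """Step 3 from the mail: divide the current score by X and add 1.

    current_score -- score of the resource on its node (from step 2)
    max_failures  -- X, the number of failures that should trigger a move
    """
    if max_failures <= 0:
        raise ValueError("max_failures must be a positive integer")
    # Integer division, then add 1 so that X failures outweigh the score.
    return current_score // max_failures + 1


def score_after_failures(current_score: int, magnitude: int, failcount: int) -> int:
    """Assumed model: every failure subtracts `magnitude` from the score."""
    return current_score - failcount * magnitude


if __name__ == "__main__":
    score, x = 100, 3  # example values, not taken from a real cluster
    magnitude = failure_stickiness_magnitude(score, x)
    for failures in range(x + 1):
        print(failures, score_after_failures(score, magnitude, failures))
```

With a score of 100 and X = 3 the value is 34: two failures leave a
remaining score of 32, and the third failure pushes it below zero. Whether
the resource actually moves at that point also depends on the scores of
the other nodes, which this sketch does not model.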
> > Instead of failing over, could the node also be fenced when
> > failcount > X?
> no, at least not yet anyway
> interesting idea though
I think that would be a viable option for resources that could get damaged
or cause confusion when started multiple times in a cluster, e.g. Xen
domUs, non-cluster-aware filesystems, IP addresses...
> > Or is that working already, and the FAQ is not updated?
> > At least when I see this:
> > http://www.linux-ha.org/v2/faq/forced_failover
> > It seems to work already, but only in combination with moving a
> > resource to another location, not for fencing a node after a
> > critical fail-count is reached.
> > I've seen the fail_count utility, and tried to find examples on the
> > webpage, but that search was not too exhaustive.
> > Also, can the fail-count of different resources be summed up to make a
> > decision in combination with fencing? E.g. Resources A, B, C...
> > If the failcount of A=3 plus B=4 gives SUM=7 > 6, then fence the
> > node where that limit is reached.
> as above. not at the moment
Thanks for the input. I'll open enhancement requests in the bugzilla
later today for the two options that are not yet possible.