[Linux-HA] Recovering from "unexpected bad things" - is STONITH the answer?
matilda at grandel.de
Wed Nov 7 03:42:36 MST 2007
>>> Andrew Beekhof <beekhof at gmail.com> 07.11.2007 09:59 >>>
>I'd argue that it is exactly these situations where ssh is _better_
Probably several aspects are getting mixed up in this thread. If it comes
down to that question, I can't give a qualified comment. But it seems plausible
that a node running amok could be so crazy that it can't even kill itself. :-))
>> - Emphasize the necessity of an external properly configured stonith
> wiki.linux-ha.org :-)
Why did I know you would answer this way? :-)) (I laughed out
loud reading your answer.)
>> RA will NEVER happen if stonith is not configured.
>I'm pretty sure you get this by setting on_fail=block
>And actually if a stop fails and stonith is not enabled you end up
Another piece of valuable information. Spiced up with a use case,
it becomes really interesting. ;-)
>But thats a different scenario to fencing in response to a node-level
Yes, I mixed it up a bit.
> Implementing suicide guarantees that there will be downtime.
Yes, that's true.
I just wanted to emphasize that it should never, ever come to
this situation. It should be treated as an extreme case.
So, to take up the metaphor from above: someone has to decide
up to which level of craziness a crazy node can heal itself,
and beyond which level you simply have to get rid of that node.
Is it feasible to find that point on the scale of node craziness?
Back to plain language: can anyone really take every possible
cause of node failure into account?