[Linux-ha-dev] [RFC] STONITH in the new framework
lmb at suse.de
Fri Jan 16 02:18:20 MST 2004
During our discussions in Nuremberg, we also touched on the topic of
STONITH in the new model. I'd like to share this with you; all flaws in
the model are mine, the good ideas are from lge and Andrew ;-)
The first question is who initiates the STONITH request; however, this
is also the simplest one to answer - of course, all STONITH requests are
initiated by the CRM on the Designated Coordinator node.
The second question is which node actually performs the STONITH
operation. And this one is already no longer so simple to answer.
The third question is one of monitoring the STONITH controller; the
STONITH API supports 'pinging' the controller and checking it for
health. This is a highly desirable feature - because you only need the
STONITH device in case of node failures, an unplugged STONITH cable
could otherwise go unnoticed for long periods and prevent clean failover.
Also related to monitoring is requesting the list of power outlets from
the STONITH controller, to find out which nodes a given STONITH
controller is actually able to reset.
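
To make those two operations a bit more concrete, here is a minimal
sketch (Python, purely illustrative; the class and method names are my
own invention, not the actual STONITH plugin API) of a controller
wrapper exposing a health check and the list of resettable nodes:

# Illustrative sketch only -- the names below are invented for this
# mail, not the real STONITH plugin API.

class PowerSwitch:
    """Hypothetical wrapper around one network power switch."""

    def __init__(self, address, outlets):
        self.address = address
        self.outlets = dict(outlets)   # outlet number -> node name
        self.reachable = True          # a real driver would probe this

    def status(self):
        """'Ping' the controller; periodic monitoring uses this so an
        unplugged STONITH cable is noticed before we actually need it."""
        return self.reachable

    def hostlist(self):
        """Report which nodes this switch is actually able to reset."""
        return sorted(self.outlets.values())

if __name__ == "__main__":
    sw = PowerSwitch("10.0.0.5", {1: "node-a", 2: "node-b"})
    print("switch healthy:", sw.status())   # True
    print("can reset:", sw.hostlist())      # ['node-a', 'node-b']
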
Those two questions are simple to answer in the two-node case; of course
the local node would do that, because the other node has gone down ;-)
(Note that so far, the monitoring operation wasn't actively used by
heartbeat.)
The second and third questions require some careful handling in N-node
clusters because of the nature of some (if not actually most) network
power switches; most of them can only handle one network session at a
time. Any other attempt to connect would either return an error, or
even step on the currently active session and mess up both the current
request and the new one.
(Anecdote: FailSafe actually does monitor the reset line, and fell
exactly into this trap. Both nodes tried to monitor the power switch,
and when the timing was bad, both got an error.)
There is a good answer to this. Only one node controls a given power
switch; all requests to that power switch are funneled through it, and
it is responsible for the monitoring. If that node fails, or if it can
no longer talk to the power switch for whatever reason, the controlling
task is re-allocated to another node.
(The current heartbeat stonith_host configuration directive actually
suggests that Alan was thinking along these lines, as it allows us to
find out which node is able to talk to which power switch.)
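
A rough sketch of that re-allocation logic (Python, all names invented;
the stonith_host-style information is just represented as a table here):

# Sketch only; data layout and function names are made up for this mail.

# Which nodes are able to talk to which power switch (roughly the
# information the stonith_host directive gives us).
CAN_TALK_TO = {
    "apc-1": ["node-a", "node-b", "node-c"],
}

def choose_controller(switch, current, alive_nodes, switch_reachable):
    """Keep the current controlling node while it is alive and can still
    reach the switch; otherwise hand the task to another candidate."""
    if current in alive_nodes and switch_reachable(current, switch):
        return current
    for node in CAN_TALK_TO[switch]:
        if node in alive_nodes and switch_reachable(node, switch):
            return node
    return None   # nobody can reach the switch any more: report upwards

if __name__ == "__main__":
    alive = {"node-b", "node-c"}
    reachable = lambda node, switch: node != "node-c"   # node-c's cable is unplugged
    print(choose_controller("apc-1", "node-a", alive, reachable))   # node-b
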
So, can you already see where this is going? ;-) We allocate a node to
control the power switch; this node monitors the power switch and
reports failures until we stop it again. Then no two nodes can step on
each other's toes and all is well.
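
The funneling itself could be as dumb as a per-switch work queue on the
controlling node (again just an illustration, not real code):

# Sketch of the funnel: every request for a given power switch goes
# through its single controlling node, one at a time, so there is never
# more than one session to the switch.  Names are invented.

import queue
import threading

def switch_controller(requests, perform):
    """Runs only on the node controlling this power switch."""
    while True:
        op = requests.get()
        if op is None:                  # shut down, e.g. resource 'stop'
            break
        perform(op)                     # exactly one session at a time

if __name__ == "__main__":
    requests = queue.Queue()
    worker = threading.Thread(target=switch_controller,
                              args=(requests, lambda op: print("performing", op)))
    worker.start()
    requests.put(("monitor",))
    requests.put(("reset", "node-b"))   # e.g. forwarded from the DC
    requests.put(None)
    worker.join()
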
Right. This looks a lot like a normal resource, and is actually a
super-set of 'normal' LRM operations - there are just two additional
operations, i.e. "tell me the list of nodes that power switch can reset"
(which could be reported back with the 'start' result) and "perform a
STONITH operation for nodes A, B, C".
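
To illustrate what such a "STONITH resource" could look like (Python
again; the operation names are mine, not a definition of the LRM
interface):

# Sketch of the proposed STONITH resource as a super-set of normal LRM
# operations; the interface is illustrative, not a spec.

class StonithResource:
    def __init__(self, switch):
        self.switch = switch            # some power switch driver

    # --- the usual resource operations ---
    def start(self):
        # The list of resettable nodes could be reported back together
        # with the start result.
        return {"rc": 0, "hosts": self.switch.hostlist()}

    def stop(self):
        return {"rc": 0}

    def monitor(self):
        # Periodic health check, so a dead switch is noticed in time.
        return {"rc": 0 if self.switch.status() else 1}

    # --- the two STONITH-specific additions ---
    def hosts(self):
        # "tell me the list of nodes that power switch can reset"
        return self.switch.hostlist()

    def reset(self, nodes):
        # "perform a STONITH operation for nodes A, B, C"
        return {node: self.switch.reset(node) for node in nodes}

if __name__ == "__main__":
    class FakeSwitch:                    # stand-in for a real driver
        def status(self): return True
        def hostlist(self): return ["node-a", "node-b"]
        def reset(self, node): return "ok"

    rsc = StonithResource(FakeSwitch())
    print(rsc.start())                   # {'rc': 0, 'hosts': [...]}
    print(rsc.reset(["node-b"]))         # {'node-b': 'ok'}
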
So this is actually my suggestion: fold the STONITH controls into the
LRM to avoid code duplication and get the most mileage out of it.
Even in the CRM design, this yields some simplifications; we mostly get
to treat the STONITH devices/operations as regular resources in our
dependency model. (Of course we must export this in a non-confusing way
to the user, but I mean internally.)
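
In an invented notation (this is not the CRM configuration format, just
to show the idea), internally it might then simply look like:

# Invented notation only; just to show that the STONITH device sits in
# the same resource/dependency model as everything else.

cluster = {
    "resources": {
        "ipaddr-1":  {"type": "normal",  "agent": "IPaddr"},
        "stonith-1": {"type": "stonith", "agent": "some-switch-driver",
                      "params": {"address": "10.0.0.5"}},
    },
    "constraints": [
        # place stonith-1 only on nodes that can talk to the switch
        {"rsc": "stonith-1", "rule": "node can reach 10.0.0.5"},
    ],
}

if __name__ == "__main__":
    print(cluster["resources"]["stonith-1"])
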
Please tell me what you think.
Lars Marowsky-Brée <lmb at suse.de>
High Availability & Clustering \ ever tried. ever failed. no matter.
SUSE Labs | try again. fail again. fail better.
Research & Development, SUSE LINUX AG \ -- Samuel Beckett