[Linux-ha-dev] Topology issues in fencing
Alan Robertson
alanr at unix.sh
Mon Jul 26 20:21:20 MDT 2004
Hi,
After talking to Huang, Zhen and Sun, Jiang Dong, I read this page:
http://wiki.trick.ca/linux-ha/LocalResourceManager/StonithAgents
I had my doubts about this in the first place, but the actual description
suggested there has made it clear that this is a broken idea...
If the parameters and types don't match the existing model, then it
doesn't fit the existing model, and shouldn't be force-fit. So, it's a
mistake to try and fit it into the resource model.
And, since the STONITH API is 'C' based, and the resource manager APIs are
optimized for shell access, it doesn't make any sense from this
perspective either...
You can see this from the fact that the proposal keeps changing the
return types of everything, so that they no longer match the types that
resource agents return.
And, it asks the resource agents to have memory from call to call (which
otherwise they never have to have), which is also quite contradictory to
normal resource agent behavior.
To me, this looks like a case of "if the only tool you have is a hammer,
then everything looks like a nail"...
So, I have asked Huang, Zhen and Sun, Jiang Dong not to implement this
until we can discuss this some more -- because I really think this is an
architectural mistake.
It has also become apparent in discussions at OLS that there is great value
to having an independent fencing layer that we can use, and that the kernel
can also use... And, this architecture won't do that either...
So, I believe we need a separate fencing service which manages the topology
issues rather than making the user deal with these issues...
A few more comments on this, which I expect that Andrew probably already
knows...
Due to underlying hardware architecture issues, not every STONITH device can
be accessed from every machine.
And, because many STONITH devices are smart and maintain their own
configurations, you have to ask them which nodes they can reset.
So, we have a somewhat complex topology issue here...
You have to load *all* the configured STONITH devices on all machines, and
then after they are all loaded on all machines, you ask each of them which
nodes they can reset.
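
To make the load-everything-then-ask step concrete, here is a rough sketch
in Python. It is only an illustration and not the STONITH API: the
StonithDevice class, its query_resettable_nodes() method, and the device
and node names are all made up for this example. Each host tries every
configured device, skips the ones it cannot reach, and records what the
reachable ones say they can reset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StonithDevice:
    """Hypothetical stand-in for a configured STONITH device."""
    name: str
    reachable_from: frozenset  # hosts with a hardware path to the device
    resettable: frozenset      # nodes the device's own config says it can reset

    def query_resettable_nodes(self, host):
        """Ask the device, from `host`, which nodes it can reset.

        Returns None when the device cannot be contacted from that host.
        """
        if host not in self.reachable_from:
            return None
        return self.resettable


def build_fencing_topology(hosts, devices):
    """Map each node to the set of (host, device name) pairs able to reset it."""
    topology = {}
    for host in hosts:
        for dev in devices:  # every host tries every configured device
            nodes = dev.query_resettable_nodes(host)
            if nodes is None:
                continue  # this device is not accessible from this host
            for victim in nodes:
                topology.setdefault(victim, set()).add((host, dev.name))
    return topology


# Made-up example cluster: three nodes, two devices.
hosts = ["node1", "node2", "node3"]
devices = [
    StonithDevice("pdu-a", frozenset({"node1", "node2"}),
                  frozenset({"node1", "node2", "node3"})),
    StonithDevice("serial-b", frozenset({"node3"}), frozenset({"node1"})),
]
topology = build_fencing_topology(hosts, devices)
```

In this made-up cluster, node3 can only be reset through pdu-a, and only
from node1 or node2. A map like this is what every machine would otherwise
have to work out for itself.
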
But, it is my belief that these issues need to be handled by a fencing
layer. Note that the fencing layer can't depend on anything like
membership or group services...
--
Alan Robertson <alanr at unix.sh>
"Openness is the foundation and preservative of friendship... Let me claim
from you at all times your undisguised opinions." - William Wilberforce