[Linux-ha-dev] Topology issues in fencing

Alan Robertson alanr at unix.sh
Mon Jul 26 20:21:20 MDT 2004


Hi,

After talking to Huang, Zhen and Sun, Jiang Dong, I read this page:
	http://wiki.trick.ca/linux-ha/LocalResourceManager/StonithAgents


I had my doubts about this in the first place, but the actual description 
suggested there has made it clear that this is a broken idea...

If the parameters and types don't match the existing model, then it 
doesn't fit the existing model, and shouldn't be force-fit.  So, it's a 
mistake to try and fit it into the resource model.

And, since the STONITH API is 'C' based, and the resource manager APIs are 
optimized for shell access, it doesn't make any sense from this 
perspective either...

You can see this from the fact that the proposal keeps changing the 
return types of everything, so that the calls return different types than 
resource agents do.

And, it asks the resource agents to keep state from call to call (which 
they otherwise never have to do), which is also quite contradictory to 
normal resource agent behavior.

To me, this looks like a case of "if the only tool you have is a hammer, 
then everything looks like a nail"...

So, I have asked Huang, Zhen and Sun, Jiang Dong not to implement this 
until we can discuss this some more -- because I really think this is an 
architectural mistake.

It has also become apparent in discussions at OLS that there is great value 
to having an independent fencing layer that we can use, and that the kernel 
can also use...  And, this architecture won't do that either...

So, I believe we need a separate fencing service which manages the topology 
issues rather than making the user deal with these issues...

A few more comments on this, which I expect that Andrew probably already 
knows...

Due to underlying hardware architecture issues, not every STONITH device 
can be accessed from every machine.

And, because many STONITH devices are smart and maintain their own 
configurations, you have to ask them which nodes they can reset.

So, we have a somewhat complex topology issue here...

You have to load *all* the configured STONITH devices on all machines, and 
then after they are all loaded on all machines, you ask each of them which 
nodes they can reset.
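
To make the topology problem concrete, here is a rough sketch of the kind 
of map a fencing layer would have to build.  It is only an illustration: 
the device names, host names, and the "reachable_from" / "can_reset" 
fields are all made up for this example and are not part of the STONITH 
API.  In a real system the "can_reset" answer would come from asking the 
loaded device itself.

```python
from collections import defaultdict

# Hypothetical data: which hosts can talk to each STONITH device, and which
# nodes each device says it can reset. None of these names are real.
DEVICES = {
    "apc-1": {"reachable_from": {"node-a", "node-b"}, "can_reset": {"node-c"}},
    "rps-2": {"reachable_from": {"node-c"}, "can_reset": {"node-a", "node-b"}},
}


def build_fencing_map(devices):
    """Return {target: {host: [device names]}}.

    For every device, each host that can reach it is recorded as able to
    reset each node the device reports it controls.
    """
    fencing_map = defaultdict(lambda: defaultdict(list))
    for name, dev in sorted(devices.items()):
        for host in dev["reachable_from"]:
            for target in dev["can_reset"]:
                fencing_map[target][host].append(name)
    return fencing_map


def who_can_reset(fencing_map, target):
    """List the hosts that have some device capable of resetting target."""
    return sorted(fencing_map.get(target, {}))


if __name__ == "__main__":
    topology = build_fencing_map(DEVICES)
    for node in ("node-a", "node-b", "node-c"):
        print(node, "can be reset from:", who_can_reset(topology, node))
```

The point of the sketch is that the answer to "who can reset node X" 
depends on two separate facts (reachability and what the device reports), 
and neither is something the user should have to work out by hand.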

But, it is my belief that these issues need to be handled by a fencing 
layer.  Note that the fencing layer can't depend on anything like 
membership or group services...


-- 
     Alan Robertson <alanr at unix.sh>

"Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions." - William Wilberforce


