Patch to add stonith support to multiple apcmasterswitches <apcmultiplemastersnmp>

Marc Grimme grimme@atix.de
15 Oct 2002 14:45:23 +0200


Hi Lars,
On Tue, 2002-10-15 at 12:51, Lars Marowsky-Bree wrote:
> Hi Marc, I am not too happy with the approach taken by your patch (even though
> it does add good functionality).
I thought I would raise some discussion about that. I agree with you in
principle, but in my opinion the problem (of two apc powerswitches) was
easier to solve with this approach (the pragmatic one).
>
> I believe it would be the strategically better approach to not enhance a
> particular STONITH module only, but instead enhance the stonith library itself
> to detect if it has more than one STONITH device which can reset a given node
> and then take appropriate action; this would make the feature available to all
> modules. (And also allow different devices to be used, so you don't depend on
> a single device type and thus increase availability)
I agree completely, but in my opinion the effort to extend the module
would have brought up some more issues that were not worth it:
First, I didn't want to change too much existing code (because I wasn't
working on the CVS tree) and didn't want to come along with changes to
parts of the framework without a point-in-time source. Second, I couldn't
find any place to add such functionality, especially when using
heartbeat-stonith as part of sgi-failsafe (because that was my first
approach too). As far as I could figure out, the only interface through
which failsafe communicates with the stonith module is -t and -p. And
-p goes directly to the functions of the stonith module's library,
so we would have to change every existing lib, and we are where I am
now :-(. Perhaps I should have a look at the current CVS tree.
>
> However, at the same time, I'd like to raise the point that statistically, I
> think using two STONITH devices for a given node is a bad choice. Let me
> explain why.
>
> There are two ways how the STONITH devices could be wired; either parallel or
> serial.
>
> If they are wired in parallel, you'd need to reset both of them; this implies
> that you have to reach both, effectively doubling your exposure to networking
> failures. So this one is out.
If you have to choose a configuration, you should, theoretically, always
prefer the serial one.

But in my opinion we cannot do so. As far as I understand it, the
parallel configuration looks as follows:
            |                    |
            |                    |
            |                    |
Network1---NPS1                NPS2---Network2
            |                    |
        XXXXXXXXXXXXXXXXXXXXXXXXXXXX
        X PS1     HOST         PS2 X
        XXXXXXXXXXXXXXXXXXXXXXXXXXXX
p(reaching) = p(reach NPS)^2
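
To put a number on that formula, here is a minimal sketch in Python. It
assumes that the switches fail independently and are equally reachable;
the function name p_reach_all and the sample value 0.99 are only
illustrative assumptions, not measurements.

```python
def p_reach_all(p_single: float, n_switches: int = 2) -> float:
    """Probability that every one of n_switches is reachable.

    Assumption: each switch is reachable with the same probability
    p_single, independently of the others.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if n_switches < 1:
        raise ValueError("n_switches must be at least 1")
    # Independent events: multiply the single probabilities.
    return p_single ** n_switches


if __name__ == "__main__":
    # Hypothetical value: each NPS is reachable 99% of the time.
    print(p_reach_all(0.99))
```

With the assumed 0.99 per switch, the chance of reaching both drops to
roughly 0.98.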

Serial: ??????
How would you configure that? With two phases with two different clock
pulses?

I think it is more a discussion of which drawbacks you are willing to
accept.
All that stuff _ONLY_ holds for the apc nps.
If you choose two independent nps (in parallel), you have a lower
probability of reaching both at the same time (p^2) for stonithing. And
that's it. I personally think there are other issues which should also be
discussed, namely user errors: e.g. someone unplugs one power
circuit by pulling the wrong one in the rack, or someone clicks the
wrong button and resets another host. Or whatever else people do. All I
want to add to the discussion is that we should not forget the user.
Ideally I completely agree with you.

But if you use apc nps, in my opinion there are configurations where it
is better to use two or more. Perhaps not for WTI. But that suggests
it should be left in the apc module.

And last but not least, you cannot use the apc switches in serial, so the
logical error you see does not hold for apcs. Or did I get it wrong? I
must go through the list and reset every single apc. If you have apcs
you'll need a parallel setup. I never said that this is a generic
approach that holds for every powerswitch, only for apcs.
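
The "go through the list" behaviour could be sketched roughly like this
in Python. This is not the code from the patch: reset_all and the
callables standing in for the individual switches are hypothetical, and
how each apc is actually contacted is left out entirely.

```python
from typing import Callable, Sequence


def reset_all(switches: Sequence[Callable[[], bool]]) -> bool:
    """Reset via every switch; succeed only if all of them succeed.

    Each entry is a hypothetical callable that resets the node's outlet
    on one switch and returns True on success.
    """
    # Ask every switch in the list, even if an earlier one failed.
    results = [reset() for reset in switches]
    # An empty list counts as failure: nothing was reset.
    return bool(results) and all(results)


if __name__ == "__main__":
    # Stand-ins for two switches: the second one cannot be reached.
    print(reset_all([lambda: True, lambda: False]))
```

Treating a single unreachable switch as a failed reset is the cautious
choice in this sketch; it is the same reason the p^2 term matters.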

P.S. At least I learned why the wti nps is a lot better than apcs.
Regards marc.
-- 

Gruss/Regards,

Marc Grimme

**
ATIX - Gesellschaft für Informationstechnologie und Consulting mbH
Einsteinstrasse 10
D-85716 Unterschleissheim
Tel.: +49-89 31 78 7 42 4
http://www.atix.de/            http://www.san-time.org/