[Linux-ha-dev] [rfc] SBD with Pacemaker/Quorum integration
lmb at suse.com
Fri May 25 09:41:02 MDT 2012
On 2012-05-25T17:31:52, Florian Haas <florian at hastexo.com> wrote:
> > That aside, what do you think of the idea/approach?
> Um, right now I have no opinion. Your commit messages are pretty
> terse, and there's no README in the repo. Mind adding one?
Good point. I wasn't aware the commit messages were terse ;-)
To sketch this out:
Basically, SBD continues to work as it always did.
If you specify "-P" to the daemon start-up (usually via
/etc/sysconfig/sbd SBD_OPTS), the following will happen:
sbd will start (in addition to the worker processes that monitor the
disks) a process that signs in with pacemaker (and corosync). This
process monitors that the partition the local node is part of is
quorate, and that the local node (according to the CIB as run through
pengine) is "healthy".
If both conditions hold, the master thread will not self-fence even if
the majority of devices are currently unavailable.
That's it, nothing more. Does that help?
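As a rough illustration of the rule above (not the actual sbd code, which
is C; the function and parameter names here are made up for the sketch),
the decision boils down to something like this:

```python
def should_self_fence(devices_ok, devices_total,
                      pacemaker_mode, partition_quorate, node_healthy):
    """Hypothetical sketch of the self-fence decision described above.

    devices_ok / devices_total: how many SBD devices are currently reachable.
    pacemaker_mode: True if the daemon was started with "-P".
    partition_quorate: the local partition has quorum.
    node_healthy: the local node is "healthy" according to the CIB.
    """
    # Classic behaviour: a majority of reachable devices keeps the node alive.
    if devices_ok * 2 > devices_total:
        return False

    # With "-P": a quorate partition plus a healthy local node overrides
    # the loss of the device majority.
    if pacemaker_mode and partition_quorate and node_healthy:
        return False

    # Otherwise the node self-fences, as it always did.
    return True
```

So with a single device that has gone away, `should_self_fence(0, 1, True,
True, True)` is False, while the same call without "-P" (third argument
False) is True.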
This became necessary because customers had scenarios with just one
device (which experienced intermittent problems), where MPIO acted up
(I've seen IO stuck for minutes), or even three devices where failures
were correlated. Then SBD would self-fence, and the customer would be
unhappy.
(I have opinions, particularly on the last failure mode. This seems to
arise specifically when customers have built setups with two HBAs, two
SANs, two storages, but then cross-linked the SANs, connected the HBAs
to each, and the storages too. That seems to frequently lead to
hiccups where the *entire* fabric is affected. I'm thinking this
cross-linking is a case of sham redundancy; it *looks* as if it makes
things more redundant, but in reality it reduces redundancy, since
faults are no longer independent. Alas, they've not wanted to change
that.)
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde