Re: [LinuxFailSafe] startup of resource groups with one node down

Martin Bene martin.bene@icomedias.com
Mon, 6 May 2002 19:51:52 +0200


> -----Original Message-----
> From: Lars Marowsky-Bree [mailto:lmb@suse.de], 6 May 2002 19:09


> > My question is: do you have to bring up all nodes in the cluster just
> > for FailSafe to continue operating normally?  A person maintaining a
> > cluster will have to know this...
>
> Yes. You need to bring up the majority of the cluster; at least that is my
> understanding.

For a two-node cluster, that's what the tie-breaking node in the
ha-parameters should be for: if only one node is available, it must be
the tie-breaking node for a valid membership to be formed. So if I try
to start up with just one node, I'd expect to have to set the
tie-breaker parameter to that available node.
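
To make the rule I'm assuming explicit, here is a rough Python sketch. It
is only my reading of how the tie-breaker is supposed to work (a strict
majority always forms a membership; exactly half of the nodes do so only
if the tie-breaker is among them), not anything taken from the FailSafe
sources. The function name and the node names are made up for
illustration.

```python
def has_valid_membership(configured_nodes, up_nodes, tiebreaker=None):
    """Return True if the nodes in up_nodes may form a cluster membership.

    Assumed rule (my understanding, not verified against FailSafe):
      - a strict majority of the configured nodes is always sufficient;
      - exactly half is sufficient only if the tie-breaker node is up;
      - anything less is never sufficient.
    """
    configured = set(configured_nodes)
    # Ignore nodes that are up but not part of the cluster definition.
    up = set(up_nodes) & configured

    if 2 * len(up) > len(configured):
        return True
    if 2 * len(up) == len(configured):
        return tiebreaker in up
    return False


# Hypothetical two-node cluster, only node1 is up.
nodes = ["node1", "node2"]
print(has_valid_membership(nodes, ["node1"], tiebreaker="node1"))
print(has_valid_membership(nodes, ["node1"], tiebreaker="node2"))
```

Under this reading, a lone node that is the tie-breaker should be able to
form a membership on its own, while a lone node that is not the
tie-breaker should not.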

Sadly, things still don't work out quite as expected:

* If I start out with both nodes available and shut one node down:

Node status is 1x UP, 1x DOWN; resource groups fail over to the
available node; resource groups CAN be taken offline and set online
again without problems.

* If I start with just one node, which is the tie-breaker node:

Node status is 1x UP, 1x DOWN; resource startup is attempted but fails.
Resource groups end up in ERROR state; they can be taken offline but can
NOT be set online manually. Manual startup fails:

	Node in failure domain is not in membership. FailSafe
	daemon (ha_fsd) failed to online resource group (db).

or, with a bit more info from the logfiles:

Mon May  6 19:33:23.166 <N resgroupAdmin config 1964:0 ci_config_cdb.c:208> resgroupAdmin _RESOURCE_GROUP=db _RESOURCE_GROUP_ACTION=online _CLUSTER=webc
Mon May  6 19:33:53.592 <N resgroupAdmin config 1964:0 resgroupAdmin.c:853> Node in failure domain is not in membership
Mon May  6 19:33:53.618 <E resgroupAdmin config 1964:0 ci_config_cdb.c:232> CI_FAILURE, CLI private command: failed (FailSafe daemon (ha_fsd) failed to online resource group (db).)

Somehow, I can't imagine that that's expected/desired behaviour :-)

Bye, Martin