[Linux-HA] Is this heartbeat behaviour correct ?
Alan Robertson
alanr at unix.sh
Mon Aug 8 18:54:11 MDT 2005
Boris Berger wrote:
> I don't think there is any problem with the configuration of the
> heartbeat messages broadcasting on eth0 and eth1; we have tested it
> again by replacing "," with a space or tab (bcast eth1 eth0) and by
> swapping eth0 and eth1, and it's OK (messages are broadcast on the
> two interfaces).
>
> But the problem still exists: when we disconnect the three network
> cables (the "internal" link between the two nodes AND the two cables
> from the nodes to the "external" LAN), both nodes take back the
> service.
>
> Is this normal? I thought IPFAIL should prevent this: ipfail (on
> both nodes) pings an external host which is no longer reachable as the
> cables are disconnected.

First of all... ipfail cannot fail over to the other node unless it can
communicate with it. IIRC, you have deliberately caused multiple
failures to make this impossible - in other words, to create a "split
brain" situation.

Secondly, ipfail only fails over when one side has better connectivity
than the other side. When the two sides can't communicate with each
other, then they can't compare to see who has better connectivity.

Thirdly, if you really want to perform this kind of test, you need to
enable STONITH - which will certainly keep both sides from taking over
at the same time. You may not like what it does in this circumstance
(rolling reboots), but it is safe, and your data is safe.
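
To make the first two points concrete, here is a rough sketch of the
decision logic in Python. It is only an illustration of the reasoning
above, not ipfail's actual code: the function name, its arguments, and
the returned strings are all made up for this example.

```python
from typing import Optional


def ipfail_decision(peer_reachable: bool,
                    my_ping_count: int,
                    peer_ping_count: Optional[int]) -> str:
    """Hypothetical model of the ipfail reasoning described above.

    my_ping_count / peer_ping_count: how many ping nodes each side can
    currently reach. peer_ping_count is None when the peer's count is
    unknown because the peer cannot be contacted.
    """
    # Without a working link to the peer there is nothing to compare,
    # so this model makes no failover decision (the split-brain case).
    if not peer_reachable or peer_ping_count is None:
        return "no-decision"

    # Fail over only when one side has strictly better connectivity.
    if my_ping_count < peer_ping_count:
        return "give-up-resources"
    if my_ping_count > peer_ping_count:
        return "take-resources"

    # Equal connectivity: no reason to move anything.
    return "stay"


if __name__ == "__main__":
    # All three cables pulled: peer unreachable, ping node unreachable.
    print(ipfail_decision(False, 0, None))
    # Only this node lost its external link; the peer still sees it.
    print(ipfail_decision(True, 0, 1))
```

In the three-cable test, both nodes end up in the first branch, so
nothing in this logic stops each of them from deciding the other is
dead and taking over. That gap is what STONITH is there to cover.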
--
Alan Robertson <alanr at unix.sh>
"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce