[Linux-HA] Is this heartbeat behaviour correct ?

Alan Robertson alanr at unix.sh
Mon Aug 8 18:54:11 MDT 2005


Boris Berger wrote:
> I don't think there is any problem with the configuration of the
> heartbeat messages broadcasting on eth0 and eth1. We have tested it
> again by replacing "," with a space or tab (bcast eth1 eth0) and by
> swapping eth0 and eth1, and it's OK (messages are broadcast on the
> two interfaces).
> 
> But the problem still exists: when we disconnect the three network
> cables (the "internal" link between the two nodes AND the two cables
> from the nodes to the "external" LAN), both nodes take over the
> service.
> 
> Is this normal? I thought IPFAIL should prevent this: ipfail (on
> both nodes) pings an external host which is no longer reachable once
> the cables are disconnected.


First of all... ipfail cannot fail over to the other node unless it can 
communicate with it.  IIRC, you have deliberately caused multiple 
failures to make this impossible - in other words, to create a "split 
brain" situation.

Secondly, ipfail only fails over when one side has better connectivity 
than the other side.  When the two sides can't communicate with each 
other, then they can't compare to see who has better connectivity.
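
The first two points can be sketched as a small decision function. This is only a toy model of the comparison described above, not ipfail's actual code; the function name, its arguments, and the returned strings are all made up for illustration.

```python
from typing import Optional


def ipfail_decision(local_reachable: int, peer_reachable: Optional[int]) -> str:
    """Hypothetical model of the connectivity comparison.

    local_reachable: number of ping nodes this node can reach.
    peer_reachable:  number the peer reports, or None when the peer
                     cannot be contacted over any heartbeat link.
    """
    if peer_reachable is None:
        # No communication with the peer: there is nothing to compare
        # against, so no failover can be negotiated.
        return "no-comparison"
    if peer_reachable > local_reachable:
        # The peer has strictly better connectivity: hand over.
        return "fail-over-to-peer"
    # Equal or worse connectivity on the peer side: nothing to do.
    return "no-change"
```

In the test described above, each node reaches zero ping nodes and also has no peer to talk to, so both end up in the first branch, and ipfail has no say in what happens next.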

Thirdly, if you really want to perform this kind of test, you need to 
enable STONITH - which will certainly keep both sides from taking over 
at the same time.  You may not like what it does in this circumstance 
(rolling reboots), but it is safe, and your data is safe.
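
To illustrate why rolling reboots are still safe, here is a deliberately simplified simulation. It assumes two nodes named A and B, assumes the STONITH device stays reachable, and assumes exactly one node wins each fencing race; none of this is taken from heartbeat's implementation.

```python
from typing import List, Set


def simulate(rounds: int, stonith: bool) -> List[Set[str]]:
    """Return, for each round, the set of nodes holding the service."""
    history: List[Set[str]] = []
    shooter = "A"  # assumption: A wins the first fencing race
    for _ in range(rounds):
        if stonith:
            victim = "B" if shooter == "A" else "A"
            # The winner resets the other node before taking over,
            # so only one node holds the service in this round.
            history.append({shooter})
            # The victim reboots, still sees no peer, and shoots
            # back in the next round: the "rolling reboot".
            shooter = victim
        else:
            # Without fencing, each node declares the other dead
            # and both take over: split brain.
            history.append({"A", "B"})
    return history
```

With stonith=True the service bounces between the nodes but never runs on both at once; with stonith=False both nodes hold it in every round.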



-- 
     Alan Robertson <alanr at unix.sh>

"Openness is the foundation and preservative of friendship...  Let me 
claim from you at all times your undisguised opinions." - William 
Wilberforce

