[Linux-HA] Is this heartbeat behaviour correct ?
boris.berger at capgemini.com
Mon Aug 8 06:24:00 MDT 2005
I don't think there is any problem with the heartbeat configuration
on eth0 and eth1: we tested it again after replacing the "," with a space or tab
(bcast eth1 eth0)
and after swapping eth0 and eth1, and it's OK (messages are broadcast on both interfaces).
But the problem persists: when we disconnect the three network cables
(the "internal" link between
the two nodes AND the two cables from the nodes to the "external" LAN), both
nodes take over the service.
Is this normal? I thought ipfail should prevent this: ipfail (on both
nodes) pings an external host
which is no longer reachable once the cables are disconnected.
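A simplified model may help explain the observation above. The sketch below is a hypothetical Python illustration, not heartbeat's or ipfail's code. It assumes that ipfail can only compare ping-node connectivity while the two nodes still receive each other's heartbeats, and that a node which hears no heartbeats for `deadtime` seconds declares its peer dead and runs the resources itself. `NodeView` and `runs_service_after_failure` are made-up names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeView:
    """What one node can observe after the failure (hypothetical model)."""
    holds_service: bool            # is apache currently running here?
    peer_heartbeat_seen: bool      # any heartbeat from the peer within deadtime?
    ping_ok: bool                  # can this node reach the ping node?
    peer_ping_ok: Optional[bool]   # peer's ping status; only known via heartbeats


def runs_service_after_failure(v: NodeView) -> bool:
    """Return True if this node ends up running the service (simplified)."""
    if v.peer_heartbeat_seen:
        # The nodes can still talk, so connectivity can be compared (ipfail's job).
        if v.holds_service and not v.ping_ok and v.peer_ping_ok:
            return False  # hand the service to the better-connected peer
        if not v.holds_service and v.ping_ok and v.peer_ping_ok is False:
            return True   # take the service from the worse-connected peer
        return v.holds_service
    # No heartbeats at all: the peer is assumed dead after deadtime, and there is
    # nobody to compare ping results with, so this node runs the service.
    return True


# The scenario from this thread: every cable cut, ping node unreachable on both.
node1 = NodeView(holds_service=True, peer_heartbeat_seen=False,
                 ping_ok=False, peer_ping_ok=None)
node2 = NodeView(holds_service=False, peer_heartbeat_seen=False,
                 ping_ok=False, peer_ping_ok=None)
print(runs_service_after_failure(node1), runs_service_after_failure(node2))  # True True
```

In this model both nodes return True, which is the "Apache on both nodes" result described here: with every heartbeat path cut, there is no peer for ipfail to consult, so keeping at least one heartbeat path alive is what lets it arbitrate.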
Thanks a lot,
From : linux-ha-bounces at lists.linux-ha.org
To : "General Linux-HA mailing list" linux-ha at lists.linux-ha.org
Date : Fri, 05 Aug 2005 22:37:35 -0600
Subject : Re: [Linux-HA] Is this heartbeat behaviour correct ?
> Boris Berger wrote:
> > Thanks for your answer. Are these particular settings in the ha.cf file
> > correct ?
> > # Both nodes broadcast on both network cards (here : eth0 and eth1)
> > bcast eth1,eth0
> > ## communication port: 694: the nodes broadcast towards the whole
> > !
> > udpport 694
> > # Use port 694 for bcast or ucast communications. This is the default
> > # port and the official one registered with IANA, the organisation
> > # that assigns port numbers and IP addresses
> > Is there something to change in there or elsewhere ? Here is our
> > ha.cf file :
> > bcast eth1,eth0
> > debugfile /var/log/ha-debug
> > logfile /var/log/ha-log
> > logfacility local0
> > keepalive 2
> > deadtime 10
> > warntime 6
> > initdead 60
> > udpport 694
> > node EEPCLU1
> > node EEPCLU2
> > auto_failback on
> > respawn hacluster /usr/lib/heartbeat/ipfail
> > ping EEPNFS
> > Thanks
> > ---------- Initial Header -----------
> > From : linux-ha-bounces at lists.linux-ha.org
> > To : "General Linux-HA mailing list"
> > linux-ha at lists.linux-ha.org
> > Cc :
> > Date : Fri, 05 Aug 2005 08:17:53 -0600
> > Subject : Re: [Linux-HA] Is this heartbeat behaviour correct ?
> > Boris Berger wrote:
> >>Hello all,
> >>I have tested a 2 node active/passive Heartbeat cluster.
> >>To check the connection of each node to the external network,
> >>ipfail is active with a ping towards a third machine, as
> >>specified in ha.cf file :
> >>respawn hacluster /usr/lib/heartbeat/ipfail
> >>ping theThirdMachine
> >>Before the tests, the initial situation is:
> >>- Heartbeat is running on both nodes,
> >>- one service (apache) is running on node 1,
> >>- no service is running on node 2,
> >>as specified in the haresources file:
> >>node1 addrIpServ1 apache
> >>Now I simultaneously cut:
> >>- the direct connection between the 2 nodes
> >>- the connection between node 1 and the third machine
> >>- the connection between node 2 and the third machine
> >>Then the log shows that:
> >>- Apache does not stop on node 1
> >>- Apache starts on node 2.
> >>So Apache is now running on both nodes.
> >>Now, if I re-establish:
> >>- EITHER the connection between node 1 and the third machine ONLY
> >>- OR the connection between node 2 and the third machine ONLY
> >>then nothing special happens, so Apache is still running on both
> >>nodes.
> >>Do you know if this is normal behaviour? And how can this be?
> > It can most probably be explained as a multiple failure you haven't
> > configured heartbeat to deal with. In other words, a configuration
> > When you restore the direct connection (the only one you are
> > heartbeating over, I strongly suspect), it will restart heartbeat on
> > both sides.
> > If you want that to work, you need to tell heartbeat to send heartbeats
> > over all (both?) interfaces - not just the direct connection.
> I actually didn't think that "," was valid as a separator. But if it
> didn't give you an error, then I guess it must be OK. But maybe it's
> Could you try it again with a space or tab instead of the "," (comma)?
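Whether heartbeat rejects or silently mis-parses the comma is not settled in this thread. As a precaution, a small pre-flight check like the hypothetical Python helper below could flag commas in `bcast` lines before heartbeat is restarted. It assumes, as suggested above, that `bcast` expects interface names separated by spaces or tabs; `check_bcast_lines` is an illustration, not a heartbeat tool.

```python
from typing import List


def check_bcast_lines(ha_cf_text: str) -> List[str]:
    """Hypothetical lint: flag bcast directives whose interface list uses commas."""
    warnings = []
    for lineno, raw in enumerate(ha_cf_text.splitlines(), start=1):
        fields = raw.split("#", 1)[0].split()  # strip comments, split on whitespace
        if not fields or fields[0] != "bcast":
            continue
        interfaces = fields[1:]
        if not interfaces:
            warnings.append(f"line {lineno}: bcast lists no interface")
        for iface in interfaces:
            if "," in iface:
                suggestion = " ".join(p for p in iface.split(",") if p)
                warnings.append(
                    f"line {lineno}: '{iface}' contains a comma; try '{suggestion}'")
    return warnings


sample = "bcast eth1,eth0\nkeepalive 2\nudpport 694\n"
print(check_bcast_lines(sample))
# ["line 1: 'eth1,eth0' contains a comma; try 'eth1 eth0'"]
```

Run against the ha.cf quoted above, it would point at the `bcast eth1,eth0` line and suggest `bcast eth1 eth0`, the form Boris later tested.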