[Linux-HA] Strange behaviour in Heartbeat?
alanr at unix.sh
Thu Aug 11 10:09:02 MDT 2005
Fabrice Durand wrote:
> Hello all,
> First, thank you for your answers about the "split-brain" situation.
> Here is a new problem. I have tested a 2-node active/passive cluster
> with Heartbeat_1.2.3-1woody.
> The haresources file is:
> EEPCLU1 EEPSERV1 apache MailTo::toto at toto.fr::ServiceApache
> The 2 nodes broadcast heartbeats over two interfaces: eth0 (external LAN)
> and eth1 (direct connection between the 2 nodes).
> To check each node's connection to the external LAN,
> ipfail is active and pings a third machine, as
> specified in the ha.cf file:
> respawn hacluster /usr/lib/heartbeat/ipfail
> ping theThirdMachine
> Before the tests, the initial situation is:
> - Heartbeat is running on both nodes,
> - one service (apache) is running on node 1,
> - no service is running on node 2,
> as specified in the haresources file:
> node1 addrIpServ1 apache
> Now I simply cut the connection between node 1 and the third machine...
> Then something strange happens:
> - first, node 2 tries to go standby, although it has no services and
> is supposed to acquire node 1's resources;
> - node 1 tries to start Apache, although Apache is already running on
> node 1 and node 1 is supposed to shut Apache down.
> Luckily, this Apache start fails.
> - Even though starting Apache on node 1 failed, node 1
> successfully sends a mail saying that Apache has just started on node 1!
> - At the same time, node 2 tries to shut down Apache, although no
> Apache process is running on node 2, so no process is killed.
> - A few seconds later, the expected behaviour happens: node 1
> stops Apache and releases the associated logical IP address
> EEPSERV1, and then node 2 takes over the logical IP address and
> starts Apache.
> Do you know whether this behaviour is correct, i.e. whether there should
> be a transitional phase before the failover?
> And is there a way to prevent Heartbeat from sending an email saying
> that Apache has started when starting Apache has failed?
The haresources file is incorrect. You can only put one node name at
the front of a line.
And, I don't think (for reasons I won't go into now) that having the
MailTo as the first resource in the group will work. Try putting a real
resource first on the line.
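A minimal haresources sketch that follows both suggestions above. It assumes a hypothetical node name `node1`, service address `192.168.1.100` and admin address `admin@example.com`; none of these come from the original thread.

Exactly one node name leads the line. A real resource comes first: the bare IP address, which Heartbeat v1 treats as shorthand for `IPaddr`. `MailTo` comes after `apache`.

```
# haresources (sketch; node name, address and email are hypothetical)
# Field 1: the single preferred node name. Remaining fields: resources in start order.
node1 192.168.1.100 apache MailTo::admin@example.com::ServiceApache
```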
Alan Robertson <alanr at unix.sh>
"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William