[Linux-HA] Autofailback problem

Colin Bruce ccx004 at coventry.ac.uk
Sun Jun 6 12:33:35 MDT 2004


Dear All,

I seem to have a similar problem to the one posted by Nicolas Schmitz on
the 2nd of June.

I have set up heartbeat version 1.2.2 on two test systems. I have one
service address (192.168.255.7). I have no services - all I am doing is
pinging that address for the moment. The master is called lnxtst2 and the
slave lnxtst3. If I ping the service address, I get a response. If I then
shut down the master cleanly, the service address moves to the slave and the
pings continue. If I start the master again, the service address moves back
to the master, as we would expect. However, it is also still up on the slave.
If I type IPaddr 192.168.255.7 stop on the slave by hand, the address is
released perfectly well, so the question is why the same command works when
typed manually but not when Heartbeat runs it. Really, though, the question is
what I might have done wrong.

Here is a section of the log file, taken from the slave, covering the period
when the master tries to take back the resource. The log file on the master
doesn't show any errors.

heartbeat: 2004/06/06_18:49:50 info: Heartbeat restart on node lnxtst2
heartbeat: 2004/06/06_18:49:50 info: Link lnxtst2:eth0 up.
heartbeat: 2004/06/06_18:49:50 info: Status update for node lnxtst2: status up
heartbeat: 2004/06/06_18:49:50 info: Running /usr/local/packages/heartbeat-1.2.2/etc/ha.d/rc.d/status status
heartbeat: 2004/06/06_18:51:50 WARN: 1 lost packet(s) for [lnxtst2] [61:63]
heartbeat: 2004/06/06_18:51:50 info: Status update for node lnxtst2: status active
heartbeat: 2004/06/06_18:51:50 info: No pkts missing from lnxtst2!
heartbeat: 2004/06/06_18:51:50 info: Running /usr/local/packages/heartbeat-1.2.2/etc/ha.d/rc.d/status status
heartbeat: 2004/06/06_18:51:51 ERROR: Both machines own our resources!
heartbeat: 2004/06/06_18:51:51 info: remote resource transition completed.
heartbeat: 2004/06/06_18:51:51 ERROR: Both machines own our resources!
heartbeat: 2004/06/06_18:51:51 ERROR: Both machines own foreign resources!
heartbeat: 2004/06/06_18:51:51 info: lnxtst3 wants to go standby [foreign]
heartbeat: 2004/06/06_18:51:51 ERROR: Both machines own our resources!
heartbeat: 2004/06/06_18:51:51 ERROR: Both machines own foreign resources!
heartbeat: 2004/06/06_18:52:02 WARN: No reply to standby request.  Standby request cancelled.
heartbeat: 2004/06/06_18:52:02 ERROR: Both machines own our resources!
heartbeat: 2004/06/06_18:52:02 ERROR: Both machines own foreign resources!
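
To pull the warnings and errors out of a longer excerpt like this one, a minimal Python sketch could split each line into timestamp, level and message. It assumes every line follows the "heartbeat: YYYY/MM/DD_HH:MM:SS LEVEL: message" layout seen above; parse_line and problems are hypothetical helper names, not part of Heartbeat.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed line layout, taken from the excerpt above, e.g.
# "heartbeat: 2004/06/06_18:51:51 ERROR: Both machines own our resources!"
LINE_RE = re.compile(
    r"^heartbeat: (?P<stamp>\d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Za-z]+): (?P<message>.*)$"
)

class LogEntry(NamedTuple):
    stamp: str
    level: str
    message: str

def parse_line(line: str) -> Optional[LogEntry]:
    """Split one heartbeat log line; return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return LogEntry(m["stamp"], m["level"], m["message"])

def problems(lines: List[str]) -> List[LogEntry]:
    """Keep only WARN and ERROR entries, in their original order."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e is not None and e.level in ("WARN", "ERROR")]

if __name__ == "__main__":
    sample = [
        "heartbeat: 2004/06/06_18:51:50 info: No pkts missing from lnxtst2!",
        "heartbeat: 2004/06/06_18:51:51 ERROR: Both machines own our resources!",
        "heartbeat: 2004/06/06_18:52:02 WARN: No reply to standby request.  Standby request cancelled.",
    ]
    for entry in problems(sample):
        print(entry.stamp, entry.level, entry.message)
```

Lines that do not match the assumed layout are skipped rather than guessed at.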

The conf files are more or less as distributed. We have added entries to
ha.cf for the nodes and a line in haresources for the service address.

In ha.cf, auto_failback is set to on, and the node lines are:

node    lnxtst2
node    lnxtst3

In haresources, the only line is:

lnxtst2     192.168.255.7
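
In Heartbeat's haresources format, the first field names the preferred node and the remaining fields list its resources; a bare IP address is shorthand for the IPaddr resource script with that address as its argument. A minimal Python sketch of that reading follows. It assumes the simple whitespace-separated form used here (no backslash line continuations, only IPv4 addresses recognised as bare IPs), and parse_resource and parse_haresources_line are hypothetical helpers, not Heartbeat code.

```python
import ipaddress
from typing import List, NamedTuple, Tuple

class Resource(NamedTuple):
    script: str      # resource script name, e.g. "IPaddr"
    args: List[str]  # arguments passed to the script

def parse_resource(token: str) -> Resource:
    """Interpret one haresources field (simplified, see assumptions above)."""
    if "::" in token:
        # Explicit form: script::arg1::arg2 ...
        script, *args = token.split("::")
        return Resource(script, args)
    try:
        ipaddress.IPv4Address(token)
    except ValueError:
        # Plain script name with no arguments.
        return Resource(token, [])
    # Assumption: a bare IPv4 address means IPaddr::<address>.
    return Resource("IPaddr", [token])

def parse_haresources_line(line: str) -> Tuple[str, List[Resource]]:
    """Return (preferred node, resources) for one non-comment line."""
    fields = line.split()
    if not fields:
        raise ValueError("empty haresources line")
    node, *tokens = fields
    return node, [parse_resource(t) for t in tokens]

if __name__ == "__main__":
    node, resources = parse_haresources_line("lnxtst2     192.168.255.7")
    print(node, resources)
```

With auto_failback on, the preferred node named in the first field (lnxtst2 here) is the one that takes the resources back when it rejoins the cluster, which is the transition during which the slave's log reports that both machines own the resources.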

Any thoughts would be appreciated.

Best wishes...
Colin Bruce


