[Linux-HA] Heartbeat link recovery problems - ARP requests for broadcast IP
klaus.mailinglists at pernau.at
Tue Aug 2 02:15:27 MDT 2011
The cluster consists of 2 nodes: db1 and db2 using Squeeze backports
(heartbeat 1:3.0.4-1~bpo60+1). Heartbeat is configured to use all 3
The network config is:
     db1             db2
eth0 10.10.101.7/24  10.10.101.8/24
eth1 192.168.0.1/24  192.168.0.2/24
eth2 192.168.1.1/24  192.168.1.2/24
Then, for tests, I shut down eth2 on db1: ifdown eth2
After some time both sides detected the link eth2 to be "dead".
Then I re-enabled the network: ifup eth2.
After some time, the cluster on db1 reported all links OK but on db2 the
eth2 link was still "dead".
The problem is that the heartbeat from db1 to db2 was not sent, as db1
tried to ARP-resolve 192.168.1.255: When I sniffed on eth2 I saw ARP
requests: "Who has 192.168.1.255? tell 192.168.1.1."
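For reference, a minimal Python sketch (standard library only) that confirms 192.168.1.255 is the broadcast address of the eth2 subnet, which is why an ARP request for it makes no sense. The interface addresses are the db1 ones from the network config above; the helper names broadcast_of and is_broadcast are made up for this illustration.

```python
import ipaddress

# db1 interface addresses, copied from the network config in this post.
INTERFACES = {
    "eth0": "10.10.101.7/24",
    "eth1": "192.168.0.1/24",
    "eth2": "192.168.1.1/24",
}


def broadcast_of(cidr: str) -> str:
    """Return the broadcast address of the subnet that contains cidr."""
    return str(ipaddress.ip_interface(cidr).network.broadcast_address)


def is_broadcast(target: str, cidr: str) -> bool:
    """True if target is the broadcast address of cidr's subnet."""
    return target == broadcast_of(cidr)


if __name__ == "__main__":
    for name, cidr in INTERFACES.items():
        print(name, cidr, "broadcast:", broadcast_of(cidr))
```

A packet to such an address should go out with the Ethernet broadcast MAC directly, without any neighbour lookup.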
It is of course total crap to send ARP requests for the broadcast IP
address. The network configuration is correct. I have no idea if this is
a bug in heartbeat or in the Linux network stack. I also tried sending
PINGs to the broadcast IP address during ifdown/ifup and PING's behavior
was correct - thus maybe it is a bug in heartbeat.
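Since heartbeat sends UDP rather than ICMP, a closer comparison than PING might be a plain UDP broadcast sent during ifdown/ifup while sniffing eth2. The following is only a rough sketch of such a probe, not what heartbeat itself does: the port (9, discard) and the payload are arbitrary choices for the test, and the function names are made up for this illustration.

```python
import socket


def make_broadcast_socket() -> socket.socket:
    """Create a UDP socket that is allowed to send to broadcast addresses."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    return sock


def send_probe(sock: socket.socket,
               broadcast_ip: str = "192.168.1.255",  # eth2 broadcast from this post
               port: int = 9,                        # assumption: any unused UDP port
               payload: bytes = b"probe") -> int:
    """Send one datagram to the broadcast address; returns bytes sent."""
    return sock.sendto(payload, (broadcast_ip, port))


if __name__ == "__main__":
    s = make_broadcast_socket()
    try:
        print("sent", send_probe(s), "bytes")
    except OSError as exc:
        # e.g. "Network is unreachable" while eth2 is down
        print("send failed:", exc)
    finally:
        s.close()
```

If the sniffer shows this datagram leaving without a preceding ARP request for 192.168.1.255 while heartbeat's own packets still trigger one, that would point more towards heartbeat than towards the network stack.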
Also heartbeat did not recover (I tried several minutes). I had to stop
heartbeat on db1, ifdown eth2, ifup eth2, start heartbeat to resolve the
Any ideas what is going wrong?