[Linux-HA] heartbeat 1.2.3: ipfail does not trigger a failover when the ping node is unreachable

Patrick Roßbach patrick.rossbach at alcatel.de
Fri Aug 19 09:33:53 MDT 2005


Hi all high availables,

I have a question about ipfail and the 'ping' directive:
I thought I could use the 'ping' option to monitor connectivity to a
node that heartbeat is not running on. If the active server can no
longer reach that node but the passive one still can, a failover should
happen. I believe I have seen this behavior in the past, but when I
tried it today (and yesterday) there was no failover.
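For reference, the rule I expected ipfail to apply can be written as a
small Python sketch. It is a hypothetical model of the decision, not
ipfail's source: the PingView type and should_give_up_resources function
are made up for illustration, and the assumption is that each side counts
the ping nodes it can reach and the active side yields only if the peer
reaches strictly more.

```python
from dataclasses import dataclass


@dataclass
class PingView:
    """Hypothetical model: how many configured ping nodes one member can reach."""
    node: str
    reachable: int


def should_give_up_resources(active: PingView, passive: PingView) -> bool:
    """Expected rule (assumption): the active node hands its resources to the
    passive node only if the passive node reaches strictly more ping nodes."""
    return passive.reachable > active.reachable


# Scenario from this post: one ping node (rbc_ce0_sw0) is configured,
# the active server has lost it, the passive server still sees it.
active = PingView("active", reachable=0)
passive = PingView("passive", reachable=1)
print(should_give_up_resources(active, passive))  # True

# If both sides lost the ping node (e.g. the switch itself died), stay put.
print(should_give_up_resources(PingView("active", 0), PingView("passive", 0)))  # False
```

The strict comparison is deliberate: with equal counts, moving the
resources would not improve connectivity, so no failover is expected.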

I tried unplugging the cable from the NIC of my active server, and also
blocking the interface (eth2) with 'iptables' (for 20 seconds). The
servers were still able to exchange heartbeats (over ttyS1 and eth3). In
the logs I can see that ipfail detects that the ping node (rbc_ce0_sw0)
is dead, but I don't know what is missing to get the failover I expect.

Thanks for any help.

Regards,
  Patrick

By the way: after all this, heartbeat cannot be stopped normally
(/etc/init.d/heartbeat stop) without a 'killall ipfail'.


Here is the relevant part of the syslog (the ERRORs are caused by the iptables DROP rule):
------------------------------------
Aug 19 17:02:39 oms1 heartbeat: [7157]: ERROR: glib: Error sending packet: Operation not permitted
Aug 19 17:02:39 oms1 heartbeat: [7157]: ERROR: write failure on ping rbc_ce0_sw0.: Operation not permitted
Aug 19 17:02:40 oms1 heartbeat: [7157]: ERROR: glib: Error sending packet: Operation not permitted
Aug 19 17:02:40 oms1 heartbeat: [7157]: ERROR: write failure on ping rbc_ce0_sw0.: Operation not permitted
Aug 19 17:02:41 oms1 heartbeat: [7157]: ERROR: glib: Error sending packet: Operation not permitted
Aug 19 17:02:41 oms1 heartbeat: [7157]: ERROR: write failure on ping rbc_ce0_sw0.: Operation not permitted
Aug 19 17:02:42 oms1 heartbeat: [7157]: ERROR: glib: Error sending packet: Operation not permitted
Aug 19 17:02:42 oms1 heartbeat: [7157]: ERROR: write failure on ping rbc_ce0_sw0.: Operation not permitted
Aug 19 17:02:43 oms1 heartbeat: [7157]: ERROR: glib: Error sending packet: Operation not permitted
Aug 19 17:02:43 oms1 heartbeat: [7157]: ERROR: write failure on ping rbc_ce0_sw0.: Operation not permitted
Aug 19 17:02:44 oms1 heartbeat: [7148]: WARN: node rbc_ce0_sw0: is dead
Aug 19 17:02:44 oms1 heartbeat: [7148]: info: Link rbc_ce0_sw0:rbc_ce0_sw0 dead.
Aug 19 17:02:44 oms1 ipfail: [7160]: info: Status update: Node rbc_ce0_sw0 now has status dead
Aug 19 17:02:44 oms1 heartbeat: [16237]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Aug 19 17:02:44 oms1 heartbeat: info: Running /etc/ha.d/rc.d/status status
Aug 19 17:02:44 oms1 heartbeat: [7157]: ERROR: glib: Error sending packet: Operation not permitted
Aug 19 17:02:44 oms1 heartbeat: [7157]: ERROR: write failure on ping rbc_ce0_sw0.: Operation not permitted
Aug 19 17:02:45 oms1 heartbeat: [7157]: ERROR: glib: Error sending packet: Operation not permitted
Aug 19 17:02:45 oms1 heartbeat: [7157]: ERROR: write failure on ping rbc_ce0_sw0.: Operation not permitted
Aug 19 17:02:46 oms1 heartbeat: [7157]: ERROR: glib: Error sending packet: Operation not permitted
Aug 19 17:02:46 oms1 heartbeat: [7157]: ERROR: write failure on ping rbc_ce0_sw0.: Operation not permitted
Aug 19 17:02:47 oms1 heartbeat: [7157]: ERROR: glib: Error sending packet: Operation not permitted
Aug 19 17:02:47 oms1 heartbeat: [7157]: ERROR: write failure on ping rbc_ce0_sw0.: Operation not permitted
Aug 19 17:02:48 oms1 heartbeat: [7157]: ERROR: glib: Error sending packet: Operation not permitted
Aug 19 17:02:48 oms1 heartbeat: [7157]: ERROR: write failure on ping rbc_ce0_sw0.: Operation not permitted
Aug 19 17:02:49 oms1 heartbeat: [7157]: ERROR: glib: Error sending packet: Operation not permitted
Aug 19 17:02:49 oms1 heartbeat: [7157]: ERROR: write failure on ping rbc_ce0_sw0.: Operation not permitted
Aug 19 17:02:50 oms1 heartbeat: [7148]: info: all clients are now resumed
Aug 19 17:02:50 oms1 heartbeat: [7157]: ERROR: glib: Error sending packet: Operation not permitted
Aug 19 17:02:50 oms1 heartbeat: [7157]: ERROR: write failure on ping rbc_ce0_sw0.: Operation not permitted
Aug 19 17:02:50 oms1 ipfail: [7160]: info: NS: We are dead. :<
Aug 19 17:02:50 oms1 ipfail: [7160]: info: Link Status update: Link rbc_ce0_sw0/rbc_ce0_sw0 now has status dead
Aug 19 17:02:50 oms1 ipfail: [7160]: info: We are dead. :<
Aug 19 17:02:50 oms1 ipfail: [7160]: info: Asking other side for ping node count.
Aug 19 17:02:50 oms1 ipfail: [7160]: debug: Message [num_ping] sent.
Aug 19 17:02:51 oms1 heartbeat: [7157]: ERROR: glib: Error sending packet: Operation not permitted
Aug 19 17:02:51 oms1 heartbeat: [7157]: ERROR: write failure on ping rbc_ce0_sw0.: Operation not permitted
Aug 19 17:02:52 oms1 heartbeat: [7157]: ERROR: glib: Error sending packet: Operation not permitted
Aug 19 17:02:52 oms1 heartbeat: [7157]: ERROR: write failure on ping rbc_ce0_sw0.: Operation not permitted
Aug 19 17:02:53 oms1 heartbeat: [7157]: ERROR: glib: Error sending packet: Operation not permitted
Aug 19 17:02:53 oms1 heartbeat: [7157]: ERROR: write failure on ping rbc_ce0_sw0.: Operation not permitted
Aug 19 17:02:54 oms1 heartbeat: [7157]: ERROR: glib: Error sending packet: Operation not permitted
Aug 19 17:02:54 oms1 heartbeat: [7157]: ERROR: write failure on ping rbc_ce0_sw0.: Operation not permitted
Aug 19 17:02:55 oms1 heartbeat: [7157]: ERROR: glib: Error sending packet: Operation not permitted
Aug 19 17:02:55 oms1 heartbeat: [7157]: ERROR: write failure on ping rbc_ce0_sw0.: Operation not permitted
Aug 19 17:02:56 oms1 heartbeat: [7157]: ERROR: glib: Error sending packet: Operation not permitted
Aug 19 17:02:56 oms1 heartbeat: [7157]: ERROR: write failure on ping rbc_ce0_sw0.: Operation not permitted
Aug 19 17:02:57 oms1 heartbeat: [7157]: ERROR: glib: Error sending packet: Operation not permitted
Aug 19 17:02:57 oms1 heartbeat: [7157]: ERROR: write failure on ping rbc_ce0_sw0.: Operation not permitted
Aug 19 17:02:58 oms1 heartbeat: [7157]: ERROR: glib: Error sending packet: Operation not permitted
Aug 19 17:02:58 oms1 heartbeat: [7157]: ERROR: write failure on ping rbc_ce0_sw0.: Operation not permitted
Aug 19 17:02:59 oms1 heartbeat: [7148]: info: Link rbc_ce0_sw0:rbc_ce0_sw0 up.
Aug 19 17:02:59 oms1 heartbeat: [7148]: WARN: Late heartbeat: Node rbc_ce0_sw0: interval 21030 ms
Aug 19 17:02:59 oms1 heartbeat: [7148]: info: Status update for node rbc_ce0_sw0: status ping
Aug 19 17:02:59 oms1 ipfail: [7160]: info: Link Status update: Link rbc_ce0_sw0/rbc_ce0_sw0 now has status up
Aug 19 17:02:59 oms1 ipfail: [7160]: info: Status update: Node rbc_ce0_sw0 now has status ping
Aug 19 17:02:59 oms1 ipfail: [7160]: info: A ping node just came up.
Aug 19 17:02:59 oms1 ipfail: [7160]: debug: Found ping node rbc_ce0_sw0!
Aug 19 17:02:59 oms1 ipfail: [7160]: info: Asking other side for ping node count.
Aug 19 17:02:59 oms1 ipfail: [7160]: debug: Message [num_ping] sent.
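
To check how long the ping node was really unreachable, the timestamps
above can be extracted with a short Python sketch. It assumes one syslog
entry per line (as syslog writes them, before the mail client wrapped
them) and the year 2005 from the post date; the function name
ping_failure_window is made up for illustration.

```python
import re
from datetime import datetime

# One syslog entry, e.g.
# "Aug 19 17:02:39 oms1 heartbeat: [7157]: ERROR: write failure on ping rbc_ce0_sw0.: ..."
# Groups: timestamp, program, level, message (the "[pid]: " part is optional).
LINE_RE = re.compile(
    r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ (\w+): (?:\[\d+\]: )?(\w+): (.*)$"
)


def ping_failure_window(lines, ping_node, year=2005):
    """Return (first, last, count) of 'write failure on ping <ping_node>' entries,
    or None if there are none. syslog omits the year, so it is passed in."""
    stamps = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # wrapped continuation lines or other noise
        stamp, _program, _level, message = m.groups()
        if message.startswith(f"write failure on ping {ping_node}"):
            stamps.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    if not stamps:
        return None
    return stamps[0], stamps[-1], len(stamps)
```

Fed the unwrapped log from this mail with ping_node="rbc_ce0_sw0", it
should report 20 failed pings between 17:02:39 and 17:02:58, which fits
the 20-second iptables block and the "Late heartbeat ... interval
21030 ms" warning.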


ha.cf:
------
logfacility     local0
keepalive 1
deadtime 5
warntime 3
initdead 10
udpport 694
baud    19200
serial  /dev/ttyS1
bcast   eth3
auto_failback off
node    oms0
node    oms1
ping rbc_ce0_sw0
respawn hacluster /usr/lib/heartbeat/ipfail


haresources:
------------
oms0    IPaddr::192.168.192.1/24 myService mon


-- 
Patrick Rossbach     +-------V-------+ mailto:patrick.rossbach at alcatel.de
Alcatel SEL AG (ext) | A L C A T E L | Phone : +49 30  7002 4742
Colditzstr. 34-36    +---------------+ Fax   : +49 30  7002 3669
D-12099 Berlin             S E L
