[Linux-ha-dev] ipfail nodestatus callback
Guochun Shi
gshi at ncsa.uiuc.edu
Mon Dec 13 14:44:18 MST 2004
At 06:56 PM 12/13/2004 +0100, you wrote:
>Hi HA-team,
>
>I'm testing the new heartbeat release 1.2.3 from Debian archives
>(sarge). I was testing ipfail with ping nodes. My conf is at the end
>of this mail. My two cluster nodes are sarge-116 & sarge-226.
>
>To simulate sarge-116 no longer being able to see the ref hosts, I ran
>the following on the ref hosts: iptables -A INPUT -p icmp -s sarge-116 -j DROP
>
>That way the ref hosts don't see ping requests from sarge-116.
>
>So sarge-226 sees the ping nodes but sarge-116 does not. The ha-log on
>sarge-116 shows:
>
>-- CUT HERE --
>Dec 13 17:19:41 localhost heartbeat[1305]: WARN: node referencehost: is
>dead
>Dec 13 17:19:41 localhost ipfail[1317]: info: Status update: Node
>referencehost now has status dead
>Dec 13 17:19:41 localhost ipfail[1317]: info: NS: We are dead. :<
>-- CUT HERE --
>
>So sarge-116 no longer sees the ref hosts and should release its
>resources. But nothing happens.
>
>I noticed over many tests that when there is a "LinkStatus update", the
>two nodes (sarge-116 & sarge-226) exchange their ping node counts,
>and a takeover can happen if one sees more ping nodes than the other.
>
>But in our simple case it's not a LinkStatus update but a "Status
>update" that has happened.
>
>So I've taken a look at the ipfail sources and something looks strange:
>
>The code for the LinkStatus callback looks like:
>
>if( DEAD )
>{
>    if( ping_node_status() )
>        ...
>    else
>    {
>        log( we are dead )
>        ask_ping_nodes()
>    }
>}
>
>But in the NodeStatus callback it looks like:
>
>if( DEAD )
>{
>    if( ping_node_status() )
>        ....
>    else
>    {
>        log( we are dead )
>--> SEE COMMENT <---
>    }
>}
>else if( PINGSTATUS )
>{
>    log( ping node came up )
>    ping_node_status()
>    ask_ping_nodes()
>}
>
>--> COMMENT <---
>
>Why is there no "ask_ping_nodes()" here?
>
>It would force the node (sarge-116) to learn at that point that
>the other node (sarge-226) sees more ping nodes than it does, and thus
>lead to a takeover, which would be great.
>
>This is done on a "LinkStatus update", so why not here?
>
>What's the difference between these two callbacks (one seems to be for
>the "network", the other for the "interface")? Is there a special meaning?
>
>Thanks in advance for your answer.
>
>Regards,
>
>stephane
>
>-- CUT HERE ha.cf --
>debugfile /var/log/ha-debug
>logfile /var/log/ha-log
>logfacility local0
>
>keepalive 2
>deadtime 10
>warntime 8
>initdead 120
>
>serial /dev/ttyS0
>
>auto_failback on
>
>node sarge-116
>node sarge-226
>
>ping_group referenceHost 192.168.31.230 192.168.31.102
>deadping 5
>
>respawn hacluster /usr/lib/heartbeat/ipfail
>-- CUT HERE --
>
>-- CUT HERE haresources --
>sarge-116 192.168.31.202
>sarge-226
>-- CUT HERE --
This does not look correct to me. Since sarge-226 is not the primary node for any resource, you should not list it there.
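As a sketch, assuming nothing else in your setup changes, haresources would then contain only the line for the primary node:

```
sarge-116 192.168.31.202
```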
There also seems to be a bug when deadping is greater (less?) than deadtime, which is fixed in CVS head.
You may want to search the archives. Kevin can explain in more detail.
(By the way, does anyone know where I can find the format for reporting a problem in the wiki?)
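For reference, the change proposed in the quoted message (calling ask_ping_nodes() from the "we are dead" branch of the NodeStatus callback) could be sketched roughly as below. This is only an illustration of the quoted pseudocode, not the actual ipfail source: struct ping_state, node_status_update() and peer_sees_more() are hypothetical names, the "ping" status string for PINGSTATUS is an assumption, and ask_ping_nodes() here merely counts requests instead of talking to the peer.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical state; the real ipfail keeps this differently. */
struct ping_state {
    int reachable_ping_nodes; /* ping nodes this node can still reach */
    int peer_queries_sent;    /* how often we asked the peer for its count */
};

/* Stand-in for ping_node_status(): true if any ping node is reachable. */
static bool ping_node_status(const struct ping_state *st)
{
    return st->reachable_ping_nodes > 0;
}

/* Stand-in for ask_ping_nodes(): only records that the peer was asked. */
static void ask_ping_nodes(struct ping_state *st)
{
    st->peer_queries_sent++;
}

/* NodeStatus callback as in the quoted pseudocode, plus the proposed call. */
static void node_status_update(struct ping_state *st, const char *status)
{
    if (strcmp(status, "dead") == 0) {
        if (ping_node_status(st)) {
            /* Branch elided ("....") in the quoted message; left empty. */
        } else {
            puts("NS: We are dead. :<");
            ask_ping_nodes(st); /* the proposed addition */
        }
    } else if (strcmp(status, "ping") == 0) { /* assumed PINGSTATUS value */
        puts("ping node came up");
        (void)ping_node_status(st);
        ask_ping_nodes(st);
    }
}

/* Takeover rule described in the quoted message: the peer sees more ping nodes. */
static bool peer_sees_more(int my_count, int peer_count)
{
    return peer_count > my_count;
}
```

With this sketch, a "dead" update on a node that reaches no ping nodes triggers one request to the peer, and peer_sees_more(0, 2) is true, which is the situation in which the quoted message expects a takeover.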
-Guochun