[Linux-ha-dev] Re: Heartbeat Problem

Alan Robertson alanr@bell-labs.com
Tue, 26 Oct 1999 19:07:14 -0600


Hi Bill,

Thanks for giving heartbeat a try!

> Subject: Heartbeat Problem
> Date: Tue, 26 Oct 1999 15:15:29 -0500
> From: Bill Bacher <bill_bacher@inlet.com>
> 
> Alan,
> 
> We're testing Heartbeat on two machines. Both run Apache Web Servers, and the
> intent is they will offer up the same content, load sharing by IP address only.
> We're using heartbeat to have one carry the full load if the other should die
> for some reason. Both are configured to take over for the other under heartbeat
> control.

OK.  This should work out fine, as long as you have your own mechanism for
synchronizing your web servers' data.


> We've tried heartbeat 0.4.5 and 0.4.5a and have seen a strange problem with both
> versions. We're seeing the IP address being re-assigned to the other machine
> without being under heartbeat control. With version 0.4.5, we stopped heartbeat
> altogether and still saw the IP addresses shifting between the two machines.
> Running 0.4.5a today, we're seeing IP transfers but nothing is showing up in the
> heartbeat logs indicating it is controlling things. It's almost as if the fake
> part is running on its own.

What do you mean by shifting on its own?  Do you mean it's showing up in
ifconfig?
If you run ifconfig, save the output, and run it again later, has the output
changed without anything showing up in the heartbeat (ha-log/ha-debug) logs?
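
A minimal sketch of that comparison, assuming the two ifconfig outputs have
been saved as text.  The function names and the sample addresses are made up
for illustration; the pattern accepts both the "inet addr:a.b.c.d" form and
the newer "inet a.b.c.d" form.

```python
import re

# Matches "inet addr:10.0.0.5" (older net-tools) and "inet 10.0.0.5" (newer).
# Lines starting with "inet6" are deliberately not matched.
INET_RE = re.compile(r"\binet (?:addr:)?(\d{1,3}(?:\.\d{1,3}){3})")


def inet_addrs(ifconfig_text):
    """Return the set of IPv4 addresses found in saved ifconfig output."""
    return set(INET_RE.findall(ifconfig_text))


def diff_snapshots(before, after):
    """Return (appeared, disappeared) address lists between two snapshots."""
    old, new = inet_addrs(before), inet_addrs(after)
    return sorted(new - old), sorted(old - new)


# Hypothetical snapshots: the second one has gained an alias address.
before = "eth0   Link encap:Ethernet\n       inet addr:192.168.1.10  Bcast:192.168.1.255\n"
after = before + "eth0:0 Link encap:Ethernet\n       inet addr:192.168.1.20  Bcast:192.168.1.255\n"

print(diff_snapshots(before, after))  # (['192.168.1.20'], [])
```

Any address in the "appeared" list with no matching entry in ha-log or
ha-debug at around the same time is the case worth reporting.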
 
> We're running Red Hat 6.0, Kernel 2.2.5-15smp on Dell servers with dual Pentium
> II processors. We installed from the rpm.

Glad somebody does ;-)
 
> Any idea what might be going on? In one of your postings you mention looking at
> 4 logs. What can we be watching besides ha-log, ha-debug, and messages?

Two logs * two machines = 4 logs :-)
> 
> Thanks in advance for your assistance.

This is a new one.  If you've installed the "fake" package, you shouldn't have. 
Heartbeat does it all...

Assuming that's not the case...

I have no idea where this might be happening, so I'll tell you a little about
what is going on so you can verify what part our code might have in it...

Everything related to IP address takeover and giveback takes place in
/etc/ha.d/resource.d/IPaddr.  If it didn't happen there, then heartbeat didn't
do it.

In general, all resource scripts look a lot like /etc/rc.d/init.d
startup/shutdown scripts, except some of them (notably IPaddr) require another
argument.  When you start up apache, you do this: /etc/rc.d/init.d/httpd start. 
To take over an IP address, you do this: /etc/ha.d/resource.d/IPaddr ip-address
start.  To give it up, you do the same thing with "stop" instead of "start". 
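
The calling convention can be sketched as follows.  This only builds the
command line and does not run anything; the helper name is hypothetical, the
address 10.0.0.5 is made up, and restricting the action to "start"/"stop" is
an assumption based on the two actions described above.

```python
RESOURCE_DIR = "/etc/ha.d/resource.d"  # directory named in the text above


def resource_command(script, action, argument=None):
    """Build the argv list for a resource script call.

    Scripts that need an extra argument (IPaddr takes the IP address)
    get it before the action, matching "IPaddr ip-address start".
    """
    if action not in ("start", "stop"):
        raise ValueError("action must be 'start' or 'stop'")
    argv = [RESOURCE_DIR + "/" + script]
    if argument is not None:
        argv.append(argument)
    argv.append(action)
    return argv


# 10.0.0.5 is a made-up address for illustration only.
print(" ".join(resource_command("IPaddr", "start", "10.0.0.5")))
# /etc/ha.d/resource.d/IPaddr 10.0.0.5 start
```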

If you look at the function ip_start(), you'll see that EVERY time we take over
an IP address, two messages should appear: "INFO: ifconfig..." and "Sending
Gratuitous Arp for ...".  If these don't occur, then there is an extremely high
probability that we didn't perform an IP address takeover.
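
A rough sketch of checking a log for those two messages, assuming they appear
as plain substrings on their own lines.  The function name and the sample log
lines are made up; only the two marker strings come from the description
above, and the real log line layout may differ.

```python
# The two message prefixes quoted above.
TAKEOVER_MARKERS = ("INFO: ifconfig", "Sending Gratuitous Arp for")


def takeover_lines(log_text):
    """Return the log lines that mention either takeover message."""
    return [line for line in log_text.splitlines()
            if any(marker in line for marker in TAKEOVER_MARKERS)]


# Hypothetical log excerpt, for illustration only.
sample_log = (
    "heartbeat: some unrelated message\n"
    "heartbeat: INFO: ifconfig eth0:0 10.0.0.5\n"
    "heartbeat: Sending Gratuitous Arp for 10.0.0.5\n"
)

for line in takeover_lines(sample_log):
    print(line)
```

To use it on a real log, read the file contents and pass them in.  An empty
result covering the period when the address moved suggests that IPaddr did not
do the takeover.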

Exactly what did happen to you is harder to say...  But it seems unlikely that
we did the dirty deed.

Please let us know what you find out.

	Thanks!!

	-- Alan Robertson
	   alanr@bell-labs.com