Ethernet/Serial Heartbeat question

Andre Bonhote anbonhote@coltinternet.ch
Mon, 28 Oct 2002 10:03:35 +0100


Hello!

I am new to this list; to be honest, I subscribed just two minutes ago
and haven't received a single mail yet.

I have a question concerning the heartbeat program, and I am sure someone
here can help. I couldn't find this information on the Linux-HA website.

About two months ago, I finished building a two-machine NFS server
cluster with shared storage on an EMC over Fibre Channel, one machine
acting as hot-standby failover. I connected the two machines with both a
serial cable and a crossover Ethernet cable.

During the testing phase, everything worked fine (I resolved some weird
NFS issues ...). After finishing the docs, I had to leave for four weeks
of military service. During the first week, my team went live with the
platform.

Then, one day, they decided to tidy up the server rack, and someone
removed the RED heartbeat Ethernet cable. The serial line was left
untouched. Still, the standby machine woke up and took over the storage,
mounting it read/write. You can imagine what the systems looked like
after my "green holidays"! A complete mess! We don't have STONITH.

So, that's my story, and here's my question regarding heartbeat:

When exactly does heartbeat think the master system is down? I mean, the
second (serial) heartbeat cable was still there. Why did the failover
machine take over?

Right now, the second system is down, but they want me to turn it on
again by this evening (it's my first working day after two weeks off,
how fair ...). I would really appreciate any information on this. In the
meantime, I will browse the source code and the docs ...

Thank you very much in advance!

Greetings from Switzerland

André
-- 
Real programmers do "cp /dev/audio a.out" and whistle into the mike.
                                                (Randal L. Schwartz)