Testing Heartbeat
Iñaki Fernández Villanueva
i+d@ceintec.drago.net
Mon Jan 19 23:51:43 MST 2009
--Boundary-=_nWlrBbmQBhCDarzOwKkYHIDdqSCD
Content-Type: text/plain
Content-Transfer-Encoding: 8bit
Hi,
On Wed, 28 Jul 1999, Alan Robertson wrote:
> > At this moment we are installing and configurating Coda in order to test it for
> > our interests.
>
> I am a little concerned with the robustness of Coda for a production
> environment. I think that you might want to think about using the Network
> Block Device along with mirroring for file replication. You might also look
> into Intermezzo (which will eventually be smaller and faster than Coda).
>
We have finally finished our tests of Coda. We aren't very satisfied
with the results: Coda is too general for our purposes (we only used it
for replication) and, as you said, not very robust. We've decided to
follow your advice and are now testing Heartbeat (version 0.4.1).
We have tested Heartbeat with these configurations (with kernel 2.2.10):
a) Debian 2.1 with Red Hat 6.0
b) Debian 2.1 with Debian 2.1
c) Red Hat 6.0 with Red Hat 6.0
The results were the same in all three configurations.
> > We tried to install heartbeat, but we notice that the installation
> > package we'd got was for Red Hat systems, is there someone who had
> > tried to install heartbeat in a Debian system?.
>
> I believe Rudy Pawul has done this, but I'm not 100% sure. You could try
> using alien, or installing from the .tar.gz file. I've CCed Rudy. It is not
> intended to be a redhat-only product. It has been well-tested on SuSE as
> well.
There are some differences between Debian and Red Hat (the paths of the rc
scripts, some applications, etc.), so alien doesn't solve the problem. We
used the .tar.gz file with some modifications: we changed paths and replaced
programs used in the scripts, such as usleep and initlog, that Debian doesn't
have. Specifically, we used "sleep" instead of usleep, added a "functions"
file to /etc/init.d because Debian doesn't ship one, and removed "initlog"
from that "functions" file.
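To illustrate the usleep substitution: usleep takes its argument in
microseconds, while sleep takes seconds. Below is a minimal sketch of the
conversion, written in Python only for illustration (our real changes are in
the shell scripts). The function name is hypothetical, and we assume a sleep
that accepts only whole seconds, so values are rounded up.

```python
import math


def usleep_to_sleep_seconds(microseconds):
    """Convert a usleep argument (microseconds) to whole seconds for sleep.

    Assumption: the local sleep only accepts integer seconds, so we round
    up and never wait less than the original usleep call did.
    """
    if microseconds < 0:
        raise ValueError("microseconds must be non-negative")
    return math.ceil(microseconds / 1_000_000)


# Example: a half-second usleep becomes a one-second sleep.
print(usleep_to_sleep_seconds(500000))
```

Rounding up makes the scripts a little slower than on Red Hat, but it keeps
every delay at least as long as the original.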
Another difference: Debian's default ifconfig doesn't show eth0:0 when run
without arguments as "ifconfig". It only shows eth0:0 when run as
"ifconfig eth0:0", and because of this stop-ha doesn't remove the eth0:0
interface.
If you are interested in the modified scripts, let us know. (Could you
give us Rudy's e-mail address? We'd like to contact him to compare changes.)
We have some questions:
* Should we synchronize the servers?
* When we restart heartbeat on one of the servers, why do we get
the 'ERROR: write failure on tty /dev/ttyS1: -1 vs 69: Input/output
error' message on the other server? This is the same error that
we get if we disconnect the serial line (even after we reconnect it
and the ppp connection comes back) or when a server dies.
* We have noticed that when heartbeat is up, all the traffic
between the servers goes through the serial line. As you
know, we want to use one of the servers as a backup server (a replica
of the other server) via rsync or something similar. Do you know a
way to use the network interface for this replication?
* Sometimes we get the 'node ha1 is dead' message on the ha1 node.
Is this normal? (ha1 is the master node.)
We haven't looked at the heartbeat code yet, so please be patient if any of
these questions is trivial or is answered there.
To help explain our questions, we have attached our logs and
configuration files. We look forward to your response.
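To make the attached logs easier to scan, here is a rough sketch of how the
'is dead' reports can be pulled out of an ha-log file. It is not part of
heartbeat; the function name and regular expressions are our own, and they
assume the timestamp and message layout seen in our logs. The sample lines
are copied from the ha1 log.

```python
import re
from datetime import datetime

# Matches lines such as "1999/08/12_20:11:53 node ha1: is dead".
LINE_RE = re.compile(r"^(\d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2}) (.*)$")
DEAD_RE = re.compile(r"^node (\S+): is dead$")


def dead_events(lines):
    """Return (timestamp, node) pairs for every 'is dead' report."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip our hand-written "#" annotations
        dead = DEAD_RE.match(match.group(2))
        if dead:
            when = datetime.strptime(match.group(1), "%Y/%m/%d_%H:%M:%S")
            events.append((when, dead.group(1)))
    return events


sample = [
    "1999/08/12_20:11:53 node ha1: is dead",
    "# (hand-written annotation)",
    "1999/08/12_20:11:53 node ha0: is dead",
    "1999/08/12_20:11:53 node ha1: status unknown",
    "1999/08/12_20:12:06 node ha0: is dead",
]

for when, node in dead_events(sample):
    print(when.strftime("%H:%M:%S"), node)
```

Running the same function over both attached logs gives a quick timeline of
when each node considered the other one dead.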
PS: We saw that you were looking for testers for the new heartbeat
release. We are interested in helping, so let us know when it is
ready.
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.~. Iñaki Fernández Villanueva
DEBIAN/GNU /V\ Josu Abajo Marón
// \\ Javier Ruiz González
SLINK 2.1 /( )\ Computer Engineering Students (UPV)
^^-^^ In practice for CEINTEC (Spain)
www.debian.org Reply your messages to i+d@ceintec.drago.net
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
--Boundary-=_nWlrBbmQBhCDarzOwKkYHIDdqSCD
Content-Type: text/english;
name="ha.cf_ha1"
Content-Transfer-Encoding: 8bit
Content-Description: conf of ha1 (master)
Content-Disposition: attachment; filename="ha.cf_ha1"
#
#
# serial serialportname ...
# Must have one. Two provides redundancy
#
# keepalive seconds-between-heartbeats
# deadtime seconds-to-declare-host-dead
# hopfudge maximum hop count minus number of nodes in config
#
#
# node nodename ... -- must match uname -n
#
serial /dev/ttyS1
watchdog /dev/watchdog
udp eth0
keepalive 2
deadtime 10
hopfudge 1
#
# Only for serial ports. It applies to both PPP/UDP and "raw" ports
baud 19200
#baud 115200
udpport 1001
#
#
#ppp-udp /dev/ttyS1 10.0.0.1 /dev/ttyS2 10.0.0.2
ppp-udp /dev/ttyS1 192.168.1.2
#
# This means run PPP over ports ttyS1 and ttyS2
# Their respective IP addresses are as listed.
# Note that I enforce that these are local addresses. Other addresses
# are almost certainly a mistake.
#
# Tell what machines are in the cluster
node ha1
node ha0
--Boundary-=_nWlrBbmQBhCDarzOwKkYHIDdqSCD
Content-Type: text/english;
name="ha.cf_ha0"
Content-Transfer-Encoding: 8bit
Content-Description: conf of ha0 (slave)
Content-Disposition: attachment; filename="ha.cf_ha0"
#
#
# serial serialportname ...
# Must have one. Two provides redundancy
#
# keepalive seconds-between-heartbeats
# deadtime seconds-to-declare-host-dead
# hopfudge maximum hop count minus number of nodes in config
#
#
# node nodename ... -- must match uname -n
#
serial /dev/ttyS0
#serial /dev/ttyS1
watchdog /dev/watchdog
udp eth0
keepalive 2
deadtime 10
hopfudge 1
#
# Only for serial ports. It applies to both PPP/UDP and "raw" ports
baud 19200
#baud 115200
udpport 1001
#
#
#ppp-udp /dev/ttyS1 10.0.0.1 /dev/ttyS2 10.0.0.2
ppp-udp /dev/ttyS0 192.168.1.1
#
# This means run PPP over ports ttyS1 and ttyS2
# Their respective IP addresses are as listed.
# Note that I enforce that these are local addresses. Other addresses
# are almost certainly a mistake.
#
# Tell what machines are in the cluster
#node ken3
#node kathy
node ha1
node ha0
--Boundary-=_nWlrBbmQBhCDarzOwKkYHIDdqSCD
Content-Type: text/english;
name="ipresources_ha1"
Content-Transfer-Encoding: 8bit
Content-Description: ipresources of ha1
Content-Disposition: attachment; filename="ipresources_ha1"
# This is a sample list of moveable IP resources that belong to each
# machine. I call these IP addresses "service" addresses, since they're
# the publicly advertised addresses that clients use to get
# at highly available services.
# Note that you should NOT put in administrative (or permanent) addresses
# in this file. For a hot/standby (non load-sharing) 2-node system
# you will probably only put one system name and IP address in here.
# The name you give the address to is the name of the default "hot"
# system.
#
# We refer to this file when we're coming up, and when a machine is being
# taken over after going down.
# You need to make this right for your installation, then install it in
# /etc/ha.d
#
# The format of this file is:
# nodename IP address
#
# Where the nodename is the name of the node which "normally" owns the
# IP addresses. If this machine is up, it will always have the IP
# addresses it is shown as owning.
#
# Note: You only put in IP addresses which are "service" addresses
# and move from machine to machine during failover. Do not include
# "administrative" or fixed IP addresses.
#
# The string you put in for nodename must match the uname -n name
# of your machine. Depending on how you have it administered, it could
# be a short name or a FQDN.
#
ha1 192.168.1.27
--Boundary-=_nWlrBbmQBhCDarzOwKkYHIDdqSCD
Content-Type: text/english;
name="ipresources_ha0"
Content-Transfer-Encoding: 8bit
Content-Description: ipresources of ha0
Content-Disposition: attachment; filename="ipresources_ha0"
#
# This is a sample list of moveable IP resources that belong to each
# machine. I call these IP addresses "service" addresses, since they're
# the publicly advertised addresses that clients use to get
# at highly available services.
# Note that you should NOT put in administrative (or permanent) addresses
# in this file. For a hot/standby (non load-sharing) 2-node system
# you will probably only put one system name and IP address in here.
# The name you give the address to is the name of the default "hot"
# system.
#
# We refer to this file when we're coming up, and when a machine is being
# taken over after going down.
# You need to make this right for your installation, then install it in
# /etc/ha.d
#
# The format of this file is:
# nodename IP address
#
# Where the nodename is the name of the node which "normally" owns the
# IP addresses. If this machine is up, it will always have the IP
# addresses it is shown as owning.
#
# Note: You only put in IP addresses which are "service" addresses
# and move from machine to machine during failover. Do not include
# "administrative" or fixed IP addresses.
#
# The string you put in for nodename must match the uname -n name
# of your machine. Depending on how you have it administered, it could
# be a short name or a FQDN.
#
#amymarie.dr.lucent.com 10.1.1.21
#kathy 10.1.1.23
ha1 192.168.1.27
--Boundary-=_nWlrBbmQBhCDarzOwKkYHIDdqSCD
Content-Type: text/plain;
name="ha-log_ha1"
Content-Transfer-Encoding: 8bit
Content-Description: log of ha1
Content-Disposition: attachment; filename="ha-log_ha1"
1999/08/12_20:11:11 INIT: /etc/ha.d/bin/init-ha starting.
1999/08/12_20:11:28 INIT: /etc/ha.d/bin/init-ha complete.
1999/08/12_20:11:52 Starting serial heartbeat on tty /dev/ttyS1
1999/08/12_20:11:52 UDP heartbeat started on port 1001 interface eth0
1999/08/12_20:11:52 PPP/UDP heartbeat started on port 1001 tty /dev/ttyS1
1999/08/12_20:11:52 Using watchdog device: /dev/watchdog
1999/08/12_20:11:53 PPPd process 1039 started
1999/08/12_20:11:53 INFO: Running /etc/ha.d/rc.d/ip-request ip-request
1999/08/12_20:11:53 node ha1: is dead
# Why am I dead?
1999/08/12_20:11:53 node ha0: is dead
# Why is ha0 dead? (In fact, ha0 is alive.)
1999/08/12_20:11:53 node ha1: status unknown
1999/08/12_20:11:54 node ha0: status unknown
1999/08/12_20:11:54 INFO: Running /etc/ha.d/rc.d/status status
1999/08/12_20:11:54 INFO: Running /etc/ha.d/rc.d/status status
1999/08/12_20:11:54 INFO: Running /etc/ha.d/rc.d/status status
1999/08/12_20:11:54 INFO: Running /etc/ha.d/rc.d/status status
1999/08/12_20:11:55 Sending Gratuitous Arp for 192.168.1.27 on eth0:0
1999/08/12_20:12:06 node ha0: is dead
#??
1999/08/12_20:12:07 INFO: Running /etc/ha.d/rc.d/status status
1999/08/12_20:12:23 Sending Gratuitous Arp for 192.168.1.27 on eth0:0
#We get the same message (Sending Gratuitous Arp) twice. Why?
#Shut down the network (unplug the wire)
#Connect the network
#Stop heartbeat
1999/08/12_20:19:52 SHUTDOWN: in progress.
1999/08/12_20:19:52 SHUTDOWN: complete.
#Start heartbeat
1999/08/12_20:20:06 INIT: /etc/ha.d/bin/init-ha starting.
1999/08/12_20:20:06 Starting serial heartbeat on tty /dev/ttyS1
1999/08/12_20:20:06 UDP heartbeat started on port 1001 interface eth0
1999/08/12_20:20:06 PPP/UDP heartbeat started on port 1001 tty /dev/ttyS1
1999/08/12_20:20:06 Using watchdog device: /dev/watchdog
1999/08/12_20:20:07 PPPd process 1238 started
1999/08/12_20:20:19 node ha0: is dead
1999/08/12_20:20:19 INFO: Running /etc/ha.d/rc.d/status status
1999/08/12_20:20:21 INFO: Running /etc/ha.d/rc.d/ip-request ip-request
1999/08/12_20:20:23 INIT: /etc/ha.d/bin/init-ha complete.
1999/08/12_20:20:52 Sending Gratuitous Arp for 192.168.1.27 on eth0:0
#ha0 stops heartbeat
1999/08/12_20:20:57 ERROR: EOF in ttygets [/dev/ttyS1]: Success
1999/08/12_20:20:57 ERROR: EOF in ttygets [/dev/ttyS1]: Success
1999/08/12_20:20:57 ERROR: bad packet in should_copy_ring_pkt
1999/08/12_20:20:59 PPPd process 1317 started
1999/08/12_20:20:59 ERROR: write failure on tty /dev/ttyS1: -1 vs 69: Input/output error
#Same error every 2 seconds
1999/08/12_20:21:13 ERROR: write failure on tty /dev/ttyS1: -1 vs 69: Input/output error
#ha0 starts heartbeat
1999/08/12_20:21:14 node ha0 seq restart 1 vs 248
1999/08/12_20:21:14 node ha0: status unknown
1999/08/12_20:21:14 INFO: Running /etc/ha.d/rc.d/status status
1999/08/12_20:21:15 ERROR: write failure on tty /dev/ttyS1: -1 vs 69: Input/output error
# Same error every 2 seconds
1999/08/12_20:21:27 ERROR: write failure on tty /dev/ttyS1: -1 vs 69: Input/output error
1999/08/12_20:21:27 node ha0: is dead
1999/08/12_20:21:27 INFO: Running /etc/ha.d/rc.d/status status
1999/08/12_20:21:29 ERROR: write failure on tty /dev/ttyS1: -1 vs 69: Input/output error
# Same error every 2 seconds
1999/08/12_20:26:17 ERROR: write failure on tty /dev/ttyS1: -1 vs 69: Input/output error
#Stop heartbeat
1999/08/12_20:26:19 SHUTDOWN: in progress.
1999/08/12_20:26:19 SHUTDOWN: complete.
--Boundary-=_nWlrBbmQBhCDarzOwKkYHIDdqSCD
Content-Type: text/english;
name="ha-log_ha0"
Content-Transfer-Encoding: 8bit
Content-Description: log of ha0
Content-Disposition: attachment; filename="ha-log_ha0"
#ha0 node log file.
#ha0 is the backup server.
#
1999/08/12_20:11:53 INIT: /etc/ha.d/bin/init-ha starting.
1999/08/12_20:11:53 Starting serial heartbeat on tty /dev/ttyS0
1999/08/12_20:11:53 UDP heartbeat started on port 1001 interface eth0
1999/08/12_20:11:53 PPP/UDP heartbeat started on port 1001 tty /dev/ttyS0
1999/08/12_20:11:53 Using watchdog device: /dev/watchdog
1999/08/12_20:11:53 INFO: Running /etc/ha.d/rc.d/ip-request ip-request
1999/08/12_20:11:54 PPPd process 936 started
1999/08/12_20:12:08 INIT: /etc/ha.d/bin/init-ha complete.
#Shut down the network (unplug the wire)
1999/08/12_20:15:35 node ha1: is dead
1999/08/12_20:15:35 INFO: Running /etc/ha.d/rc.d/status status
1999/08/12_20:15:35 Sending Gratuitous Arp for 192.168.1.27 on eth0:0
#Connect the network
1999/08/12_20:17:14 ERROR: 54 lost packet(s) for [ha1] [107:162]
1999/08/12_20:17:14 node ha1: status unknown
1999/08/12_20:17:14 INFO: Running /etc/ha.d/rc.d/status status
#ha1 stops heartbeat
1999/08/12_20:19:53 ERROR: EOF in ttygets [/dev/ttyS0]: Success
1999/08/12_20:19:53 ERROR: EOF in ttygets [/dev/ttyS0]: Success
1999/08/12_20:19:53 ERROR: bad packet in should_copy_ring_pkt
1999/08/12_20:19:54 PPPd process 1014 started
1999/08/12_20:19:54 ERROR: write failure on tty /dev/ttyS0: -1 vs 69: Input/output error
#Same error every 2 seconds
1999/08/12_20:20:02 ERROR: write failure on tty /dev/ttyS0: -1 vs 69: Input/output error
1999/08/12_20:20:03 node ha1: is dead
1999/08/12_20:20:03 INFO: Running /etc/ha.d/rc.d/status status
1999/08/12_20:20:04 ERROR: write failure on tty /dev/ttyS0: -1 vs 69: Input/output error
1999/08/12_20:20:06 ERROR: write failure on tty /dev/ttyS0: -1 vs 69: Input/output error
#ha1 starts heartbeat
1999/08/12_20:20:07 node ha1 seq restart 1 vs 241
1999/08/12_20:20:07 node ha1: status unknown
1999/08/12_20:20:07 INFO: Running /etc/ha.d/rc.d/status status
1999/08/12_20:20:08 ERROR: write failure on tty /dev/ttyS0: -1 vs 69: Input/output error
# Same error every 2 seconds
1999/08/12_20:20:20 ERROR: write failure on tty /dev/ttyS0: -1 vs 69: Input/output error
1999/08/12_20:20:21 INFO: Running /etc/ha.d/rc.d/ip-request ip-request
1999/08/12_20:20:21 IP Address 192.168.1.27 released
1999/08/12_20:20:22 ERROR: write failure on tty /dev/ttyS0: -1 vs 102: Input/output error
#Same error every 2 seconds
1999/08/12_20:20:56 ERROR: write failure on tty /dev/ttyS0: -1 vs 70: Input/output error
#ha0 stops heartbeat
1999/08/12_20:20:57 SHUTDOWN: in progress.
1999/08/12_20:20:57 SHUTDOWN: complete.
#ha0 starts heartbeat
1999/08/12_20:21:13 INIT: /etc/ha.d/bin/init-ha starting.
1999/08/12_20:21:14 Starting serial heartbeat on tty /dev/ttyS0
1999/08/12_20:21:14 UDP heartbeat started on port 1001 interface eth0
1999/08/12_20:21:14 PPP/UDP heartbeat started on port 1001 tty /dev/ttyS0
1999/08/12_20:21:14 Using watchdog device: /dev/watchdog
1999/08/12_20:21:15 PPPd process 1124 started
1999/08/12_20:21:29 INIT: /etc/ha.d/bin/init-ha complete.
1999/08/12_20:26:18 SHUTDOWN: in progress.
1999/08/12_20:26:18 SHUTDOWN: complete.
--Boundary-=_nWlrBbmQBhCDarzOwKkYHIDdqSCD--