multiple failovers with a Stonith device?
Aaron Bush
abush@microcenter.com
Mon, 28 Oct 2002 11:57:07 -0500
I had a problem early this morning (around 4:00am) with heartbeat
failing over multiple times on a two-node cluster. Heartbeat is set to
"nice_failback on". The OS on both nodes is Linux 2.4.5.18-5 (Red Hat
7.3). The hostnames are cctcpa and cctcpb. cctcpa is the primary and
is powered by a WTI RPS10 Power Switch. The RPS10 is attached to the
serial port of cctcpb. I am using both serial and Ethernet (crossover
cable) heartbeat channels. I am running heartbeat-0.4.9.2 on both nodes.
It appears from the log files that the active node, cctcpa, became
bogged down and logged that both it and the other node, cctcpb, were
dead. In the archives I came across a post saying that if the system is
too heavily loaded, the following entry may be logged when the local box
fails to hear its own heartbeat within deadtime seconds:
"ERROR: No local heartbeat. Forcing shutdown"
Is this true?
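The rule described in that archive post can be modeled as a simple timeout check. The sketch below is a hypothetical model, not heartbeat's actual code: it assumes each node records the last time it heard every node's heartbeat (its own included) and treats any node silent for longer than deadtime as dead. Under that assumption, a monitoring process that stalls for longer than deadtime sees every node, itself included, as dead the moment it wakes up, which is consistent with the 04:09:12 entries in the cctcpa log.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Hypothetical model of a deadtime check (not heartbeat's real code)."""

    local_node: str
    deadtime: float  # seconds, analogous to the deadtime setting in ha.cf
    last_seen: dict = field(default_factory=dict)

    def heard(self, node: str, now: float) -> None:
        """Record that a heartbeat from node arrived at time now."""
        self.last_seen[node] = now

    def check(self, now: float) -> list:
        """Return (node, message) pairs for nodes silent longer than deadtime."""
        events = []
        for node, seen in self.last_seen.items():
            if now - seen <= self.deadtime:
                continue
            if node == self.local_node:
                # Not even our own heartbeat arrived in time.
                events.append((node, "No local heartbeat. Forcing shutdown."))
            else:
                events.append((node, "is dead"))
        return events


# Both nodes were heard at t=0; the monitoring process then stalls for
# about 20 s (compare the 20110 ms "Late heartbeat" interval in the logs).
monitor = HeartbeatMonitor(local_node="cctcpa", deadtime=10)
monitor.heard("cctcpa", 0.0)
monitor.heard("cctcpb", 0.0)
print(monitor.check(20.11))
# [('cctcpa', 'No local heartbeat. Forcing shutdown.'), ('cctcpb', 'is dead')]
```

With `deadtime=90`, the same 20-second stall produces no events at all in this model.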
It also appears that the RPS10 failed to take down the primary server on
the first attempt. The following entries were logged in /var/log/messages
(why are they not in the ha-log or ha-debug log files?):
Oct 28 04:01:13 cctcpb heartbeat: Host cctcpa.microcenter.com being rebooted.
Oct 28 04:01:13 cctcpb heartbeat: Did not find string: 'Plug' fromWTI RPS10 Power Switch.
Oct 28 04:01:29 cctcpb heartbeat: Did not find string: 'RPS-10 Ready' fromWTI RPS10 Power Switch.
The logs show two more calls to the RPS10 to reset the power. These two
attempts were successful and can be confirmed via syslog boot entries.
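The "Did not find string" messages suggest that the RPS10 STONITH plugin drives the switch over the serial line and waits, with a time limit, for expected replies such as 'Plug' and 'RPS-10 Ready'; if a reply does not arrive in time, that reset attempt fails. The sketch below illustrates only that general expect-with-timeout pattern. It is not the plugin's code, and `read_chunk` is a hypothetical stand-in for a serial-port read.

```python
import time


def expect(read_chunk, token: bytes, timeout: float, poll: float = 0.01) -> bool:
    """Wait until token appears in the data returned by read_chunk().

    read_chunk is a hypothetical callable standing in for a serial-port
    read: it returns whatever bytes are available (b"" if none yet).
    Returns True if token was seen before timeout seconds elapsed,
    False otherwise (the "Did not find string" case).
    """
    deadline = time.monotonic() + timeout
    buf = b""
    while time.monotonic() < deadline:
        chunk = read_chunk()
        if chunk:
            buf += chunk
            # Search the accumulated buffer so a token split across reads is found.
            if token in buf:
                return True
        else:
            time.sleep(poll)  # nothing yet; avoid a busy loop
    return False


replies = iter([b"RPS-", b"10 Re", b"ady\r\n"])
print(expect(lambda: next(replies, b""), b"RPS-10 Ready", timeout=1.0))  # True
print(expect(lambda: b"", b"Plug", timeout=0.05))  # False: reply never arrives
```

Accumulating into a buffer matters on a serial line, where a reply can arrive split across several reads.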
I had deadtime set to 10, which, from reading the archives, is probably
far too short. The system is not in production yet, so the load is
normally very low. The only activity on the system around 4:00am is log
file rotation (the standard Red Hat cron entries). I have now raised
deadtime to 90.
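The "Late heartbeat" warnings in the attached logs record how long each gap actually was. Assuming the logged interval is the time since the previous heartbeat from that node, a small sketch can pull those intervals out of ha-log lines and compare them with a candidate deadtime. The function names are hypothetical; the sample lines are copied from the attached logs.

```python
import re

# Matches entries such as:
#   WARN: Late heartbeat: Node cctcpb.microcenter.com: interval 18350 ms
LATE_RE = re.compile(r"Late heartbeat: Node (\S+): interval (\d+) ms")


def late_intervals(lines):
    """Return (node, seconds) for every 'Late heartbeat' entry in lines."""
    found = []
    for line in lines:
        m = LATE_RE.search(line)
        if m:
            found.append((m.group(1), int(m.group(2)) / 1000.0))
    return found


def exceeding(lines, deadtime):
    """Keep only the gaps longer than deadtime seconds."""
    return [(node, secs) for node, secs in late_intervals(lines) if secs > deadtime]


# A few of the 'Late heartbeat' lines from the attached ha-logs.
sample = [
    "heartbeat: 2002/10/28_04:09:12 WARN: Late heartbeat: Node cctcpb.microcenter.com: interval 18350 ms",
    "heartbeat: 2002/10/28_04:09:12 WARN: Late heartbeat: Node cctcpa.microcenter.com: interval 20110 ms",
    "heartbeat: 2002/10/28_05:17:06 WARN: Late heartbeat: Node cctcpb.microcenter.com: interval 10590 ms",
    "heartbeat: 2002/10/28_04:03:35 WARN: Late heartbeat: Node cctcpa.microcenter.com: interval 125140 ms",
    "heartbeat: 2002/10/28_04:06:23 WARN: Late heartbeat: Node cctcpa.microcenter.com: interval 163140 ms",
]

for deadtime in (10, 90):
    print(deadtime, exceeding(sample, deadtime))
```

With deadtime set to 10, all five sample gaps exceed it; with 90, only the 125140 ms and 163140 ms gaps do.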
I have attached the ha-log entries from both cctcpa and cctcpb. It would
be great if anyone on the list could take a look and confirm whether the
behavior of the two nodes is normal and expected. I have also attached
ha.cf (the same on both nodes). haresources on both nodes has the
following lines:
# use cctcpa as primary, use 10.10.1.11 as shared IP
cctcpa.microcenter.com 10.10.1.11 mpsd logger_alert
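Judging from the attached logs, heartbeat reads this line as the preferred node followed by its resources, with the bare IP address handled by the IPaddr script; it starts them left to right (IPaddr 10.10.1.11, mpsd, logger_alert) and stops them in reverse. A minimal sketch of that interpretation follows; the function name is hypothetical and this is not heartbeat's ResourceManager.

```python
import ipaddress


def parse_haresources(line):
    """Parse one haresources line into (preferred_node, start_cmds, stop_cmds).

    Assumptions, based on the ha-log output in this thread: a bare IP address
    is handled by the IPaddr script, any other word names a script in
    /etc/ha.d/resource.d, resources start left to right and stop in reverse.
    Returns None for blank lines and comments.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    node, *resources = line.split()
    scripts = []
    for res in resources:
        try:
            ipaddress.ip_address(res)
            scripts.append(["IPaddr", res])  # bare IP -> IPaddr <ip>
        except ValueError:
            scripts.append([res])  # plain resource script name
    start = [" ".join(s + ["start"]) for s in scripts]
    stop = [" ".join(s + ["stop"]) for s in reversed(scripts)]
    return node, start, stop


print(parse_haresources("cctcpa.microcenter.com 10.10.1.11 mpsd logger_alert"))
```

The start and stop orders this produces match the "Running /etc/ha.d/resource.d/..." entries in both attached logs.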
Something else I noticed today: why are there so many defunct ifstat and
status processes? The 05:17 entries are from when the system restarted
(via Stonith). The 11:44 entries are from a heartbeat reload on cctcpb
(the spare node).
-- ps from cctcpa (active node) after reload on cctcpb (spare node) --
root 1311 1 0 05:17 ttyS0 00:00:00 heartbeat
root 1312 1311 0 05:17 ttyS0 00:00:00 heartbeat
root 1313 1311 0 05:17 ttyS0 00:00:00 heartbeat
root 1314 1311 0 05:17 ttyS0 00:00:00 heartbeat
root 1315 1311 0 05:17 ttyS0 00:00:00 heartbeat
root 1316 1311 0 05:17 ttyS0 00:00:00 heartbeat
root 1317 1316 0 05:17 ttyS0 00:00:00 [ifstat <defunct>]
root 1320 1316 0 05:17 ttyS0 00:00:00 [ifstat <defunct>]
root 1321 1316 0 05:17 ttyS0 00:00:00 [status <defunct>]
root 1326 1316 0 05:17 ttyS0 00:00:00 [status <defunct>]
root 1329 1316 0 05:17 ttyS0 00:00:00 [ifstat <defunct>]
root 1332 1316 0 05:17 ttyS0 00:00:00 [heartbeat <defunct>]
root 1370 1316 0 05:17 ttyS0 00:00:00 [ip-request <defunct>]
root 3052 1316 0 11:44 ttyS0 00:00:00 [status <defunct>]
root 3053 1316 0 11:44 ttyS0 00:00:00 [ifstat <defunct>]
root 3054 1316 0 11:44 ttyS0 00:00:00 [ifstat <defunct>]
root 3074 1316 0 11:44 ttyS0 00:00:00 [ifstat <defunct>]
root 3075 1316 0 11:44 ttyS0 00:00:00 [status <defunct>]
root 3080 1316 0 11:44 ttyS0 00:00:00 [status <defunct>]
root 3083 1316 0 11:44 ttyS0 00:00:00 [ifstat <defunct>]
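A process is listed as <defunct> (a zombie) when it has exited but its parent has not yet collected its exit status with wait()/waitpid(). Every defunct entry above has PPID 1316, so that heartbeat process appears not to be reaping the ifstat/status helpers it starts. The sketch below shows generic POSIX reaping behavior, not heartbeat's code: a parent collecting all exited children without blocking.

```python
import os
import time


def reap_children():
    """Collect every child that has already exited, without blocking.

    Returns a list of (pid, wait_status) tuples. Until a parent does
    this, exited children stay in the process table and ps lists them
    as <defunct>.
    """
    reaped = []
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:  # no children left at all
            break
        if pid == 0:  # children exist, but none has exited yet
            break
        reaped.append((pid, status))
    return reaped


if __name__ == "__main__":
    # Fork three short-lived children, loosely like the ifstat/status
    # helpers in the listing above; they become zombies once they exit.
    for _ in range(3):
        if os.fork() == 0:
            os._exit(0)  # child: exit immediately
    time.sleep(0.5)  # give the children time to exit
    print(reap_children())  # three (pid, 0) tuples; the zombies are gone
```

A long-running daemon would typically call something like this from its main loop or after receiving SIGCHLD.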
Thanks,
-ab
-- Attachment: ha-log.cctpca --
heartbeat: 2002/10/27_13:08:13 info: MSG stats: 100/86404 age 1 [pid1072/CONTROL]
heartbeat: 2002/10/27_13:08:13 info: ha_malloc stats: 2316/2073722 95296/52222 [pid1072/CONTROL]
heartbeat: 2002/10/27_13:08:13 info: RealMalloc stats: 96256 total malloc bytes. pid [1072/CONTROL]
heartbeat: 2002/10/27_13:08:13 info: MSG stats: 0/86418 age 1 [pid25058/HBWRITE]
heartbeat: 2002/10/27_13:08:13 info: ha_malloc stats: 0/2074042 0/0 [pid25058/HBWRITE]
heartbeat: 2002/10/27_13:08:13 info: RealMalloc stats: 960 total malloc bytes. pid [25058/HBWRITE]
heartbeat: 2002/10/27_13:08:13 info: MSG stats: 1/86372 age 1 [pid25059/HBREAD]
heartbeat: 2002/10/27_13:08:13 info: ha_malloc stats: 5/2072929 544/348 [pid25059/HBREAD]
heartbeat: 2002/10/27_13:08:13 info: RealMalloc stats: 1808 total malloc bytes. pid [25059/HBREAD]
heartbeat: 2002/10/27_13:08:13 info: MSG stats: 0/86418 age 1 [pid25060/HBWRITE]
heartbeat: 2002/10/27_13:08:13 info: ha_malloc stats: 0/2074042 0/0 [pid25060/HBWRITE]
heartbeat: 2002/10/27_13:08:13 info: RealMalloc stats: 960 total malloc bytes. pid [25060/HBWRITE]
heartbeat: 2002/10/27_13:08:13 info: MSG stats: 0/172751 age 1 [pid25061/HBREAD]
heartbeat: 2002/10/27_13:08:13 info: ha_malloc stats: 0/4146034 0/0 [pid25061/HBREAD]
heartbeat: 2002/10/27_13:08:13 info: RealMalloc stats: 960 total malloc bytes. pid [25061/HBREAD]
heartbeat: 2002/10/27_13:08:13 info: MSG stats: 0/431935 age 1 [pid25062/MST_STATUS]
heartbeat: 2002/10/27_13:08:13 info: ha_malloc stats: 0/8811246 0/0 [pid25062/MST_STATUS]
heartbeat: 2002/10/27_13:08:13 info: RealMalloc stats: 2064 total malloc bytes. pid [25062/MST_STATUS]
heartbeat: 2002/10/28_04:09:12 WARN: node cctcpb.microcenter.com: is dead
heartbeat: 2002/10/28_04:09:12 info: Resource acquisition completed. (none)
heartbeat: 2002/10/28_04:09:12 WARN: node cctcpa.microcenter.com: is dead
heartbeat: 2002/10/28_04:09:12 ERROR: No local heartbeat. Forcing shutdown.
heartbeat: 2002/10/28_04:09:12 info: Link cctcpb.microcenter.com:/dev/ttyS0 dead.
heartbeat: 2002/10/28_04:09:12 info: Link cctcpb.microcenter.com:eth1 dead.
heartbeat: 2002/10/28_04:09:12 info: Heartbeat shutdown in progress. (25062)
heartbeat: 2002/10/28_04:09:12 WARN: Cluster node cctcpb.microcenter.com returning after partition
heartbeat: 2002/10/28_04:09:12 info: Link cctcpb.microcenter.com:eth1 up.
heartbeat: 2002/10/28_04:09:12 ERROR: No one owns our local resources!
heartbeat: 2002/10/28_04:09:12 ERROR: No one owns foreign resources!
heartbeat: 2002/10/28_04:09:12 WARN: Late heartbeat: Node cctcpb.microcenter.com: interval 18350 ms
heartbeat: 2002/10/28_04:09:12 info: Node cctcpb.microcenter.com: status active
heartbeat: 2002/10/28_04:09:12 info: Giving up all HA resources.
heartbeat: 2002/10/28_04:09:12 info: Link cctcpb.microcenter.com:/dev/ttyS0 up.
heartbeat: 2002/10/28_04:09:12 ERROR: No one owns our local resources!
heartbeat: 2002/10/28_04:09:12 ERROR: No one owns foreign resources!
heartbeat: 2002/10/28_04:09:12 ERROR: No one owns our local resources!
heartbeat: 2002/10/28_04:09:12 ERROR: No one owns foreign resources!
heartbeat: 2002/10/28_04:09:12 ERROR: No one owns our local resources!
heartbeat: 2002/10/28_04:09:12 ERROR: No one owns foreign resources!
heartbeat: 2002/10/28_04:09:12 WARN: Cluster node cctcpa.microcenter.com returning after partition
heartbeat: 2002/10/28_04:09:12 WARN: Late heartbeat: Node cctcpa.microcenter.com: interval 20110 ms
heartbeat: 2002/10/28_04:09:12 info: Node cctcpa.microcenter.com: status active
heartbeat: 2002/10/28_04:09:12 info: remote resource transition completed.
heartbeat: 2002/10/28_04:09:12 ERROR: No one owns our local resources!
heartbeat: 2002/10/28_04:09:12 info: Running /etc/ha.d/rc.d/ifstat ifstat
heartbeat: 2002/10/28_04:09:12 info: Running /etc/ha.d/rc.d/ifstat ifstat
heartbeat: 2002/10/28_04:09:12 info: Running /etc/ha.d/rc.d/ifstat ifstat
heartbeat: 2002/10/28_04:09:13 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2002/10/28_04:09:13 info: Running /etc/ha.d/rc.d/ifstat ifstat
heartbeat: 2002/10/28_04:09:13 info: Releasing resource group: cctcpa.microcenter.com 10.10.1.11 mpsd logger_alert
heartbeat: 2002/10/28_04:09:13 info: Running /etc/ha.d/resource.d/logger_alert stop
heartbeat: 2002/10/28_04:09:12 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2002/10/28_04:09:12 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2002/10/28_04:09:13 info: Taking over resource group 10.10.1.11
heartbeat: 2002/10/28_04:09:13 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2002/10/28_04:09:13 info: Running /etc/ha.d/rc.d/shutdone shutdone
heartbeat: 2002/10/28_04:09:13 info: Running /etc/ha.d/resource.d/mpsd stop
heartbeat: 2002/10/28_05:12:29 info: **************************
heartbeat: 2002/10/28_05:12:29 info: Configuration validated. Starting heartbeat 0.4.9.2
heartbeat: 2002/10/28_05:12:29 info: nice_failback is in effect.
heartbeat: 2002/10/28_05:12:29 info: heartbeat: version 0.4.9.2
heartbeat: 2002/10/28_05:12:29 info: Heartbeat generation: 50
heartbeat: 2002/10/28_05:12:29 info: Creating FIFO /var/run/heartbeat-fifo.
heartbeat: 2002/10/28_05:12:29 notice: Starting serial heartbeat on tty /dev/ttyS0
heartbeat: 2002/10/28_05:12:29 notice: UDP heartbeat started on port 694 interface eth1
heartbeat: 2002/10/28_05:12:29 info: Local status now set to: 'up'
heartbeat: 2002/10/28_05:12:29 info: Heartbeat restart on node cctcpa.microcenter.com
heartbeat: 2002/10/28_05:12:29 info: Link cctcpa.microcenter.com:eth1 up.
heartbeat: 2002/10/28_05:12:29 info: Local status now set to: 'active'
heartbeat: 2002/10/28_05:12:29 info: Heartbeat restart on node cctcpb.microcenter.com
heartbeat: 2002/10/28_05:12:29 info: Link cctcpb.microcenter.com:/dev/ttyS0 up.
heartbeat: 2002/10/28_05:12:29 info: Node cctcpb.microcenter.com: status up
heartbeat: 2002/10/28_05:12:29 info: 27 lost packet(s) for [cctcpb.microcenter.com] [42:70]
heartbeat: 2002/10/28_05:12:29 info: Link cctcpb.microcenter.com:eth1 up.
heartbeat: 2002/10/28_05:12:29 info: Node cctcpb.microcenter.com: status active
heartbeat: 2002/10/28_05:12:29 info: Running /etc/ha.d/rc.d/ifstat ifstat
heartbeat: 2002/10/28_05:12:29 info: Running /etc/ha.d/rc.d/ifstat ifstat
heartbeat: 2002/10/28_05:12:29 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2002/10/28_05:12:29 info: Running /etc/ha.d/rc.d/ifstat ifstat
heartbeat: 2002/10/28_05:12:29 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2002/10/28_05:12:30 info: No pkts missing from cctcpb.microcenter.com!
heartbeat: 2002/10/28_05:12:32 ERROR: ha_msg_nadd: cannot add field to ha_msg
heartbeat: 2002/10/28_05:15:17 info: **************************
heartbeat: 2002/10/28_05:15:17 info: Configuration validated. Starting heartbeat 0.4.9.2
heartbeat: 2002/10/28_05:15:17 info: nice_failback is in effect.
heartbeat: 2002/10/28_05:15:17 info: heartbeat: version 0.4.9.2
heartbeat: 2002/10/28_05:15:17 info: Heartbeat generation: 51
heartbeat: 2002/10/28_05:15:17 info: Creating FIFO /var/run/heartbeat-fifo.
heartbeat: 2002/10/28_05:15:17 notice: Starting serial heartbeat on tty /dev/ttyS0
heartbeat: 2002/10/28_05:15:17 notice: UDP heartbeat started on port 694 interface eth1
heartbeat: 2002/10/28_05:15:17 info: Local status now set to: 'up'
heartbeat: 2002/10/28_05:15:17 info: Heartbeat restart on node cctcpa.microcenter.com
heartbeat: 2002/10/28_05:15:18 info: Link cctcpa.microcenter.com:eth1 up.
heartbeat: 2002/10/28_05:15:18 info: Local status now set to: 'active'
heartbeat: 2002/10/28_05:15:18 info: Heartbeat restart on node cctcpb.microcenter.com
heartbeat: 2002/10/28_05:15:18 info: Link cctcpb.microcenter.com:/dev/ttyS0 up.
heartbeat: 2002/10/28_05:15:18 info: Node cctcpb.microcenter.com: status active
heartbeat: 2002/10/28_05:15:18 info: 22 lost packet(s) for [cctcpb.microcenter.com] [136:159]
heartbeat: 2002/10/28_05:15:18 info: Link cctcpb.microcenter.com:eth1 up.
heartbeat: 2002/10/28_05:15:18 info: remote resource transition completed.
heartbeat: 2002/10/28_05:15:18 info: local resource transition completed.
heartbeat: 2002/10/28_05:15:18 info: Resource acquisition completed. (none)
heartbeat: 2002/10/28_05:15:18 info: Running /etc/ha.d/rc.d/ifstat ifstat
heartbeat: 2002/10/28_05:15:18 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2002/10/28_05:15:18 info: Running /etc/ha.d/rc.d/ifstat ifstat
heartbeat: 2002/10/28_05:15:18 info: Running /etc/ha.d/rc.d/ifstat ifstat
heartbeat: 2002/10/28_05:15:18 info: No pkts missing from cctcpb.microcenter.com!
heartbeat: 2002/10/28_05:17:05 WARN: node cctcpb.microcenter.com: is dead
heartbeat: 2002/10/28_05:17:05 info: Link cctcpb.microcenter.com:/dev/ttyS0 dead.
heartbeat: 2002/10/28_05:17:05 info: Link cctcpb.microcenter.com:eth1 dead.
heartbeat: 2002/10/28_05:17:05 info: Running /etc/ha.d/rc.d/ifstat ifstat
heartbeat: 2002/10/28_05:17:05 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2002/10/28_05:17:05 info: Running /etc/ha.d/rc.d/ifstat ifstat
heartbeat: 2002/10/28_05:17:06 info: /usr/lib/heartbeat/mach_down: nice_failback: acquiring foreign resources
heartbeat: 2002/10/28_05:17:06 info: Running /etc/ha.d/resource.d/IPaddr 10.10.1.11 status
heartbeat: 2002/10/28_05:17:06 info: mach_down takeover complete.
heartbeat: 2002/10/28_05:17:06 info: mach_down takeover complete.
heartbeat: 2002/10/28_05:17:06 info: Resource acquisition completed.
heartbeat: 2002/10/28_05:17:06 info: Running /etc/ha.d/rc.d/ip-request ip-request
heartbeat: 2002/10/28_05:17:06 WARN: Cluster node cctcpb.microcenter.com returning after partition
heartbeat: 2002/10/28_05:17:06 info: Heartbeat shutdown in progress. (1105)
heartbeat: 2002/10/28_05:17:06 info: Link cctcpb.microcenter.com:eth1 up.
heartbeat: 2002/10/28_05:17:06 WARN: Late heartbeat: Node cctcpb.microcenter.com: interval 10590 ms
heartbeat: 2002/10/28_05:17:06 info: Node cctcpb.microcenter.com: status active
heartbeat: 2002/10/28_05:17:06 info: Giving up all HA resources.
heartbeat: 2002/10/28_05:17:06 info: local resource transition completed.
heartbeat: 2002/10/28_05:17:06 ERROR: No one owns foreign resources!
heartbeat: 2002/10/28_05:17:06 info: remote resource transition completed.
heartbeat: 2002/10/28_05:17:06 ERROR: No one owns foreign resources!
heartbeat: 2002/10/28_05:17:06 ERROR: No one owns foreign resources!
heartbeat: 2002/10/28_05:17:06 info: Running /etc/ha.d/rc.d/ifstat ifstat
heartbeat: 2002/10/28_05:17:06 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2002/10/28_05:17:06 info: All HA resources relinquished.
heartbeat: 2002/10/28_05:17:06 info: Resource shutdown completed. Restart triggered.
heartbeat: 2002/10/28_05:17:06 ERROR: Cannot send SIGTERM to MSP: No such process
heartbeat: 2002/10/28_05:17:06 info: Running /etc/ha.d/resource.d/IPaddr 10.10.1.11 status
heartbeat: 2002/10/28_05:17:06 info: Resource acquisition completed.
heartbeat: 2002/10/28_05:17:07 info: Heartbeat shutdown complete.
heartbeat: 2002/10/28_05:17:07 info: Restarting heartbeat.
heartbeat: 2002/10/28_05:17:08 info: Killing process 1101 with signal 9
heartbeat: 2002/10/28_05:17:08 info: Killing process 1102 with signal 9
heartbeat: 2002/10/28_05:17:08 info: Killing process 1103 with signal 9
heartbeat: 2002/10/28_05:17:08 info: Killing process 1104 with signal 9
heartbeat: 2002/10/28_05:17:08 info: Killing process 1105 with signal 9
heartbeat: 2002/10/28_05:17:09 info: Performing heartbeat restart exec.
heartbeat: 2002/10/28_05:17:09 info: **************************
heartbeat: 2002/10/28_05:17:09 info: Configuration validated. Starting heartbeat 0.4.9.2
heartbeat: 2002/10/28_05:17:09 info: nice_failback is in effect.
heartbeat: 2002/10/28_05:17:09 info: heartbeat: version 0.4.9.2
heartbeat: 2002/10/28_05:17:09 info: Heartbeat generation: 52
heartbeat: 2002/10/28_05:17:09 notice: Starting serial heartbeat on tty /dev/ttyS0
heartbeat: 2002/10/28_05:17:09 notice: UDP heartbeat started on port 694 interface eth1
heartbeat: 2002/10/28_05:17:09 info: Local status now set to: 'up'
heartbeat: 2002/10/28_05:17:09 info: Heartbeat restart on node cctcpa.microcenter.com
heartbeat: 2002/10/28_05:17:10 info: Link cctcpa.microcenter.com:eth1 up.
heartbeat: 2002/10/28_05:17:10 info: Running /etc/ha.d/rc.d/ifstat ifstat
heartbeat: 2002/10/28_05:17:19 info: Local status now set to: 'active'
heartbeat: 2002/10/28_05:17:19 info: Heartbeat restart on node cctcpb.microcenter.com
heartbeat: 2002/10/28_05:17:19 info: Link cctcpb.microcenter.com:eth1 up.
heartbeat: 2002/10/28_05:17:19 WARN: Late heartbeat: Node cctcpb.microcenter.com: interval 10280 ms
heartbeat: 2002/10/28_05:17:19 info: Node cctcpb.microcenter.com: status up
heartbeat: 2002/10/28_05:17:19 info: Running /etc/ha.d/rc.d/ifstat ifstat
heartbeat: 2002/10/28_05:17:19 info: Node cctcpb.microcenter.com: status active
heartbeat: 2002/10/28_05:17:19 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2002/10/28_05:17:19 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2002/10/28_05:17:20 info: Link cctcpb.microcenter.com:/dev/ttyS0 up.
heartbeat: 2002/10/28_05:17:20 info: Running /etc/ha.d/rc.d/ifstat ifstat
heartbeat: 2002/10/28_05:17:30 info: local resource transition completed.
heartbeat: 2002/10/28_05:17:30 info: remote resource transition completed.
heartbeat: 2002/10/28_05:17:30 info: Running /etc/ha.d/resource.d/IPaddr 10.10.1.11 status
heartbeat: 2002/10/28_05:17:30 info: Resource acquisition completed.
heartbeat: 2002/10/28_05:17:30 info: Running /etc/ha.d/rc.d/ip-request ip-request
heartbeat: 2002/10/28_05:17:40 info: Running /etc/ha.d/resource.d/IPaddr 10.10.1.11 status
heartbeat: 2002/10/28_05:17:40 info: Acquiring resource group: cctcpa.microcenter.com 10.10.1.11 mpsd logger_alert
heartbeat: 2002/10/28_05:17:40 info: Running /etc/ha.d/resource.d/IPaddr 10.10.1.11 start
heartbeat: 2002/10/28_05:17:40 info: ifconfig eth0:0 10.10.1.11 netmask 255.255.255.0 broadcast 10.10.1.255
heartbeat: 2002/10/28_05:17:40 info: Sending Gratuitous Arp for 10.10.1.11 on eth0:0 [eth0]
heartbeat: 2002/10/28_05:17:40 info: Running /etc/ha.d/resource.d/mpsd start
heartbeat: 2002/10/28_05:17:41 info: Running /etc/ha.d/resource.d/logger_alert start
-- Attachment: ha-log.cctpcb --
heartbeat: 2002/10/27_13:01:54 info: MSG stats: 100/86404 age 2 [pid21751/CONTROL]
heartbeat: 2002/10/27_13:01:54 info: ha_malloc stats: 2300/2073700 91200/49700 [pid21751/CONTROL]
heartbeat: 2002/10/27_13:01:54 info: RealMalloc stats: 92096 total malloc bytes. pid [21751/CONTROL]
heartbeat: 2002/10/27_13:01:54 info: MSG stats: 0/86404 age 2 [pid21752/HBWRITE]
heartbeat: 2002/10/27_13:01:54 info: ha_malloc stats: 0/2073696 0/0 [pid21752/HBWRITE]
heartbeat: 2002/10/27_13:01:54 info: RealMalloc stats: 960 total malloc bytes. pid [21752/HBWRITE]
heartbeat: 2002/10/27_13:01:54 info: MSG stats: 1/86430 age 2 [pid21753/HBREAD]
heartbeat: 2002/10/27_13:01:54 info: ha_malloc stats: 5/2074304 544/348 [pid21753/HBREAD]
heartbeat: 2002/10/27_13:01:54 info: RealMalloc stats: 960 total malloc bytes. pid [21753/HBREAD]
heartbeat: 2002/10/27_13:01:54 info: MSG stats: 0/86404 age 2 [pid21754/HBWRITE]
heartbeat: 2002/10/27_13:01:54 info: ha_malloc stats: 0/2073696 0/0 [pid21754/HBWRITE]
heartbeat: 2002/10/27_13:01:54 info: RealMalloc stats: 960 total malloc bytes. pid [21754/HBWRITE]
heartbeat: 2002/10/27_13:01:54 info: MSG stats: 0/172796 age 2 [pid21755/HBREAD]
heartbeat: 2002/10/27_13:01:54 info: ha_malloc stats: 0/4147110 0/0 [pid21755/HBREAD]
heartbeat: 2002/10/27_13:01:54 info: RealMalloc stats: 960 total malloc bytes. pid [21755/HBREAD]
heartbeat: 2002/10/27_13:01:54 info: MSG stats: 0/432040 age 0 [pid21756/MST_STATUS]
heartbeat: 2002/10/27_13:01:54 info: ha_malloc stats: 0/8813613 0/0 [pid21756/MST_STATUS]
heartbeat: 2002/10/27_13:01:54 info: RealMalloc stats: 1552 total malloc bytes. pid [21756/MST_STATUS]
heartbeat: 2002/10/28_04:01:03 WARN: node cctcpa.microcenter.com: is dead
heartbeat: 2002/10/28_04:01:03 info: Resetting node cctcpa.microcenter.com with [WTI RPS10 Power Switch]
heartbeat: 2002/10/28_04:01:03 info: Link cctcpa.microcenter.com:/dev/ttyS0 dead.
heartbeat: 2002/10/28_04:01:03 info: Link cctcpa.microcenter.com:eth1 dead.
heartbeat: 2002/10/28_04:01:03 info: Running /etc/ha.d/rc.d/ifstat ifstat
heartbeat: 2002/10/28_04:01:03 info: No local resources [/usr/lib/heartbeat/ResourceManager listkeys cctcpb.microcenter.com]
heartbeat: 2002/10/28_04:01:03 info: Resource acquisition completed.
heartbeat: 2002/10/28_04:01:03 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2002/10/28_04:01:03 info: Taking over resource group 10.10.1.11
heartbeat: 2002/10/28_04:01:03 info: Running /etc/ha.d/rc.d/ifstat ifstat
heartbeat: 2002/10/28_04:01:03 info: Acquiring resource group: cctcpa.microcenter.com 10.10.1.11 mpsd logger_alert
heartbeat: 2002/10/28_04:01:03 info: Running /etc/ha.d/resource.d/IPaddr 10.10.1.11 start
heartbeat: 2002/10/28_04:01:03 info: ifconfig eth0:0 10.10.1.11 netmask 255.255.255.0 broadcast 10.10.1.255
heartbeat: 2002/10/28_04:01:03 info: Sending Gratuitous Arp for 10.10.1.11 on eth0:0 [eth0]
heartbeat: 2002/10/28_04:01:03 info: Running /etc/ha.d/resource.d/mpsd start
heartbeat: 2002/10/28_04:01:03 info: Running /etc/ha.d/resource.d/logger_alert start
heartbeat: 2002/10/28_04:01:03 info: /usr/lib/heartbeat/mach_down: nice_failback: acquiring foreign resources
heartbeat: 2002/10/28_04:01:03 info: mach_down takeover complete.
heartbeat: 2002/10/28_04:01:03 info: mach_down takeover complete.
heartbeat: 2002/10/28_04:01:12 WARN: Cluster node cctcpa.microcenter.com returning after partition
heartbeat: 2002/10/28_04:01:12 info: Heartbeat shutdown in progress. (21756)
heartbeat: 2002/10/28_04:01:12 info: Link cctcpa.microcenter.com:eth1 up.
heartbeat: 2002/10/28_04:01:12 WARN: Late heartbeat: Node cctcpa.microcenter.com: interval 20110 ms
heartbeat: 2002/10/28_04:01:12 info: Node cctcpa.microcenter.com: status active
heartbeat: 2002/10/28_04:01:12 info: Giving up all HA resources.
heartbeat: 2002/10/28_04:01:12 info: local resource transition completed.
heartbeat: 2002/10/28_04:01:12 ERROR: No one owns foreign resources!
heartbeat: 2002/10/28_04:01:12 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2002/10/28_04:01:12 info: Releasing resource group: cctcpa.microcenter.com 10.10.1.11 mpsd logger_alert
heartbeat: 2002/10/28_04:01:12 info: Running /etc/ha.d/resource.d/logger_alert stop
heartbeat: 2002/10/28_04:01:12 info: Running /etc/ha.d/rc.d/ifstat ifstat
heartbeat: 2002/10/28_04:01:12 info: No local resources [/usr/lib/heartbeat/ResourceManager listkeys cctcpb.microcenter.com]
heartbeat: 2002/10/28_04:01:12 info: Resource acquisition completed.
heartbeat: 2002/10/28_04:01:12 info: Running /etc/ha.d/resource.d/mpsd stop
heartbeat: 2002/10/28_04:01:12 info: Running /etc/ha.d/resource.d/IPaddr 10.10.1.11 stop
heartbeat: 2002/10/28_04:01:12 info: IP Address 10.10.1.11 released
heartbeat: 2002/10/28_04:01:12 info: All HA resources relinquished.
heartbeat: 2002/10/28_04:01:12 info: Resource shutdown completed. Restart triggered.
heartbeat: 2002/10/28_04:01:12 ERROR: Cannot send SIGTERM to MSP: No such process
heartbeat: 2002/10/28_04:01:13 info: Heartbeat shutdown complete.
heartbeat: 2002/10/28_04:01:13 info: Restarting heartbeat.
heartbeat: 2002/10/28_04:01:13 info: Heartbeat shutdown in progress. (17062)
heartbeat: 2002/10/28_04:01:13 ERROR: Host cctcpa.microcenter.com not reset!
heartbeat: 2002/10/28_04:01:13 info: Giving up all HA resources.
heartbeat: 2002/10/28_04:01:13 info: All HA resources relinquished.
heartbeat: 2002/10/28_04:01:14 info: Killing process 21752 with signal 9
heartbeat: 2002/10/28_04:01:14 info: Killing process 21753 with signal 9
heartbeat: 2002/10/28_04:01:14 info: Killing process 21754 with signal 9
heartbeat: 2002/10/28_04:01:14 info: Killing process 21755 with signal 9
heartbeat: 2002/10/28_04:01:14 info: Killing process 21756 with signal 9
heartbeat: 2002/10/28_04:01:15 info: Performing heartbeat restart exec.
heartbeat: 2002/10/28_04:01:29 info: **************************
heartbeat: 2002/10/28_04:01:29 info: Configuration validated. Starting heartbeat 0.4.9.2
heartbeat: 2002/10/28_04:01:29 info: nice_failback is in effect.
heartbeat: 2002/10/28_04:01:29 info: heartbeat: version 0.4.9.2
heartbeat: 2002/10/28_04:01:30 info: Heartbeat generation: 47
heartbeat: 2002/10/28_04:01:30 notice: Starting serial heartbeat on tty /dev/ttyS0
heartbeat: 2002/10/28_04:01:30 notice: UDP heartbeat started on port 694 interface eth1
heartbeat: 2002/10/28_04:01:30 info: Local status now set to: 'up'
heartbeat: 2002/10/28_04:01:30 info: Heartbeat restart on node cctcpb.microcenter.com
heartbeat: 2002/10/28_04:01:31 info: Link cctcpb.microcenter.com:eth1 up.
heartbeat: 2002/10/28_04:01:31 info: Running /etc/ha.d/rc.d/ifstat ifstat
heartbeat: 2002/10/28_04:02:25 WARN: TTY write timeout on [/dev/ttyS0] (no connection?)
heartbeat: 2002/10/28_04:03:31 WARN: node cctcpa.microcenter.com: is dead
heartbeat: 2002/10/28_04:03:31 info: Resetting node cctcpa.microcenter.com with [WTI RPS10 Power Switch]
heartbeat: 2002/10/28_04:03:31 info: Local status now set to: 'active'
heartbeat: 2002/10/28_04:03:31 info: Node cctcpb.microcenter.com: status up
heartbeat: 2002/10/28_04:03:31 info: Node cctcpb.microcenter.com: status active
heartbeat: 2002/10/28_04:03:31 info: No local resources [/usr/lib/heartbeat/ResourceManager listkeys cctcpb.microcenter.com]
heartbeat: 2002/10/28_04:03:31 info: Resource acquisition completed.
heartbeat: 2002/10/28_04:03:31 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2002/10/28_04:03:31 info: Taking over resource group 10.10.1.11
heartbeat: 2002/10/28_04:03:31 info: Acquiring resource group: cctcpa.microcenter.com 10.10.1.11 mpsd logger_alert
heartbeat: 2002/10/28_04:03:31 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2002/10/28_04:03:31 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2002/10/28_04:03:31 info: Running /etc/ha.d/resource.d/IPaddr 10.10.1.11 start
heartbeat: 2002/10/28_04:03:31 info: ifconfig eth0:0 10.10.1.11 netmask 255.255.255.0 broadcast 10.10.1.255
heartbeat: 2002/10/28_04:03:31 info: Sending Gratuitous Arp for 10.10.1.11 on eth0:0 [eth0]
heartbeat: 2002/10/28_04:03:31 info: Running /etc/ha.d/resource.d/mpsd start
heartbeat: 2002/10/28_04:03:31 info: Running /etc/ha.d/resource.d/logger_alert start
heartbeat: 2002/10/28_04:03:31 info: /usr/lib/heartbeat/mach_down: nice_failback: acquiring foreign resources
heartbeat: 2002/10/28_04:03:31 info: mach_down takeover complete.
heartbeat: 2002/10/28_04:03:31 info: mach_down takeover complete.
heartbeat: 2002/10/28_04:03:35 info: Heartbeat restart on node cctcpa.microcenter.com
heartbeat: 2002/10/28_04:03:35 info: Link cctcpa.microcenter.com:eth1 up.
heartbeat: 2002/10/28_04:03:35 WARN: Late heartbeat: Node cctcpa.microcenter.com: interval 125140 ms
heartbeat: 2002/10/28_04:03:35 info: Node cctcpa.microcenter.com: status up
heartbeat: 2002/10/28_04:03:35 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2002/10/28_04:03:35 info: Running /etc/ha.d/rc.d/ifstat ifstat
heartbeat: 2002/10/28_04:03:35 info: Node cctcpa.microcenter.com: status active
heartbeat: 2002/10/28_04:03:35 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2002/10/28_04:03:35 info: Link cctcpa.microcenter.com:/dev/ttyS0 up.
heartbeat: 2002/10/28_04:03:35 info: Running /etc/ha.d/rc.d/ifstat ifstat
heartbeat: 2002/10/28_04:03:43 info: Resource acquisition completed. (none)
heartbeat: 2002/10/28_04:03:43 info: local resource transition completed.
heartbeat: 2002/10/28_04:03:46 info: node cctcpa.microcenter.com now reset.
heartbeat: 2002/10/28_04:03:46 ERROR: /etc/ha.d/harc: /etc/ha.d/rc.d/stonith: cannot execute
heartbeat: 2002/10/28_04:03:51 WARN: node cctcpa.microcenter.com: is dead
heartbeat: 2002/10/28_04:03:51 info: Resource acquisition completed. (none)
heartbeat: 2002/10/28_04:03:51 info: Link cctcpa.microcenter.com:/dev/ttyS0 dead.
heartbeat: 2002/10/28_04:03:51 info: Link cctcpa.microcenter.com:eth1 dead.
heartbeat: 2002/10/28_04:03:51 info: Resetting node cctcpa.microcenter.com with [WTI RPS10 Power Switch]
heartbeat: 2002/10/28_04:03:51 info: Running /etc/ha.d/rc.d/ifstat ifstat
heartbeat: 2002/10/28_04:03:51 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2002/10/28_04:03:51 info: Taking over resource group 10.10.1.11
heartbeat: 2002/10/28_04:03:51 info: /usr/lib/heartbeat/mach_down: nice_failback: acquiring foreign resources
heartbeat: 2002/10/28_04:03:51 info: mach_down takeover complete.
heartbeat: 2002/10/28_04:03:51 info: mach_down takeover complete.
heartbeat: 2002/10/28_04:03:51 info: Running /etc/ha.d/rc.d/ifstat ifstat
heartbeat: 2002/10/28_04:04:06 info: node cctcpa.microcenter.com now reset.
heartbeat: 2002/10/28_04:04:06 ERROR: /etc/ha.d/harc: /etc/ha.d/rc.d/stonith: cannot execute
heartbeat: 2002/10/28_04:06:23 info: Heartbeat restart on node cctcpa.microcenter.com
heartbeat: 2002/10/28_04:06:23 info: Link cctcpa.microcenter.com:eth1 up.
heartbeat: 2002/10/28_04:06:23 WARN: Late heartbeat: Node cctcpa.microcenter.com: interval 163140 ms
heartbeat: 2002/10/28_04:06:23 info: Node cctcpa.microcenter.com: status up
heartbeat: 2002/10/28_04:06:23 info: Running /etc/ha.d/rc.d/ifstat ifstat
heartbeat: 2002/10/28_04:06:23 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2002/10/28_04:06:23 info: Node cctcpa.microcenter.com: status active
heartbeat: 2002/10/28_04:06:23 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2002/10/28_04:06:23 info: remote resource transition completed.
heartbeat: 2002/10/28_04:06:23 info: Link cctcpa.microcenter.com:/dev/ttyS0 up.
heartbeat: 2002/10/28_04:06:23 info: Running /etc/ha.d/rc.d/ifstat ifstat
heartbeat: 2002/10/28_04:08:11 WARN: node cctcpb.microcenter.com: is dead
heartbeat: 2002/10/28_04:08:11 ERROR: No local heartbeat. Forcing shutdown.
heartbeat: 2002/10/28_04:08:11 WARN: Late heartbeat: Node cctcpa.microcenter.com: interval 9320 ms
heartbeat: 2002/10/28_04:08:11 info: Heartbeat shutdown in progress. (17372)
heartbeat: 2002/10/28_04:08:11 info: Giving up all HA resources.
heartbeat: 2002/10/28_04:08:11 info: local resource transition completed.
heartbeat: 2002/10/28_04:08:11 info: Resource acquisition completed. (none)
heartbeat: 2002/10/28_04:08:11 ERROR: No one owns foreign resources!
heartbeat: 2002/10/28_04:08:11 ERROR: No one owns foreign resources!
heartbeat: 2002/10/28_04:08:11 WARN: Cluster node cctcpb.microcenter.com returning after partition
heartbeat: 2002/10/28_04:08:11 WARN: Late heartbeat: Node cctcpb.microcenter.com: interval 10580 ms
heartbeat: 2002/10/28_04:08:11 info: Node cctcpb.microcenter.com: status active
heartbeat: 2002/10/28_04:08:11 info: remote resource transition completed.
heartbeat: 2002/10/28_04:08:11 ERROR: No one owns our local resources!
heartbeat: 2002/10/28_04:08:11 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2002/10/28_04:08:11 info: Releasing resource group: cctcpa.microcenter.com 10.10.1.11 mpsd logger_alert
heartbeat: 2002/10/28_04:08:11 info: Running /etc/ha.d/resource.d/logger_alert stop
heartbeat: 2002/10/28_04:08:11 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2002/10/28_04:08:11 info: /usr/lib/heartbeat/mach_down: nice_failback: acquiring foreign resources
heartbeat: 2002/10/28_04:08:11 info: mach_down takeover complete.
heartbeat: 2002/10/28_04:08:11 ERROR: Both machines own foreign resources!
heartbeat: 2002/10/28_04:08:11 ERROR: No one owns our local resources!
heartbeat: 2002/10/28_04:08:11 ERROR: Both machines own foreign resources!
heartbeat: 2002/10/28_04:08:11 info: mach_down takeover complete.
heartbeat: 2002/10/28_04:08:11 info: Running /etc/ha.d/rc.d/ip-request ip-request
heartbeat: 2002/10/28_04:08:11 info: Running /etc/ha.d/resource.d/IPaddr 10.10.1.11 status
heartbeat: 2002/10/28_04:08:11 info: Releasing resource group: cctcpa.microcenter.com 10.10.1.11 mpsd logger_alert
heartbeat: 2002/10/28_04:08:11 info: Running /etc/ha.d/resource.d/logger_alert stop
heartbeat: 2002/10/28_04:08:12 info: Running /etc/ha.d/resource.d/mpsd stop
heartbeat: 2002/10/28_04:08:11 info: Running /etc/ha.d/rc.d/shutdone shutdone
heartbeat: 2002/10/28_04:08:12 info: Running /etc/ha.d/resource.d/mpsd stop
heartbeat: 2002/10/28_04:08:12 info: Running /etc/ha.d/resource.d/IPaddr 10.10.1.11 stop
heartbeat: 2002/10/28_04:08:12 info: IP Address 10.10.1.11 released
heartbeat: 2002/10/28_04:08:12 info: All HA resources relinquished.
heartbeat: 2002/10/28_04:08:12 info: Resource shutdown completed. Restart triggered.
heartbeat: 2002/10/28_04:08:12 info: Running /etc/ha.d/resource.d/IPaddr 10.10.1.11 stop
heartbeat: 2002/10/28_04:08:13 info: Heartbeat shutdown complete.
heartbeat: 2002/10/28_04:08:13 info: Restarting heartbeat.
heartbeat: 2002/10/28_04:08:14 info: Killing process 17368 with signal 9
heartbeat: 2002/10/28_04:08:14 info: Killing process 17369 with signal 9
heartbeat: 2002/10/28_04:08:14 info: Killing process 17370 with signal 9
heartbeat: 2002/10/28_04:08:14 info: Killing process 17371 with signal 9
heartbeat: 2002/10/28_04:08:14 info: Killing process 17372 with signal 9
heartbeat: 2002/10/28_04:08:15 info: Performing heartbeat restart exec.
heartbeat: 2002/10/28_04:08:25 info: **************************
heartbeat: 2002/10/28_04:08:25 info: Configuration validated. Starting heartbeat 0.4.9.2
heartbeat: 2002/10/28_04:08:25 info: nice_failback is in effect.
heartbeat: 2002/10/28_04:08:25 info: heartbeat: version 0.4.9.2
heartbeat: 2002/10/28_04:08:25 info: Heartbeat generation: 48
heartbeat: 2002/10/28_04:08:25 notice: Starting serial heartbeat on tty /dev/ttyS0
heartbeat: 2002/10/28_04:08:25 notice: UDP heartbeat started on port 694 interface eth1
heartbeat: 2002/10/28_04:08:25 info: Local status now set to: 'up'
heartbeat: 2002/10/28_04:08:25 info: Heartbeat restart on node cctcpb.microcenter.com
heartbeat: 2002/10/28_04:08:25 info: Link cctcpb.microcenter.com:eth1 up.
heartbeat: 2002/10/28_04:08:25 info: Local status now set to: 'active'
heartbeat: 2002/10/28_04:08:25 info: Heartbeat restart on node cctcpa.microcenter.com
heartbeat: 2002/10/28_04:08:25 info: Link cctcpa.microcenter.com:eth1 up.
heartbeat: 2002/10/28_04:08:25 info: Node cctcpa.microcenter.com: status active
heartbeat: 2002/10/28_04:08:25 info: Running /etc/ha.d/rc.d/ifstat ifstat
heartbeat: 2002/10/28_04:08:25 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2002/10/28_04:08:25 info: Running /etc/ha.d/rc.d/ifstat ifstat
heartbeat: 2002/10/28_04:08:25 info: Link cctcpa.microcenter.com:/dev/ttyS0 up.
heartbeat: 2002/10/28_04:08:25 info: Running /etc/ha.d/rc.d/ifstat ifstat
heartbeat: 2002/10/28_04:08:36 info: local resource transition completed.
heartbeat: 2002/10/28_04:08:36 info: remote resource transition completed.
heartbeat: 2002/10/28_04:08:36 info: No local resources [/usr/lib/heartbeat/ResourceManager listkeys cctcpb.microcenter.com]
heartbeat: 2002/10/28_04:08:36 info: Resource acquisition completed.
heartbeat: 2002/10/28_04:08:36 info: Running /etc/ha.d/rc.d/ip-request ip-request
heartbeat: 2002/10/28_04:08:36 info: Running /etc/ha.d/resource.d/IPaddr 10.10.1.11 status
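The "Late heartbeat" warning in the log above reports a 10580 ms gap from cctcpb. With the deadtime of 10 seconds that was in effect at the time, that gap is already longer than deadtime, which fits the "returning after partition" message. The following is a small Python sketch, not part of heartbeat, that pulls those intervals out of log lines and classifies them against warntime and deadtime. The function name is hypothetical. It assumes a node counts as dead once the gap reaches deadtime; the exact comparison heartbeat uses internally is not reproduced here.

```python
import re

# Matches heartbeat 0.4.9.x "Late heartbeat" warnings like the one in the log.
LATE_RE = re.compile(r"Late heartbeat: Node (\S+): interval (\d+) ms")

def classify_late_heartbeats(lines, warntime_s, deadtime_s):
    """Return (node, interval_ms, verdict) for each 'Late heartbeat' line.

    Thresholds are in seconds, as written in ha.cf. The verdict is 'dead'
    if the gap reached deadtime, 'warn' if it reached warntime, else 'ok'.
    """
    results = []
    for line in lines:
        m = LATE_RE.search(line)
        if not m:
            continue  # not a late-heartbeat warning
        node, interval_ms = m.group(1), int(m.group(2))
        if interval_ms >= deadtime_s * 1000:
            verdict = "dead"
        elif interval_ms >= warntime_s * 1000:
            verdict = "warn"
        else:
            verdict = "ok"
        results.append((node, interval_ms, verdict))
    return results

line = ("heartbeat: 2002/10/28_04:08:11 WARN: Late heartbeat: "
        "Node cctcpb.microcenter.com: interval 10580 ms")
# Old setting (deadtime 10): the gap counts as a dead peer.
print(classify_late_heartbeats([line], warntime_s=10, deadtime_s=10))
# -> [('cctcpb.microcenter.com', 10580, 'dead')]
# Adjusted ha.cf (warntime 10, deadtime 90): only a warning.
print(classify_late_heartbeats([line], warntime_s=10, deadtime_s=90))
# -> [('cctcpb.microcenter.com', 10580, 'warn')]
```

The same 10.6 second stall that split the cluster under the old deadtime would, under the adjusted settings, produce only a late-heartbeat warning.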
Attachment: ha.cf
# File to write debug messages to
debugfile /var/log/ha-debug
#
#
# File to write other messages to
#
logfile /var/log/ha-log
#
#
# Facility to use for syslog()/logger
#
#logfacility local0
#
#
# keepalive: how many seconds between heartbeats
#
keepalive 2
#
# deadtime: seconds-to-declare-host-dead
#
deadtime 90
warntime 10
#
#
initdead 180
#
# serial serialportname ...
serial /dev/ttyS0
#
#
baud 19200
#
#
udpport 694
#
#
udp eth1
node cctcpb.microcenter.com
node cctcpa.microcenter.com
# do not failback immediately
nice_failback on
stonith_host cctcpb.microcenter.com rps10 /dev/ttyS1 cctcpa.microcenter.com 0
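The timing directives above only make sense relative to each other: keepalive should be shorter than warntime, warntime shorter than deadtime, and initdead is commonly set to at least twice deadtime. The following is a minimal Python sketch, not heartbeat's own parser, that reads "directive value" lines from an ha.cf-style file and flags those relationships. The function names are hypothetical, the checks are rules of thumb chosen for illustration, and it assumes values are plain seconds as in this file (no unit suffixes).

```python
def parse_ha_cf(text):
    """Parse 'directive value...' lines from ha.cf-style text.

    Comments (#) and blank lines are skipped. Every directive maps to a
    list because some (such as 'node') may appear more than once.
    """
    conf = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        value = parts[1].strip() if len(parts) > 1 else ""
        conf.setdefault(parts[0], []).append(value)
    return conf

def check_timing(conf):
    """Return warnings about keepalive/warntime/deadtime/initdead ordering."""
    def num(key):
        # Last occurrence wins; None if the directive is absent.
        return float(conf[key][-1]) if key in conf else None

    keepalive, warntime, deadtime, initdead = (
        num(k) for k in ("keepalive", "warntime", "deadtime", "initdead"))
    problems = []
    if keepalive is not None and warntime is not None and not keepalive < warntime:
        problems.append("keepalive should be shorter than warntime")
    if warntime is not None and deadtime is not None and not warntime < deadtime:
        problems.append("warntime should be shorter than deadtime")
    if deadtime is not None and initdead is not None and initdead < 2 * deadtime:
        problems.append("initdead is less than twice deadtime")
    return problems

SAMPLE = """\
# deadtime: seconds-to-declare-host-dead
keepalive 2
deadtime 90
warntime 10
initdead 180
node cctcpb.microcenter.com
node cctcpa.microcenter.com
"""
print(check_timing(parse_ha_cf(SAMPLE)))  # -> []
```

Run against the attached values the check passes; with the earlier deadtime of 10 and warntime of 10 it would report that warntime is not shorter than deadtime.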