[Linux-HA] Death of my servers!
Frank R Callaghan
f.callaghan at ieee.org
Mon Mar 8 13:34:35 MST 2004
Hi All,
I have been running two servers with DRBD
and HB 1.0.4 for 30 days with no problems :) Then
I put them online for a day of real load testing. After
an email download of 158 mails with clam enabled, they
both died :( Here is the relevant part of the logfile from
the primary server, lserver3:
-----------------------------------------------------------------------------------------
heartbeat: 2004/03/05_14:55:57 WARN: Late heartbeat: Node lserver2: interval 2680 ms
heartbeat: 2004/03/05_14:55:57 WARN: Late heartbeat: Node lserver3: interval 3030 ms
heartbeat: 2004/03/05_14:56:17 WARN: Late heartbeat: Node lserver2: interval 9940 ms
heartbeat: 2004/03/05_14:56:18 WARN: node lserver3: is dead
heartbeat: 2004/03/05_14:56:18 ERROR: No local heartbeat. Forcing shutdown.
heartbeat: 2004/03/05_14:56:19 info: Link lserver2:usb0 dead.
heartbeat: 2004/03/05_14:56:21 info: hb_signal_giveup_resources(): current status: active
heartbeat: 2004/03/05_14:56:21 info: Heartbeat shutdown in progress. (2402)
heartbeat: 2004/03/05_14:56:21 WARN: node lserver3: is dead
heartbeat: 2004/03/05_14:56:21 ERROR: No local heartbeat. Forcing shutdown.
heartbeat: 2004/03/05_14:56:21 info: Link lserver2:usb0 up.
heartbeat: 2004/03/05_14:56:21 info: heartbeat: version 1.0.4
heartbeat: 2004/03/05_14:56:21 info: Giving up all HA resources.
heartbeat: 2004/03/05_14:56:22 WARN: Late heartbeat: Node lserver2: interval 4800 ms
heartbeat: 2004/03/05_14:56:22 WARN: node lserver3: is dead
heartbeat: 2004/03/05_14:56:22 ERROR: No local heartbeat. Forcing shutdown.
heartbeat: 2004/03/05_14:56:22 WARN: Late heartbeat: Node lserver3: interval 15110 ms
heartbeat: 2004/03/05_14:56:25 info: MSG: Dumping message with 9 fields
heartbeat: 2004/03/05_14:56:26 info: MSG[0]: [t=status]
heartbeat: 2004/03/05_14:56:26 info: MSG[1]: [st=active]
heartbeat: 2004/03/05_14:56:26 info: MSG[2]: [src=lserver3]
heartbeat: 2004/03/05_14:56:26 info: MSG[3]: [seq=304f7c]
heartbeat: 2004/03/05_14:56:26 info: MSG[4]: [hg=71]
heartbeat: 2004/03/05_14:56:26 info: MSG[5]: [ts=4048db60]
heartbeat: 2004/03/05_14:56:26 info: MSG[6]: [ld=11.54 6.28 3.01 4/166 6847]
heartbeat: 2004/03/05_14:56:26 info: MSG[7]: [ttl=3]
heartbeat: 2004/03/05_14:56:26 info: MSG[8]: [auth=1 16566808a694287928fb455f8af54459e5242431]
heartbeat: 2004/03/05_14:56:26 info: MSG: Dumping message with 9 fields
heartbeat: 2004/03/05_14:56:26 info: MSG[0]: [t=status]
heartbeat: 2004/03/05_14:56:26 info: MSG[1]: [st=active]
heartbeat: 2004/03/05_14:56:26 info: MSG[2]: [src=lserver3]
heartbeat: 2004/03/05_14:56:26 info: MSG[3]: [seq=304f7c]
heartbeat: 2004/03/05_14:56:26 info: MSG[4]: [hg=71]
heartbeat: 2004/03/05_14:56:26 info: MSG[5]: [ts=4048db60]
heartbeat: 2004/03/05_14:56:26 info: MSG[6]: [ld=11.54 6.28 3.01 4/166 6847]
heartbeat: 2004/03/05_14:56:26 info: MSG[7]: [ttl=3]
heartbeat: 2004/03/05_14:56:26 info: MSG[8]: [auth=1 16566808a694287928fb455f8af54459e5242431]
------------------------------------------------------------------------------------------
It seems that the load caused HB to miss beats, conclude
that I'm dead and take my services, and then miss lserver3's heartbeat and
kill him too!
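
To make the "missed beats" reading easier to check, here is a rough Python
sketch that pulls the late-heartbeat intervals out of a log excerpt like the
one above and reports the worst gap per node. The function names are made up
for illustration and are not part of heartbeat; the sample lines are copied
from the log above. The worst gap could then be compared by hand against the
deadtime setting in ha.cf, whose value is not shown in this post.

```python
import re

# Matches "Late heartbeat" warnings whether the interval is on the same
# line or wrapped onto the next one (as mail clients tend to do).
LATE_RE = re.compile(r"Late heartbeat: Node (\S+): interval\s+(\d+) ms")


def late_heartbeats(log_text):
    """Return (node, interval_ms) pairs in the order they appear."""
    return [(node, int(ms)) for node, ms in LATE_RE.findall(log_text)]


def worst_interval(log_text):
    """Return the largest late-heartbeat interval per node, in ms."""
    worst = {}
    for node, ms in late_heartbeats(log_text):
        worst[node] = max(ms, worst.get(node, 0))
    return worst


# A few lines copied from the log in this post.
SAMPLE = """\
heartbeat: 2004/03/05_14:55:57 WARN: Late heartbeat: Node lserver2: interval
2680 ms
heartbeat: 2004/03/05_14:56:17 WARN: Late heartbeat: Node lserver2: interval
9940 ms
heartbeat: 2004/03/05_14:56:22 WARN: Late heartbeat: Node lserver3: interval
15110 ms
"""

if __name__ == "__main__":
    for node, ms in sorted(worst_interval(SAMPLE).items()):
        print(f"{node}: worst gap {ms / 1000:.2f} s")
```

On these sample lines the longest gaps are 9940 ms for lserver2 and 15110 ms
for lserver3.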
Is this what is supposed to happen under heavy load?
Am I reading this all wrong?
Help please ;)
Cheers,
Frank.