[Linux-ha-dev] A bug in linux-ha: Heartbeat restarts when
sending and receiving hundreds of messages
Wei, Willna
willna.wei at intel.com
Tue Jun 8 19:52:27 MDT 2004
heartbeat: 2004/06/09_09:36:47 WARN: 1 lost packet(s) for [aelan3] [6673:6675]
heartbeat: 2004/06/09_09:36:47 info: No pkts missing from aelan3!
Jun 9 09:36:47 aelan4 heartbeat[14530]: WARN: 1 lost packet(s) for [aelan3] [6673:6675]
Jun 9 09:36:47 aelan4 heartbeat[14530]: info: No pkts missing from aelan3!
Jun 9 09:36:48 aelan4 heartbeat[14530]: debug: SO_PEERCRED returned [21028, (0:0)]
Jun 9 09:36:48 aelan4 heartbeat[14530]: debug: Verifying authentication: cred.uid=0 cred.gid=0
Jun 9 09:36:48 aelan4 heartbeat[14530]: debug: Verifying authentication: uidptr=0x809cad0 gidptr=0x0
Jun 9 09:36:48 aelan4 heartbeat[14530]: debug: SO_PEERCRED returned [21028, (0:0)]
Jun 9 09:36:48 aelan4 heartbeat[14530]: debug: Verifying authentication: cred.uid=0 cred.gid=0
Jun 9 09:36:48 aelan4 heartbeat[14530]: debug: Verifying authentication: uidptr=0x0 gidptr=0x8097080
Jun 9 09:36:48 aelan4 heartbeat[14530]: debug: SO_PEERCRED returned [21028, (0:0)]
Jun 9 09:36:48 aelan4 heartbeat[14530]: debug: Verifying authentication: cred.uid=0 cred.gid=0
heartbeat: 2004/06/09_09:36:48 WARN: node aelan3: is dead
Jun 9 09:36:48 aelan4 heartbeat[14530]: debug: Verifying authentication: uidptr=0x808e3d8 gidptr=0x808e828
heartbeat: 2004/06/09_09:36:48 WARN: No STONITH device configured.
heartbeat: 2004/06/09_09:36:48 WARN: Shared disks are not protected.
heartbeat: 2004/06/09_09:36:48 info: Resources being acquired from aelan3.
heartbeat: 2004/06/09_09:36:48 info: Link aelan3:eth1 dead.
heartbeat: 2004/06/09_09:36:48 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2004/06/09_09:36:48 info: No local resources [/usr/lib/heartbeat/ResourceManager listkeys aelan4] to acquire.
Jun 9 09:36:48 aelan4 heartbeat[14530]: WARN: node aelan3: is dead
Jun 9 09:36:48 aelan4 heartbeat[14530]: WARN: No STONITH device configured.
heartbeat: 2004/06/09_09:36:48 info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
Jun 9 09:36:48 aelan4 heartbeat[14530]: WARN: Shared disks are not protected.
Jun 9 09:36:48 aelan4 heartbeat[14530]: info: Resources being acquired from aelan3.
Jun 9 09:36:48 aelan4 heartbeat[21041]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
heartbeat: 2004/06/09_09:36:48 info: mach_down takeover complete.
heartbeat: 2004/06/09_09:36:48 info: mach_down takeover complete for node aelan3.
Jun 9 09:36:48 aelan4 heartbeat[14530]: info: Link aelan3:eth1 dead.
Jun 9 09:36:48 aelan4 heartbeat: info: Running /etc/ha.d/rc.d/status status
Jun 9 09:36:48 aelan4 heartbeat[21042]: info: No local resources [/usr/lib/heartbeat/ResourceManager listkeys aelan4] to acquire.
Jun 9 09:36:48 aelan4 heartbeat[14530]: debug: StartNextRemoteRscReq(): child count 1
Jun 9 09:36:48 aelan4 heartbeat: info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
Jun 9 09:36:48 aelan4 heartbeat[14530]: info: mach_down takeover complete.
Jun 9 09:36:48 aelan4 heartbeat: info: mach_down takeover complete for node aelan3.
heartbeat: 2004/06/09_09:37:04 info: Link aelan3:eth1 up.
Jun 9 09:37:04 aelan4 heartbeat[14530]: info: Link aelan3:eth1 up.
heartbeat: 2004/06/09_09:37:09 WARN: Cluster node aelan3 returning after partition.
heartbeat: 2004/06/09_09:37:09 WARN: Deadtime value may be too small.
heartbeat: 2004/06/09_09:37:09 info: See documentation for information on tuning deadtime.
heartbeat: 2004/06/09_09:37:09 WARN: Late heartbeat: Node aelan3: interval 22320 ms
heartbeat: 2004/06/09_09:37:09 info: Status update for node aelan3: status active
Jun 9 09:37:09 aelan4 heartbeat[14530]: WARN: Cluster node aelan3 returning after partition.
heartbeat: 2004/06/09_09:37:09 info: Running /etc/ha.d/rc.d/status status
Jun 9 09:37:09 aelan4 heartbeat[14530]: WARN: Deadtime value may be too small.
Jun 9 09:37:09 aelan4 heartbeat[14530]: info: See documentation for information on tuning deadtime.
Jun 9 09:37:09 aelan4 heartbeat[14530]: WARN: Late heartbeat: Node aelan3: interval 22320 ms
Jun 9 09:37:09 aelan4 heartbeat[14530]: info: Status update for node aelan3: status active
Jun 9 09:37:09 aelan4 heartbeat[21072]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Jun 9 09:37:09 aelan4 heartbeat: info: Running /etc/ha.d/rc.d/status status
heartbeat: 2004/06/09_09:37:10 ERROR: Cannot rexmit pkt 602592: seqno too low
heartbeat: 2004/06/09_09:37:10 ERROR: Irretrievably lost packet: node aelan4 seq 602592
Jun 9 09:37:10 aelan4 heartbeat[14530]: ERROR: Cannot rexmit pkt 602592: seqno too low
Jun 9 09:37:10 aelan4 heartbeat[14530]: ERROR: Irretrievably lost packet: node aelan4 seq 602592
heartbeat: 2004/06/09_09:37:11 WARN: Shutdown delayed until current resource activity finishes.
Jun 9 09:37:11 aelan4 heartbeat[14530]: WARN: Shutdown delayed until current resource activity finishes.
heartbeat: 2004/06/09_09:37:15 info: Received shutdown notice from 'aelan3'.
heartbeat: 2004/06/09_09:37:15 info: Resources being acquired from aelan3.
Jun 9 09:37:15 aelan4 heartbeat[14530]: info: Received shutdown notice from 'aelan3'.
heartbeat: 2004/06/09_09:37:15 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2004/06/09_09:37:15 info: No local resources [/usr/lib/heartbeat/ResourceManager listkeys aelan4] to acquire.
Jun 9 09:37:15 aelan4 heartbeat[14530]: info: Resources being acquired from aelan3.
heartbeat: 2004/06/09_09:37:15 info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
heartbeat: 2004/06/09_09:37:15 info: mach_down takeover complete.
heartbeat: 2004/06/09_09:37:15 info: mach_down takeover complete for node aelan3.
heartbeat: 2004/06/09_09:37:15 info: Heartbeat shutdown in progress. (14530)
Jun 9 09:37:15 aelan4 heartbeat[21076]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
heartbeat: 2004/06/09_09:37:15 info: Giving up all HA resources.
Jun 9 09:37:15 aelan4 heartbeat: info: Running /etc/ha.d/rc.d/status status
Jun 9 09:37:15 aelan4 heartbeat[21077]: info: No local resources [/usr/lib/heartbeat/ResourceManager listkeys aelan4] to acquire.
Jun 9 09:37:15 aelan4 heartbeat[14530]: debug: StartNextRemoteRscReq(): child count 1
Jun 9 09:37:15 aelan4 heartbeat: info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
Jun 9 09:37:15 aelan4 heartbeat[14530]: info: mach_down takeover complete.
Jun 9 09:37:15 aelan4 heartbeat: info: mach_down takeover complete for node aelan3.
Jun 9 09:37:15 aelan4 heartbeat[14530]: info: Heartbeat shutdown in progress. (14530)
Jun 9 09:37:15 aelan4 heartbeat[21107]: info: Giving up all HA resources.
heartbeat: 2004/06/09_09:37:17 WARN: node aelan3: is dead
heartbeat: 2004/06/09_09:37:17 info: Dead node aelan3 held no resources.
heartbeat: 2004/06/09_09:37:17 info: Resource takeover cancelled - shutdown in progress.
heartbeat: 2004/06/09_09:37:17 info: Link aelan3:eth1 dead.
Jun 9 09:37:17 aelan4 heartbeat[14530]: WARN: node aelan3: is dead
Jun 9 09:37:17 aelan4 heartbeat[14530]: info: Dead node aelan3 held no resources.
Jun 9 09:37:17 aelan4 heartbeat[14530]: info: Resource takeover cancelled - shutdown in progress.
Jun 9 09:37:17 aelan4 heartbeat[14530]: info: Link aelan3:eth1 dead.
heartbeat: 2004/06/09_09:37:19 info: Heartbeat restart on node aelan3
heartbeat: 2004/06/09_09:37:19 info: Link aelan3:eth1 up.
heartbeat: 2004/06/09_09:37:19 info: Status update for node aelan3: status up
Jun 9 09:37:19 aelan4 heartbeat[14530]: info: Heartbeat restart on node aelan3
heartbeat: 2004/06/09_09:37:19 info: Status update for node aelan3: status active
Jun 9 09:37:19 aelan4 heartbeat[14530]: info: Link aelan3:eth1 up.
Jun 9 09:37:19 aelan4 heartbeat[14530]: info: Status update for node aelan3: status up
Jun 9 09:37:19 aelan4 heartbeat[14530]: debug: StartNextRemoteRscReq(): child count 1
Jun 9 09:37:19 aelan4 heartbeat[14530]: info: Status update for node aelan3: status active
Jun 9 09:37:19 aelan4 heartbeat[14530]: debug: StartNextRemoteRscReq(): child count 1
heartbeat: 2004/06/09_09:37:21 info: killing heartbeat resource child process group 21076 with signal 9
heartbeat: 2004/06/09_09:37:21 info: All HA resources relinquished.
heartbeat: 2004/06/09_09:37:21 info: killing status process group 22009 with signal 15
Jun 9 09:37:21 aelan4 heartbeat[21107]: info: killing heartbeat resource child process group 21076 with signal 9
heartbeat: 2004/06/09_09:37:21 info: Running /etc/ha.d/rc.d/status status
Jun 9 09:37:21 aelan4 heartbeat[21107]: info: All HA resources relinquished.
Jun 9 09:37:21 aelan4 heartbeat[22009]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Jun 9 09:37:21 aelan4 heartbeat[14530]: info: killing status process group 22009 with signal 15
Jun 9 09:37:21 aelan4 heartbeat[22010]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
heartbeat: 2004/06/09_09:37:21 info: remote resource transition completed.
Jun 9 09:37:21 aelan4 heartbeat: info: Running /etc/ha.d/rc.d/status status
Jun 9 09:37:21 aelan4 heartbeat[14530]: info: remote resource transition completed.
heartbeat: 2004/06/09_09:37:22 info: killing HBFIFO process 14533 with signal 15
heartbeat: 2004/06/09_09:37:22 info: killing HBWRITE process 14534 with signal 15
heartbeat: 2004/06/09_09:37:22 info: killing HBREAD process 14535 with signal 15
heartbeat: 2004/06/09_09:37:22 info: Core process 14534 exited. 3 remaining
heartbeat: 2004/06/09_09:37:22 info: Core process 14535 exited. 2 remaining
heartbeat: 2004/06/09_09:37:22 info: Core process 14533 exited. 1 remaining
heartbeat: 2004/06/09_09:37:22 info: Heartbeat shutdown complete.
heartbeat: 2004/06/09_09:37:22 info: Heartbeat restart triggered.
heartbeat: 2004/06/09_09:37:22 info: Restarting heartbeat.
heartbeat: 2004/06/09_09:37:22 info: Performing heartbeat restart exec.
Jun 9 09:37:22 aelan4 heartbeat[14530]: info: killing HBFIFO process 14533 with signal 15
Jun 9 09:37:22 aelan4 heartbeat[14530]: info: killing HBWRITE process 14534 with signal 15
Jun 9 09:37:22 aelan4 heartbeat[14530]: info: killing HBREAD process 14535 with signal 15
Jun 9 09:37:22 aelan4 heartbeat[14530]: info: Core process 14534 exited. 3 remaining
Jun 9 09:37:22 aelan4 heartbeat[14530]: info: Core process 14535 exited. 2 remaining
Jun 9 09:37:22 aelan4 heartbeat[14530]: info: Core process 14533 exited. 1 remaining
Jun 9 09:37:22 aelan4 heartbeat[14530]: info: Heartbeat shutdown complete.
Jun 9 09:37:22 aelan4 heartbeat[14530]: info: Heartbeat restart triggered.
Jun 9 09:37:22 aelan4 heartbeat[14530]: info: Restarting heartbeat.
Jun 9 09:37:22 aelan4 heartbeat[14530]: info: Performing heartbeat restart exec.
heartbeat: 2004/06/09_09:37:24 info: **************************
heartbeat: 2004/06/09_09:37:24 info: Configuration validated. Starting heartbeat 1.3.0
heartbeat: 2004/06/09_09:37:24 info: heartbeat: version 1.3.0
Jun 9 09:37:24 aelan4 heartbeat[14530]: info: **************************
Jun 9 09:37:24 aelan4 heartbeat[14530]: info: Configuration validated. Starting heartbeat 1.3.0
heartbeat: 2004/06/09_09:37:25 info: Heartbeat generation: 235
heartbeat: 2004/06/09_09:37:25 info: ucast: write socket priority set to IPTOS_LOWDELAY on eth1
heartbeat: 2004/06/09_09:37:25 info: ucast: bound send socket to device: eth1
heartbeat: 2004/06/09_09:37:25 info: ucast: bound receive socket to device: eth1
heartbeat: 2004/06/09_09:37:25 info: ucast: started on port 2694 interface eth1 to 192.168.7.3
heartbeat: 2004/06/09_09:37:25 info: pid 22014 locked in memory.
heartbeat: 2004/06/09_09:37:25 info: Local status now set to: 'up'
Jun 9 09:37:24 aelan4 heartbeat[22014]: info: heartbeat: version 1.3.0
Jun 9 09:37:25 aelan4 heartbeat[22014]: info: Heartbeat generation: 235
Jun 9 09:37:25 aelan4 heartbeat[22014]: info: ucast: write socket priority set to IPTOS_LOWDELAY on eth1
Jun 9 09:37:25 aelan4 heartbeat[22014]: info: ucast: bound send socket to device: eth1
Jun 9 09:37:25 aelan4 heartbeat[22014]: info: ucast: bound receive socket to device: eth1
Jun 9 09:37:25 aelan4 heartbeat[22014]: info: ucast: started on port 2694 interface eth1 to 192.168.7.3
Jun 9 09:37:25 aelan4 heartbeat[22014]: info: pid 22014 locked in memory.
Jun 9 09:37:25 aelan4 heartbeat[22014]: info: Local status now set to: 'up'
heartbeat: 2004/06/09_09:37:26 info: pid 22016 locked in memory.
heartbeat: 2004/06/09_09:37:26 info: pid 22017 locked in memory.
heartbeat: 2004/06/09_09:37:26 info: pid 22018 locked in memory.
heartbeat: 2004/06/09_09:37:26 info: Link aelan3:eth1 up.
heartbeat: 2004/06/09_09:37:26 info: Status update for node aelan3: status active
heartbeat: 2004/06/09_09:37:26 info: Local status now set to: 'active'
Jun 9 09:37:26 aelan4 heartbeat[22016]: info: pid 22016 locked in memory.
heartbeat: 2004/06/09_09:37:26 info: Running /etc/ha.d/rc.d/status status
Jun 9 09:37:26 aelan4 heartbeat[22017]: info: pid 22017 locked in memory.
Jun 9 09:37:26 aelan4 heartbeat[22018]: info: pid 22018 locked in memory.
Jun 9 09:37:26 aelan4 heartbeat[22014]: info: Link aelan3:eth1 up.
Jun 9 09:37:26 aelan4 heartbeat[22014]: info: Status update for node aelan3: status active
Jun 9 09:37:26 aelan4 heartbeat[22019]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Jun 9 09:37:26 aelan4 heartbeat[22014]: info: Local status now set to: 'active'
Jun 9 09:37:26 aelan4 heartbeat: info: Running /etc/ha.d/rc.d/status status
heartbeat: 2004/06/09_09:37:29 info: remote resource transition completed.
heartbeat: 2004/06/09_09:37:29 info: remote resource transition completed.
heartbeat: 2004/06/09_09:37:29 info: Local Resource acquisition completed. (none)
heartbeat: 2004/06/09_09:37:29 info: aelan3 wants to go standby [foreign]
Jun 9 09:37:29 aelan4 heartbeat[22014]: info: remote resource transition completed.
Jun 9 09:37:29 aelan4 heartbeat[22014]: info: remote resource transition completed.
Jun 9 09:37:29 aelan4 heartbeat[22014]: info: Local Resource acquisition completed. (none)
Jun 9 09:37:29 aelan4 heartbeat[22014]: info: aelan3 wants to go standby [foreign]
heartbeat: 2004/06/09_09:37:36 info: standby: acquire [foreign] resources from aelan3
heartbeat: 2004/06/09_09:37:36 info: acquire local HA resources (standby).
Jun 9 09:37:36 aelan4 heartbeat[22014]: info: standby: acquire [foreign] resources from aelan3
heartbeat: 2004/06/09_09:37:36 info: local HA resource acquisition completed (standby).
heartbeat: 2004/06/09_09:37:36 info: Standby resource acquisition done [foreign].
heartbeat: 2004/06/09_09:37:36 info: Initial resource acquisition complete (auto_failback)
heartbeat: 2004/06/09_09:37:36 info: remote resource transition completed.
Jun 9 09:37:36 aelan4 heartbeat[22023]: info: acquire local HA resources (standby).
Jun 9 09:37:36 aelan4 heartbeat[22023]: info: local HA resource acquisition completed (standby).
Jun 9 09:37:36 aelan4 heartbeat[22014]: info: Standby resource acquisition done [foreign].
Jun 9 09:37:36 aelan4 heartbeat[22014]: info: Initial resource acquisition complete (auto_failback)
Jun 9 09:37:36 aelan4 heartbeat[22014]: info: remote resource transition completed.
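
The log above warns "Deadtime value may be too small" right after reporting a late heartbeat of 22320 ms from aelan3. As a rough aid for checking that warning, here is a minimal sketch that pulls the late-heartbeat intervals out of a log like this one. It is only an illustration: the function name late_heartbeats and the sample list are hypothetical, and the configured deadtime is not shown in this log, so any comparison against it has to use the value from your own ha.cf.

```python
import re

# Matches the "Late heartbeat" warning in both formats seen in the log
# (the heartbeat logfile format and the syslog format).
LATE_RE = re.compile(r"WARN: Late heartbeat: Node (\S+): interval (\d+) ms")


def late_heartbeats(lines):
    """Return {node: largest late interval in ms} (hypothetical helper)."""
    worst = {}
    for line in lines:
        m = LATE_RE.search(line)
        if m:
            node, ms = m.group(1), int(m.group(2))
            # The same event is logged twice (logfile + syslog), so keep the max.
            worst[node] = max(worst.get(node, 0), ms)
    return worst


# Three lines copied from the log above.
sample = [
    "heartbeat: 2004/06/09_09:37:09 WARN: Late heartbeat: Node aelan3: interval 22320 ms",
    "Jun 9 09:37:09 aelan4 heartbeat[14530]: WARN: Late heartbeat: Node aelan3: interval 22320 ms",
    "heartbeat: 2004/06/09_09:37:09 info: Status update for node aelan3: status active",
]

print(late_heartbeats(sample))
```

If the largest interval reported is well above the deadtime configured on the node, that would be consistent with the "returning after partition" sequence seen at 09:37:09, although it does not by itself explain the restart that follows.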