[Linux-HA] sticky resource status "transition" with hb 2.0.2
Guochun Shi
gshi at ncsa.uiuc.edu
Wed Oct 26 15:25:21 MDT 2005
The log looks ok to me.
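
One way to make that check repeatable is a small script that scans the log for the startup messages seen in the attached ha-log. This is only a minimal sketch: the function name, the regular expression, and the choice of which three messages count as "startup finished" are assumptions for illustration, not anything heartbeat itself defines.

```python
import re

# Milestone messages copied from the attached ha-log. Treating them as the
# markers of a finished startup is an assumption of this sketch.
MILESTONES = {
    "active": "Local status now set to: 'active'",
    "initial_acquisition": "Initial resource acquisition complete",
    "transition_completed": "local resource transition completed",
}

# Matches lines such as
#   heartbeat[6821]: 2005/10/26_22:14:10 info: some message
# An optional leading ">" is tolerated because the archive quotes the log.
LINE_RE = re.compile(
    r"^>?(?P<prog>[\w-]+)(?:\[\d+\])+: "
    r"(?P<stamp>\d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2}) "
    r"(?:(?P<level>info|WARN|ERROR|CRIT|debug): )?(?P<msg>.*)$"
)


def find_milestones(lines):
    """Return {milestone name: timestamp of first match, or None}."""
    seen = {name: None for name in MILESTONES}
    for line in lines:
        match = LINE_RE.match(line.strip())
        # Only messages from the heartbeat process itself are considered.
        if match is None or match.group("prog") != "heartbeat":
            continue
        for name, text in MILESTONES.items():
            if seen[name] is None and text in match.group("msg"):
                seen[name] = match.group("stamp")
    return seen


if __name__ == "__main__":
    sample = [
        "heartbeat[6821]: 2005/10/26_22:13:59 info: Local status now set to: 'active'",
        "heartbeat[6821]: 2005/10/26_22:14:10 info: local resource transition completed.",
    ]
    for name, stamp in find_milestones(sample).items():
        print(name, stamp)
```

A milestone that stays at None after a full startup would be the first place to look.
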
What's the problem? Does it hang if you try to shut it down? Does cl_status
report the "transition" resource state?
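
To answer the second question with more than a single reading, the status can be polled for a while. The sketch below is a hedged illustration, assuming that `cl_status rscstatus` prints one word (such as `transition`) on standard output. The helper names, the number of attempts, and the delay are all made up for this example.

```python
import subprocess
import time


def read_rscstatus():
    """Run `cl_status rscstatus` and return its trimmed output.

    Assumes cl_status is on PATH and prints a single status word.
    """
    result = subprocess.run(
        ["cl_status", "rscstatus"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout.strip()


def wait_until_settled(read_status=read_rscstatus, attempts=10, delay=2.0,
                       sleep=time.sleep):
    """Poll until the status is no longer "transition".

    Returns the list of observed states; the last entry is the final one.
    The reader and sleep functions are parameters so the loop can be tried
    without a running cluster.
    """
    observed = []
    for attempt in range(attempts):
        state = read_status()
        observed.append(state)
        if state != "transition":
            break
        # Do not sleep after the final attempt.
        if attempt < attempts - 1:
            sleep(delay)
    return observed
```

Calling `wait_until_settled()` on the node and posting the returned list would show whether the state ever leaves "transition" or stays there for the whole polling window.
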
-Guochun
Joachim Banzhaf wrote:
>Hi Andrew,
>
>On Wednesday, 26 October 2005 20:22, Andrew Beekhof wrote:
>
>
>>On 10/26/05, joachimbanzhaf at compuserve.de <joachimbanzhaf at compuserve.de> wrote:
>
>
>
>>>Is it a known bug that starting heartbeat in an environment like mine
>>>never leaves resource status "transition" (which means e.g. you cannot
>>>stop heartbeat)? If not, I'm happy to provide more details.
>>>
>>>
>>That is definitely not a good thing. Can you attach logs and the
>>contents of the CIB (either the cib.xml file or output from cibadmin
>>-Ql)?
>>
>>
>
>Sure, but I don't use the CIB right now. At this point it is a 1.x compatibility
>setup because I wanted to start from a working setup I know and understand
>before I switch to new 2.x functionality. Logs and config are attached.
>
>
>
>>If it's at all possible, you might want to try the latest from CVS too.
>>
>>
>
>If I find the time, I will.
>
>
>
>>>I use the heartbeat 2.0.2 rpm with a minimal SuSE Pro 9.3 (with current YOU
>>>updates).
>>>I have set up just one node so far.
>>>haresources has two lines for two nodes, each starting one IP address.
>>>Both IP addresses get started by heartbeat just fine.
>>>
>>>I have set up various 0.x up to 1.2.3 heartbeat clusters, even sent some
>>>
>>>
>
>Oops, it seems this got lost somehow:
>...patches, but this is my first attempt at 2.x.
>
>regards
>
>Joachim Banzhaf
>
>
>------------------------------------------------------------------------
>
>debugfile /var/log/ha-debug
>logfile /var/log/ha-log
>logfacility local0
>keepalive 2
>deadtime 10
>warntime 5
>initdead 20
>#udpport 694
>baud 115200
>serial /dev/ttyS0 # Linux
>bcast eth1 eth2 # Linux
>ucast eth0 192.168.111.101
>ucast eth0 192.168.111.102
>auto_failback off
>#stonith baytech /etc/ha.d/conf/stonith.baytech
>#stonith_host * baytech 10.0.0.3 mylogin mysecretpassword
>#stonith_host ken3 rps10 /dev/ttyS1 kathy 0
>#stonith_host kathy rps10 /dev/ttyS1 ken3 0
># wish to load the module with the parameter "nowayout=0" or
># compile it without CONFIG_WATCHDOG_NOWAYOUT set. Otherwise even
>#watchdog /dev/watchdog
>node jobc1
>node jobc2
>#respawn hacluster /usr/lib/heartbeat/ipfail
>#hopfudge 1
>#deadping 30
>#hbgenmethod time
>#realtime off
>#debug 1
># ipfail (uid=HA_CCMUSER)
># ccm (uid=HA_CCMUSER)
># ping (gid=HA_APIGROUP)
># cl_status (gid=HA_APIGROUP)
>apiauth ipfail uid=hacluster
>apiauth ccm uid=hacluster
>apiauth cms uid=hacluster
>apiauth ping gid=haclient uid=root
>apiauth default gid=haclient
>msgfmt netstring
># daemon (the default is /etc/logd.cf)
>use_logd yes
>#conn_logd_time 60
>compression bz2
>compression_threshold 2
>
>
>
>------------------------------------------------------------------------
>
>debugfile /var/log/ha-debug
>logfile /var/log/ha-log
>logfacility local7
>entity logd
>useapphbd no
>sendqlen 256
>recvqlen 256
>
>
>
>------------------------------------------------------------------------
>
>logd[6744]: 2005/10/26_22:13:38 info: logd started with /etc/logd.cf.
>logd[6744]: 2005/10/26_22:13:38 WARN: Core dumps could be lost if multiple dumps occur
>logd[6744]: 2005/10/26_22:13:38 WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
>logd[6745]: 2005/10/26_22:13:38 info: G_main_add_SignalHandler: Added signal handler for signal 15
>logd[6744]: 2005/10/26_22:13:38 info: G_main_add_SignalHandler: Added signal handler for signal 15
>heartbeat[6820]: 2005/10/26_22:13:38 info: Enabling logging daemon
>heartbeat[6820]: 2005/10/26_22:13:38 info: logfile and debug file are those specifiedin logd config file (default /etc/logd.cf)
>heartbeat[6820]: 2005/10/26_22:13:38 WARN: Core dumps could be lost if multiple dumps occur
>heartbeat[6820]: 2005/10/26_22:13:38 WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
>heartbeat[6820]: 2005/10/26_22:13:38 info: **************************
>heartbeat[6820]: 2005/10/26_22:13:38 info: Configuration validated. Starting heartbeat 2.0.2
>heartbeat[6821]: 2005/10/26_22:13:38 info: heartbeat: version 2.0.2
>heartbeat[6821]: 2005/10/26_22:13:38 info: Heartbeat generation: 11
>heartbeat[6821]: 2005/10/26_22:13:38 info: Removing /var/run/heartbeat/rsctmp failed, recreating.
>heartbeat[6821]: 2005/10/26_22:13:39 info: glib: Starting serial heartbeat on tty /dev/ttyS0 (115200 baud)
>heartbeat[6821]: 2005/10/26_22:13:39 info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth1
>heartbeat[6821]: 2005/10/26_22:13:39 info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth2
>heartbeat[6821]: 2005/10/26_22:13:39 info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0
>heartbeat[6821]: 2005/10/26_22:13:39 info: glib: ucast: bound send socket to device: eth0
>heartbeat[6821]: 2005/10/26_22:13:39 info: glib: ucast: bound receive socket to device: eth0
>heartbeat[6821]: 2005/10/26_22:13:39 info: glib: ucast: started on port 694 interface eth0 to 192.168.111.101
>heartbeat[6821]: 2005/10/26_22:13:39 info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0
>heartbeat[6821]: 2005/10/26_22:13:39 info: glib: ucast: bound send socket to device: eth0
>heartbeat[6821]: 2005/10/26_22:13:39 info: glib: ucast: bound receive socket to device: eth0
>heartbeat[6821]: 2005/10/26_22:13:39 info: glib: ucast: started on port 694 interface eth0 to 192.168.111.102
>heartbeat[6821]: 2005/10/26_22:13:39 info: G_main_add_SignalHandler: Added signal handler for signal 17
>heartbeat[6821]: 2005/10/26_22:13:39 info: pid 6821 locked in memory.
>heartbeat[6821]: 2005/10/26_22:13:39 info: Local status now set to: 'up'
>heartbeat[6828]: 2005/10/26_22:13:39 info: pid 6828 locked in memory.
>heartbeat[6824]: 2005/10/26_22:13:40 info: pid 6824 locked in memory.
>heartbeat[6825]: 2005/10/26_22:13:40 info: pid 6825 locked in memory.
>heartbeat[6826]: 2005/10/26_22:13:40 info: pid 6826 locked in memory.
>heartbeat[6827]: 2005/10/26_22:13:40 info: pid 6827 locked in memory.
>heartbeat[6821]: 2005/10/26_22:13:40 info: Link jobc1:eth1 up.
>heartbeat[6829]: 2005/10/26_22:13:40 info: pid 6829 locked in memory.
>heartbeat[6830]: 2005/10/26_22:13:40 info: pid 6830 locked in memory.
>heartbeat[6821]: 2005/10/26_22:13:40 info: Link jobc1:eth2 up.
>heartbeat[6831]: 2005/10/26_22:13:40 info: pid 6831 locked in memory.
>heartbeat[6832]: 2005/10/26_22:13:40 info: pid 6832 locked in memory.
>heartbeat[6833]: 2005/10/26_22:13:40 info: pid 6833 locked in memory.
>heartbeat[6834]: 2005/10/26_22:13:40 info: pid 6834 locked in memory.
>heartbeat[6821]: 2005/10/26_22:13:59 WARN: node jobc2: is dead
>heartbeat[6821]: 2005/10/26_22:13:59 info: Local status now set to: 'active'
>heartbeat[6821]: 2005/10/26_22:13:59 WARN: No STONITH device configured.
>heartbeat[6821]: 2005/10/26_22:13:59 WARN: Shared disks are not protected.
>heartbeat[6821]: 2005/10/26_22:13:59 info: Resources being acquired from jobc2.
>heartbeat[6850]: 2005/10/26_22:13:59 debug: notify_world: setting SIGCHLD Handler to SIG_DFL
>harc[6850][6854]: 2005/10/26_22:13:59 info: Running /etc/ha.d/rc.d/status status
>mach_down[6858][6881]: 2005/10/26_22:13:59 info: Taking over resource group 192.168.111.202
>ResourceManager[6899][6907]: 2005/10/26_22:13:59 info: Acquiring resource group: jobc2 192.168.111.202
>ResourceManager[6899][6952]: 2005/10/26_22:13:59 info: Running /etc/ha.d/resource.d/IPaddr 192.168.111.202 start
>ResourceManager[6899][6953]: 2005/10/26_22:13:59 debug: Starting /etc/ha.d/resource.d/IPaddr 192.168.111.202 start
>heartbeat[6821]: 2005/10/26_22:13:59 debug: StartNextRemoteRscReq(): child count 2
>heartbeat[6851]: 2005/10/26_22:13:59 info: Local Resource acquisition completed.
>heartbeat[6821]: 2005/10/26_22:13:59 info: Initial resource acquisition complete (T_RESOURCES(us))
>heartbeat[6821]: 2005/10/26_22:13:59 debug: StartNextRemoteRscReq(): child count 1
>ls: /var/run/heartbeat/rsctmp/IPaddr/eth0:*: No such file or directory
>IPaddr[6954][7009]: 2005/10/26_22:14:00 info: /sbin/ifconfig eth0:0 192.168.111.202 netmask 255.255.255.0 broadcast 192.168.111.255
>IPaddr[6954][7014]: 2005/10/26_22:14:00 info: Sending Gratuitous Arp for 192.168.111.202 on eth0:0 [eth0]
>IPaddr[6954][7015]: 2005/10/26_22:14:00 /usr/lib/heartbeat/send_arp -i 500 -r 10 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.111.202 eth0 192.168.111.202 auto 192.168.111.202 ffffffffffff
>ResourceManager[6899][7019]: 2005/10/26_22:14:00 debug: /etc/ha.d/resource.d/IPaddr 192.168.111.202 start done. RC=0
>mach_down[6858][7020]: 2005/10/26_22:14:00 info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
>mach_down[6858][7023]: 2005/10/26_22:14:00 info: mach_down takeover complete for node jobc2.
>heartbeat[7024]: 2005/10/26_22:14:00 debug: notify_world: setting SIGCHLD Handler to SIG_DFL
>harc[7024][7027]: 2005/10/26_22:14:00 info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
>ip-request-resp[7024][7030]: 2005/10/26_22:14:00 received ip-request-resp 192.168.111.201 OK yes
>send_arp[7018]: 2005/10/26_22:14:00 info: Enable using logging daemon
>ResourceManager[7031][7041]: 2005/10/26_22:14:00 info: Acquiring resource group: jobc1 192.168.111.201
>ResourceManager[7031][7082]: 2005/10/26_22:14:00 info: Running /etc/ha.d/resource.d/IPaddr 192.168.111.201 start
>ResourceManager[7031][7083]: 2005/10/26_22:14:00 debug: Starting /etc/ha.d/resource.d/IPaddr 192.168.111.201 start
>IPaddr[7084][7132]: 2005/10/26_22:14:00 info: /sbin/ifconfig eth0:1 192.168.111.201 netmask 255.255.255.0 broadcast 192.168.111.255
>IPaddr[7084][7137]: 2005/10/26_22:14:00 info: Sending Gratuitous Arp for 192.168.111.201 on eth0:1 [eth0]
>IPaddr[7084][7138]: 2005/10/26_22:14:00 /usr/lib/heartbeat/send_arp -i 500 -r 10 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.111.201 eth0 192.168.111.201 auto 192.168.111.201 ffffffffffff
>send_arp[7141]: 2005/10/26_22:14:00 info: Enable using logging daemon
>ResourceManager[7031][7142]: 2005/10/26_22:14:00 debug: /etc/ha.d/resource.d/IPaddr 192.168.111.201 start done. RC=0
>heartbeat[6821]: 2005/10/26_22:14:10 info: Local Resource acquisition completed. (none)
>heartbeat[6821]: 2005/10/26_22:14:10 info: local resource transition completed.
>heartbeat[6821]: 2005/10/26_22:14:28 debug: SO_PEERCRED returned [7168, (0:0)]
>heartbeat[6821]: 2005/10/26_22:14:28 debug: Verifying authentication: cred.uid=0 cred.gid=90
>heartbeat[6821]: 2005/10/26_22:14:28 debug: Verifying authentication: uidptr=0x8160fb0 gidptr=0x0
>heartbeat[6821]: 2005/10/26_22:14:28 debug: SO_PEERCRED returned [7168, (0:0)]
>heartbeat[6821]: 2005/10/26_22:14:28 debug: Verifying authentication: cred.uid=0 cred.gid=90
>heartbeat[6821]: 2005/10/26_22:14:28 debug: Verifying authentication: uidptr=0x0 gidptr=0x815edf0
>heartbeat[6821]: 2005/10/26_22:14:28 debug: SO_PEERCRED returned [7168, (0:0)]
>heartbeat[6821]: 2005/10/26_22:14:28 debug: Verifying authentication: cred.uid=0 cred.gid=90
>heartbeat[6821]: 2005/10/26_22:14:28 debug: Verifying authentication: uidptr=0x0 gidptr=0x81482a8
>
>
>------------------------------------------------------------------------
>
>logd[6744]: 2005/10/26_22:13:38 info: logd started with /etc/logd.cf.
>logd[6744]: 2005/10/26_22:13:38 WARN: Core dumps could be lost if multiple dumps occur
>logd[6744]: 2005/10/26_22:13:38 WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
>logd[6745]: 2005/10/26_22:13:38 info: G_main_add_SignalHandler: Added signal handler for signal 15
>logd[6744]: 2005/10/26_22:13:38 info: G_main_add_SignalHandler: Added signal handler for signal 15
>heartbeat[6820]: 2005/10/26_22:13:38 info: Enabling logging daemon
>heartbeat[6820]: 2005/10/26_22:13:38 info: logfile and debug file are those specifiedin logd config file (default /etc/logd.cf)
>heartbeat[6820]: 2005/10/26_22:13:38 WARN: Core dumps could be lost if multiple dumps occur
>heartbeat[6820]: 2005/10/26_22:13:38 WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
>heartbeat[6820]: 2005/10/26_22:13:38 info: **************************
>heartbeat[6820]: 2005/10/26_22:13:38 info: Configuration validated. Starting heartbeat 2.0.2
>heartbeat[6821]: 2005/10/26_22:13:38 info: heartbeat: version 2.0.2
>heartbeat[6821]: 2005/10/26_22:13:38 info: Heartbeat generation: 11
>heartbeat[6821]: 2005/10/26_22:13:38 info: Removing /var/run/heartbeat/rsctmp failed, recreating.
>heartbeat[6821]: 2005/10/26_22:13:39 info: glib: Starting serial heartbeat on tty /dev/ttyS0 (115200 baud)
>heartbeat[6821]: 2005/10/26_22:13:39 info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth1
>heartbeat[6821]: 2005/10/26_22:13:39 info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth2
>heartbeat[6821]: 2005/10/26_22:13:39 info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0
>heartbeat[6821]: 2005/10/26_22:13:39 info: glib: ucast: bound send socket to device: eth0
>heartbeat[6821]: 2005/10/26_22:13:39 info: glib: ucast: bound receive socket to device: eth0
>heartbeat[6821]: 2005/10/26_22:13:39 info: glib: ucast: started on port 694 interface eth0 to 192.168.111.101
>heartbeat[6821]: 2005/10/26_22:13:39 info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0
>heartbeat[6821]: 2005/10/26_22:13:39 info: glib: ucast: bound send socket to device: eth0
>heartbeat[6821]: 2005/10/26_22:13:39 info: glib: ucast: bound receive socket to device: eth0
>heartbeat[6821]: 2005/10/26_22:13:39 info: glib: ucast: started on port 694 interface eth0 to 192.168.111.102
>heartbeat[6821]: 2005/10/26_22:13:39 info: G_main_add_SignalHandler: Added signal handler for signal 17
>heartbeat[6821]: 2005/10/26_22:13:39 info: pid 6821 locked in memory.
>heartbeat[6821]: 2005/10/26_22:13:39 info: Local status now set to: 'up'
>heartbeat[6828]: 2005/10/26_22:13:39 info: pid 6828 locked in memory.
>heartbeat[6824]: 2005/10/26_22:13:40 info: pid 6824 locked in memory.
>heartbeat[6825]: 2005/10/26_22:13:40 info: pid 6825 locked in memory.
>heartbeat[6826]: 2005/10/26_22:13:40 info: pid 6826 locked in memory.
>heartbeat[6827]: 2005/10/26_22:13:40 info: pid 6827 locked in memory.
>heartbeat[6821]: 2005/10/26_22:13:40 info: Link jobc1:eth1 up.
>heartbeat[6829]: 2005/10/26_22:13:40 info: pid 6829 locked in memory.
>heartbeat[6830]: 2005/10/26_22:13:40 info: pid 6830 locked in memory.
>heartbeat[6821]: 2005/10/26_22:13:40 info: Link jobc1:eth2 up.
>heartbeat[6831]: 2005/10/26_22:13:40 info: pid 6831 locked in memory.
>heartbeat[6832]: 2005/10/26_22:13:40 info: pid 6832 locked in memory.
>heartbeat[6833]: 2005/10/26_22:13:40 info: pid 6833 locked in memory.
>heartbeat[6834]: 2005/10/26_22:13:40 info: pid 6834 locked in memory.
>heartbeat[6821]: 2005/10/26_22:13:59 WARN: node jobc2: is dead
>heartbeat[6821]: 2005/10/26_22:13:59 info: Local status now set to: 'active'
>heartbeat[6821]: 2005/10/26_22:13:59 WARN: No STONITH device configured.
>heartbeat[6821]: 2005/10/26_22:13:59 WARN: Shared disks are not protected.
>heartbeat[6821]: 2005/10/26_22:13:59 info: Resources being acquired from jobc2.
>harc[6850][6854]: 2005/10/26_22:13:59 info: Running /etc/ha.d/rc.d/status status
>mach_down[6858][6881]: 2005/10/26_22:13:59 info: Taking over resource group 192.168.111.202
>ResourceManager[6899][6907]: 2005/10/26_22:13:59 info: Acquiring resource group: jobc2 192.168.111.202
>ResourceManager[6899][6952]: 2005/10/26_22:13:59 info: Running /etc/ha.d/resource.d/IPaddr 192.168.111.202 start
>heartbeat[6851]: 2005/10/26_22:13:59 info: Local Resource acquisition completed.
>heartbeat[6821]: 2005/10/26_22:13:59 info: Initial resource acquisition complete (T_RESOURCES(us))
>IPaddr[6954][7009]: 2005/10/26_22:14:00 info: /sbin/ifconfig eth0:0 192.168.111.202 netmask 255.255.255.0 broadcast 192.168.111.255
>IPaddr[6954][7014]: 2005/10/26_22:14:00 info: Sending Gratuitous Arp for 192.168.111.202 on eth0:0 [eth0]
>IPaddr[6954][7015]: 2005/10/26_22:14:00 /usr/lib/heartbeat/send_arp -i 500 -r 10 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.111.202 eth0 192.168.111.202 auto 192.168.111.202 ffffffffffff
>mach_down[6858][7020]: 2005/10/26_22:14:00 info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
>mach_down[6858][7023]: 2005/10/26_22:14:00 info: mach_down takeover complete for node jobc2.
>harc[7024][7027]: 2005/10/26_22:14:00 info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
>ip-request-resp[7024][7030]: 2005/10/26_22:14:00 received ip-request-resp 192.168.111.201 OK yes
>send_arp[7018]: 2005/10/26_22:14:00 info: Enable using logging daemon
>ResourceManager[7031][7041]: 2005/10/26_22:14:00 info: Acquiring resource group: jobc1 192.168.111.201
>ResourceManager[7031][7082]: 2005/10/26_22:14:00 info: Running /etc/ha.d/resource.d/IPaddr 192.168.111.201 start
>IPaddr[7084][7132]: 2005/10/26_22:14:00 info: /sbin/ifconfig eth0:1 192.168.111.201 netmask 255.255.255.0 broadcast 192.168.111.255
>IPaddr[7084][7137]: 2005/10/26_22:14:00 info: Sending Gratuitous Arp for 192.168.111.201 on eth0:1 [eth0]
>IPaddr[7084][7138]: 2005/10/26_22:14:00 /usr/lib/heartbeat/send_arp -i 500 -r 10 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.111.201 eth0 192.168.111.201 auto 192.168.111.201 ffffffffffff
>send_arp[7141]: 2005/10/26_22:14:00 info: Enable using logging daemon
>heartbeat[6821]: 2005/10/26_22:14:10 info: Local Resource acquisition completed. (none)
>heartbeat[6821]: 2005/10/26_22:14:10 info: local resource transition completed.
>
>
>------------------------------------------------------------------------
>
># jobc1 192.168.111.201 drbddisk::service1 Filesystem::/dev/drbd1::/ha/service1::reiserfs
>jobc1 192.168.111.201
># jobc2 192.168.111.202 drbddisk::service2 Filesystem::/dev/drbd2::/ha/service2::reiserfs
>jobc2 192.168.111.202
>
>
>------------------------------------------------------------------------
>