[Linux-HA] Heartbeat starts services twice, why?
"rueh hänä"
rueh at gmx.net
Wed Mar 29 09:35:21 MST 2006
Hi folks,
I have run into a problem with my heartbeat cluster, which runs:
- FC4 Kernel 2.6.11-1.1369_FC4smp
- heartbeat-2.0.0-1
- drbd-0.7.15
- tomcat5-5.0.30-5jpp_6fc
- httpd-2.0.54-10.2
- mon-0.99.2-1.rhfc1.dag
I think the problem is related to mon, but I don't think that mon itself is
the cause. It just happens to be a service that cannot be started twice,
because its port can only be bound once. I was testing a scenario in which I
kill heartbeat on the master so that the slave takes over. After that, I
started the master again and wanted it to take the resources back from the
slave. For that, I simply ran /etc/init.d/heartbeat stop on the slave.
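The "port can only be bound once" behaviour can be reproduced in isolation. The following is a minimal Python sketch, not part of mon or heartbeat: it binds one listening TCP socket to a free loopback port, then tries to bind a second socket to the same port. The function name second_bind_fails is made up for this illustration, and the sketch assumes neither socket sets SO_REUSEADDR or SO_REUSEPORT.

```python
import errno
import socket


def second_bind_fails():
    """Bind one listening TCP socket, then try to bind a second one to the same port.

    Returns True if the second bind is rejected with EADDRINUSE.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    first.bind(("127.0.0.1", 0))  # port 0: let the OS pick any free port
    first.listen(1)
    port = first.getsockname()[1]

    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        second.bind(("127.0.0.1", port))  # same address and port as the first socket
    except OSError as exc:
        return exc.errno == errno.EADDRINUSE
    finally:
        second.close()
        first.close()
    return False


if __name__ == "__main__":
    print("second bind rejected:", second_bind_fails())
```

A daemon that is started a second time while the first instance is still listening would hit the same error, independent of which program triggered the second start.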
Here is the log:
Mar 29 18:03:57 linuxweb21 heartbeat: [23013]: info: acquire all HA
resources (standby).
Mar 29 18:03:57 linuxweb21 ResourceManager[23033]: [23047]: info: Acquiring
resource group: linuxweb21.ten.ch 192.168.0.89 drbddisk::drbd
Filesystem::/dev
Mar 29 18:03:58 linuxweb21 heartbeat: [23014]: info: Local Resource
acquisition completed.
Mar 29 18:03:58 linuxweb21 ResourceManager[23033]: [23109]: info: Running
/etc/ha.d/resource.d/IPaddr 192.168.0.89 start
Mar 29 18:03:58 linuxweb21 IPaddr[23111]: [23160]: info: /sbin/ifconfig
eth0:0 192.168.0.89 netmask 255.255.255.0 broadcast 192.168.0.255
Mar 29 18:03:58 linuxweb21 IPaddr[23111]: [23165]: info: Sending Gratuitous
Arp for 192.168.0.89 on eth0:0 [eth0]
Mar 29 18:03:58 linuxweb21 IPaddr[23111]: [23166]:
/usr/lib64/heartbeat/send_arp -i 500 -r 10 -p
/var/lib/heartbeat/rsctmp/send_arp/send_arp-192.168.0.89
Mar 29 18:03:58 linuxweb21 ResourceManager[23033]: [23199]: info: Running
/etc/ha.d/resource.d/drbddisk drbd start
Mar 29 18:03:58 linuxweb21 kernel: drbd0: Secondary/Secondary -->
Primary/Secondary
Mar 29 18:03:58 linuxweb21 ResourceManager[23033]: [23240]: info: Running
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /drbd xfs start
Mar 29 18:03:59 linuxweb21 kernel: XFS mounting filesystem drbd0
Mar 29 18:03:59 linuxweb21 ResourceManager[23033]: [23280]: info: Running
/etc/ha.d/resource.d/ipalias start
Mar 29 18:03:59 linuxweb21 ResourceManager[23033]: [23315]: info: Running
/etc/ha.d/resource.d/httpd start
Mar 29 18:04:01 linuxweb21 ResourceManager[23033]: [23342]: info: Running
/etc/ha.d/resource.d/tomcat5 start
Mar 29 18:04:03 linuxweb21 ResourceManager[23033]: [24070]: info: Running
/etc/ha.d/resource.d/mysqld start
Mar 29 18:04:05 linuxweb21 ResourceManager[23033]: [24198]: info: Running
/etc/ha.d/resource.d/mon start
Mar 29 18:04:06 linuxweb21 mon[24205]: running as daemon
Mar 29 18:04:06 linuxweb21 mon[24205]: mon server started
Mar 29 18:04:06 linuxweb21 ResourceManager[23033]: [24223]: info: Running
/etc/ha.d/resource.d/moncheck start
Mar 29 18:04:06 linuxweb21 heartbeat: [23013]: info: all HA resource
acquisition completed (standby).
Mar 29 18:04:06 linuxweb21 heartbeat: [22996]: info: Standby resource
acquisition done [all].
Mar 29 18:04:06 linuxweb21 harc[24236]: [24239]: info: Running
/etc/ha.d/rc.d/status status
#### Somewhere around here, the log messages normally end, that is, after
#### all services have been started. But after that I get more log entries,
#### and it seems that heartbeat starts the services again.
Mar 29 18:04:07 linuxweb21 harc[24261]: [24264]: info: Running
/etc/ha.d/rc.d/ip-request-resp ip-request-resp
Mar 29 18:04:07 linuxweb21 ip-request-resp[24261]: [24267]: received
ip-request-resp 192.168.0.89 OK yes
Mar 29 18:04:07 linuxweb21 ResourceManager[24268]: [24276]: info: Acquiring
resource group: linuxweb21.ten.ch 192.168.0.89 drbddisk::drbd
Filesystem::/dev
Mar 29 18:04:07 linuxweb21 ResourceManager[24268]: [24352]: info: Running
/etc/ha.d/resource.d/ipalias start
Mar 29 18:04:07 linuxweb21 ResourceManager[24268]: [24386]: info: Running
/etc/ha.d/resource.d/httpd start
Mar 29 18:04:08 linuxweb21 ResourceManager[24268]: [24411]: info: Running
/etc/ha.d/resource.d/tomcat5 start
Mar 29 18:04:08 linuxweb21 ResourceManager[24268]: [24484]: info: Running
/etc/ha.d/resource.d/mysqld start
Mar 29 18:04:09 linuxweb21 ResourceManager[24268]: [24600]: info: Running
/etc/ha.d/resource.d/mon start
## AND here is the problem. mon could not be started a second time, because
## the port is already in use. Because of this, heartbeat shuts down the
## services and keeps running without them. Now I have a non-running
## cluster.
Mar 29 18:04:09 linuxweb21 mon[24605]: fatal, could not bind TCP server port
2583: Die Adresse wird bereits verwendet [Address already in use]
Mar 29 18:04:09 linuxweb21 ResourceManager[24268]: [24609]: ERROR: Return
code 1 from /etc/ha.d/resource.d/mon
Mar 29 18:04:09 linuxweb21 ResourceManager[24268]: [24610]: CRIT: Giving up
resources due to failure of mon
Mar 29 18:04:09 linuxweb21 ResourceManager[24268]: [24611]: info: Releasing
resource group: linuxweb21.ten.ch 192.168.0.89 drbddisk::drbd
Filesystem::/dev
Mar 29 18:04:10 linuxweb21 ResourceManager[24268]: [24621]: info: Running
/etc/ha.d/resource.d/moncheck stop
Mar 29 18:04:10 linuxweb21 ResourceManager[24268]: [24649]: info: Running
/etc/ha.d/resource.d/mon stop
Mar 29 18:04:10 linuxweb21 mon[24205]: caught TERM signal, exiting
Mar 29 18:04:10 linuxweb21 ResourceManager[24268]: [24668]: info: Running
/etc/ha.d/resource.d/mysqld stop
Mar 29 18:04:13 linuxweb21 ResourceManager[24268]: [24754]: info: Running
/etc/ha.d/resource.d/tomcat5 stop
Mar 29 18:04:18 linuxweb21 ResourceManager[24268]: [24890]: info: Running
/etc/ha.d/resource.d/httpd stop
Mar 29 18:04:22 linuxweb21 ResourceManager[24268]: [24909]: info: Running
/etc/ha.d/resource.d/ipalias stop
Mar 29 18:04:22 linuxweb21 ResourceManager[24268]: [24937]: info: Running
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /drbd xfs stop
Mar 29 18:04:23 linuxweb21 ResourceManager[24268]: [24965]: info: Running
/etc/ha.d/resource.d/drbddisk drbd stop
Mar 29 18:04:23 linuxweb21 kernel: drbd0: Primary/Secondary -->
Secondary/Secondary
Mar 29 18:04:23 linuxweb21 ResourceManager[24268]: [24991]: info: Running
/etc/ha.d/resource.d/IPaddr 192.168.0.89 stop
Mar 29 18:04:24 linuxweb21 IPaddr[24993]: [25004]: info: /sbin/route -n del
-host 192.168.0.89
Mar 29 18:04:24 linuxweb21 IPaddr[24993]: [25006]: info: /sbin/ifconfig
eth0:0 down
Mar 29 18:04:24 linuxweb21 IPaddr[24993]: [25009]: info: IP Address
192.168.0.89 released
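To pin down which resource scripts were started a second time, the log can be scanned mechanically. The following Python sketch is only an illustration: the function name find_double_starts and the regular expression are assumptions made here, and it expects each wrapped log entry to have been joined back onto one line first. It reports every "start" of a resource script that was already started and not yet stopped, together with the PIDs of the two ResourceManager processes involved.

```python
import re

# Matches ResourceManager "Running ... start|stop" entries as they appear in
# the log above, after each wrapped entry has been joined onto a single line.
RUN_RE = re.compile(
    r"ResourceManager\[(?P<pid>\d+)\]: \[\d+\]: info: Running "
    r"/etc/ha\.d/resource\.d/(?P<script>\S+)(?: .*)? (?P<action>start|stop)$"
)


def find_double_starts(lines):
    """Return (script, first_pid, second_pid) for each start of an already started script."""
    started = {}  # script name -> PID of the ResourceManager that started it
    doubles = []
    for line in lines:
        m = RUN_RE.search(line.strip())
        if not m:
            continue
        script, pid, action = m["script"], m["pid"], m["action"]
        if action == "start":
            if script in started:
                doubles.append((script, started[script], pid))
            else:
                started[script] = pid
        else:
            started.pop(script, None)  # a stop clears the "started" state
    return doubles


# Three entries taken from the log above, with the line wrapping undone.
sample = [
    "Mar 29 18:04:05 linuxweb21 ResourceManager[23033]: [24198]: info: Running /etc/ha.d/resource.d/mon start",
    "Mar 29 18:04:09 linuxweb21 ResourceManager[24268]: [24600]: info: Running /etc/ha.d/resource.d/mon start",
    "Mar 29 18:04:10 linuxweb21 ResourceManager[24268]: [24649]: info: Running /etc/ha.d/resource.d/mon stop",
]

if __name__ == "__main__":
    for script, first, second in find_double_starts(sample):
        print(f"{script}: started by ResourceManager[{first}], again by ResourceManager[{second}]")
```

In the log above, the two start runs come from different ResourceManager processes (23033 and 24268), which is the detail this kind of scan makes visible.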
My question: is this normal? If not, what could be the cause? Is mon even
the one producing this problem, or is it something else?
Here is my ha.cf (I diffed the slave's and the master's copies beforehand):
keepalive 2
deadtime 60
warntime 30
udpport 694
bcast eth1 # Linux
auto_failback off
watchdog /dev/watchdog
node linuxweb21
node linuxweb22
debug 0
use_logd yes
conn_logd_time 60
Here is my haresources:
linuxweb21 192.168.0.89 drbddisk::drbd Filesystem::/dev/drbd0::/drbd::xfs
ipalias httpd tomcat5 mysqld mon moncheck
Permissions are OK, and the start scripts in resource.d / init.d are the
same on both nodes (I also diffed them).
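For reference, the haresources entry above can be broken down into the order in which the resource scripts are run. This is a rough Python sketch and not heartbeat's own parser: the function name parse_haresources_line is made up, and it assumes the entry is a single line, that a bare IPv4 address is shorthand for the IPaddr script, and that "::" separates a script name from its arguments. Those assumptions agree with the "Running ..." entries in the log.

```python
import re

# A bare IPv4 address (optionally followed by /netmask etc.) is treated as IPaddr.
IPV4_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}(/.*)?$")


def parse_haresources_line(line):
    """Split one haresources entry into (node, [(script, [args...]), ...])."""
    tokens = line.split()
    node, resources = tokens[0], []
    for tok in tokens[1:]:
        if IPV4_RE.match(tok):
            resources.append(("IPaddr", [tok]))
        else:
            name, *args = tok.split("::")
            resources.append((name, args))
    return node, resources


# The entry from above, with the mail line wrap undone.
line = (
    "linuxweb21 192.168.0.89 drbddisk::drbd Filesystem::/dev/drbd0::/drbd::xfs "
    "ipalias httpd tomcat5 mysqld mon moncheck"
)
node, resources = parse_haresources_line(line)
start_order = [name for name, _ in resources]   # started left to right
stop_order = list(reversed(start_order))        # stopped right to left

if __name__ == "__main__":
    print("node: ", node)
    print("start:", " ".join(start_order))
    print("stop: ", " ".join(stop_order))
```

The start order matches the first acquisition in the log (IPaddr first, moncheck last), and the reversed list matches the release sequence after the mon failure (moncheck first, IPaddr last).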