[LinuxFailSafe] failsafe does not fail over?

cqcai cqcai@163.com
Thu, 20 Mar 2003 13:58:39 +0800


Hi everybody, I am a FailSafe newbie and want to install and try
Linux FailSafe on Red Hat 7.3. Linux FailSafe has now been installed
successfully and configured, but the simple 2-node HA system does
not fail over at all.

Can anyone help me with this?

My installation process under Red Hat 7.3 was as follows:

untar  failsafe_books-1.0.1_CVS20010406-0.src.rpm

check that compat-libstdc++-6.2-2.9.0.16 has been installed
install IBMJava118-JRE-1.1.8-3.0
install sysadm_base-client-1.3.7-1
install sysadm_base-dev-1.3.7-1
install sysadm_base-lib-1.3.7-1
install heartbeat-stonith-0.4.9-2

make rpm

fsinstall

cdbreinit

run 2node.script, shown below:

define node qcai
set hostname to qcai.wiregate.com
set nodeid to 101
set sysctrl_type to null
set sysctrl_status to disabled
set sysctrl_password to none
set sysctrl_owner to webserver
set sysctrl_device to /dev/ttyS0
set sysctrl_owner_type to tty
add nic 192.168.0.143
    set heartbeat to true
    set ctrl_msgs to true
    set priority to 1
done
done


define node webserver
set hostname to webserver.wiregate.com
set nodeid to 102
set sysctrl_type to null
set sysctrl_status to disabled
set sysctrl_password to none
set sysctrl_owner to  qcai
set sysctrl_device to /dev/ttyS0
set sysctrl_owner_type to tty
add nic 192.168.0.13
    set heartbeat to true
    set ctrl_msgs to true
    set priority to 1
done
done

define cluster test
set notify_cmd to /usr/bin/mail
set notify_addr to root@qcai.wiregate.com root@webserver.wiregate.com
add node qcai
add node webserver
done


define failover_policy qcai-primary
set attribute to Auto_Failback
set attribute to Auto_Recovery
set script to ordered
set domain to qcai webserver
done


define failover_policy webserver-primary
set attribute to Auto_Failback
set attribute to InPlace_Recovery
set script to ordered
set domain to webserver qcai
done


define resource 192.168.1.143 of resource_type IP_address in cluster test
set NetworkMask to 0xffffff00
set interfaces to eth0:0
set BroadcastAddress to 192.168.1.255
done


define resource 192.168.1.13 of resource_type IP_address in cluster test
set NetworkMask to 0xffffff00
set interfaces to eth0:0
set BroadcastAddress to 192.168.1.255
done
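
As a quick sanity check on the IP_address resource parameters above, the following Python sketch verifies that the hex NetworkMask and the BroadcastAddress agree with the resource address, and reports whether a heartbeat NIC address falls in the same subnet. The function names are made up for this illustration and are not part of FailSafe. Whether the heartbeat network and the resource network have to share a subnet depends on the setup, so the last check is informational only.

```python
import ipaddress


def resource_network(ip, netmask_hex):
    """Return the IPv4 network implied by an address and a hex netmask."""
    # NetworkMask is written as hex in the script, e.g. 0xffffff00.
    mask = ipaddress.IPv4Address(int(netmask_hex, 16))
    return ipaddress.IPv4Network(f"{ip}/{mask}", strict=False)


def broadcast_matches(ip, netmask_hex, broadcast):
    """True if the configured broadcast address fits the address and mask."""
    network = resource_network(ip, netmask_hex)
    return network.broadcast_address == ipaddress.IPv4Address(broadcast)


def same_subnet(nic_ip, resource_ip, netmask_hex):
    """True if the NIC address lies inside the resource's network."""
    network = resource_network(resource_ip, netmask_hex)
    return ipaddress.IPv4Address(nic_ip) in network


if __name__ == "__main__":
    # Values copied from 2node.script.
    print(resource_network("192.168.1.143", "0xffffff00"))
    print(broadcast_matches("192.168.1.143", "0xffffff00", "192.168.1.255"))
    print(same_subnet("192.168.0.143", "192.168.1.143", "0xffffff00"))
```

With the values from the script, the mask and broadcast address are consistent with each other, while the NIC addresses (192.168.0.x) and the resource addresses (192.168.1.x) are on different /24 networks.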


define resource_group qcai-group in cluster test
set failover_policy to qcai-primary
add resource 192.168.1.143 of resource_type IP_address
done


define resource_group webserver-group in cluster test
set failover_policy to webserver-primary
add resource 192.168.1.13 of resource_type IP_address
done

quit


The script ran fine, and the other node received the new
configuration.
But when I run
    ifconfig eth0:0 192.168.1.143
and then
    ifconfig eth0:0 down
the network interface eth0:0 does not come back up automatically.
FailSafe does not respond at all, and nothing appears in the logs.
Here is some information from the log files:

tail of cad.log:
Thu Mar 20 13:37:36.009 <cad 20846:7176> cfs_fs_connect: fs_cam_register 
failed with error FailSafe is not ready to accept admin requests.
Thu Mar 20 13:37:41.104 <cam_casmail 20837:1024> Notification cmd exited 
with 32512.
Thu Mar 20 13:37:46.109 <cad 20846:7176> cfs_fs_connect: fs_cam_register 
failed with error FailSafe is not ready to accept admin requests.
Thu Mar 20 13:37:56.209 <cad 20846:7176> cfs_fs_connect: fs_cam_register 
failed with error FailSafe is not ready to accept admin requests.
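
The value 32512 in the "Notification cmd exited with" line looks like a raw wait()-style status rather than a plain exit code. That is an assumption, since the log does not say how the number is produced. Under that assumption, this small Python sketch splits the status into an exit code and a terminating signal; the function name is hypothetical.

```python
def decode_wait_status(status):
    """Split a 16-bit wait()-style status into (exit_code, signal_number).

    Assumes the traditional Linux layout: exit code in the high byte,
    terminating signal in the low 7 bits.
    """
    exit_code = (status >> 8) & 0xFF
    signal_number = status & 0x7F
    return exit_code, signal_number


if __name__ == "__main__":
    # Status taken from the cad.log excerpt.
    print(decode_wait_status(32512))
```

Under this reading the notification command exited with code 127 and no signal. Shells conventionally return 127 when a command cannot be found, so it may be worth checking that the notify_cmd path (/usr/bin/mail) exists on both nodes.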

tail of cli_qcai:
        * * * L o g g i n g    R e s t a r t e d * * *

Thu Mar 20 11:49:40.832 <N resgroupAddResources config 21236:0 
ci_config_cdb.c:208> /usr/lib/sysadm/privbin/resgroupAddResources 
_CDB_DB=/var/lib/failsafe/cdb/cdb.db _RESOURCE_GROUP=webserver-group 
_CLUSTER=test _RESOURCE_0=192.168.1.13 _RESOURCE_TYPE_0=IP_address 
_NUM_RESOURCES=1
Thu Mar 20 11:49:40.840 <W resgroupAddResources config 21236:0 
ci_config_resources.c:1819> CI_FAILURE, resource dependency 
(#cluster#test#HA#resources#IP_address#192.168.1.13#_dependency) subtree 
is invalid
Thu Mar 20 11:49:40.840 <W resgroupAddResources config 21236:0 
ci_config_restypes.c:2362> CI_FAILURE, resource type  dependency 
(#cluster#test#HA#ResourceTypes#IP_address#_dependency) subtree is invalid
Thu Mar 20 11:49:40.873 <N resgroupAddResources config 21236:0 
ci_config_cdb.c:223> CLI private command: successful completion

crsd_qcai:
Thu Mar 20 11:49:03.223 <E crsd crs 20897:0 crs_config.c:1214> 
CI_ERR_HDL_STALE, Database root node not found.
Thu Mar 20 11:49:03.223 <W crsd crs 20897:0 crsd_config.c:249> 
CI_ERR_HDL_STALE, Could not read new config.
Thu Mar 20 11:49:04.057 <W crsd crs 20897:0 crsd_pending.c:490> 
CI_CRSERR_INVAL, The node specified for monitoring has its controlled 
port disabled. Ignoring this request.
Thu Mar 20 11:49:07.249 <W crsd crs 20897:0 crsd_pending.c:490> 
CI_CRSERR_INVAL, The node specified for monitoring has its controlled 
port disabled. Ignoring this request.
Thu Mar 20 11:49:10.849 <W crsd crs 20897:0 crsd_pending.c:490> 
CI_CRSERR_INVAL, The node specified for monitoring has its controlled 
port disabled. Ignoring this request.
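
Because the mail client wraps the cli_qcai and crsd_qcai entries across several lines, the following Python sketch re-joins wrapped entries and pulls out the ones marked W or E. The field names are guesses based on the layout of the excerpts above, and reading N, W and E as notice, warning and error is an assumption; none of this comes from the FailSafe documentation.

```python
import re

# A new entry starts with a timestamp such as "Thu Mar 20 11:49:03.223".
STAMP = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}\.\d{3} ")

# Header layout assumed from the excerpts:
#   <LEVEL daemon subsystem pid:thread source.c:line> message
ENTRY = re.compile(
    r"^(?P<stamp>\S+ \S+ +\d{1,2} [\d:.]+) +"
    r"<(?P<level>[A-Z]) (?P<daemon>\S+) (?P<subsystem>\S+) "
    r"(?P<pid>\d+):(?P<thread>\d+) +(?P<source>[^\s:>]+):(?P<line>\d+)> *"
    r"(?P<message>.*)$"
)


def join_entries(lines):
    """Merge wrapped continuation lines back into one string per entry."""
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if STAMP.match(line):
            entries.append(line)
        elif entries:
            entries[-1] += " " + line
    return entries


def parse_entries(lines):
    """Return one dict per entry whose header matches the assumed layout."""
    parsed = []
    for entry in join_entries(lines):
        match = ENTRY.match(entry)
        if match:
            parsed.append(match.groupdict())
    return parsed


def warnings_and_errors(lines):
    """Keep only entries whose level letter is W or E."""
    return [e for e in parse_entries(lines) if e["level"] in ("W", "E")]


SAMPLE = [
    "Thu Mar 20 11:49:03.223 <E crsd crs 20897:0 crs_config.c:1214> ",
    "CI_ERR_HDL_STALE, Database root node not found.",
    "Thu Mar 20 11:49:40.873 <N resgroupAddResources config 21236:0 ",
    "ci_config_cdb.c:223> CLI private command: successful completion",
]

if __name__ == "__main__":
    for entry in warnings_and_errors(SAMPLE):
        print(entry["stamp"], entry["source"], entry["message"])
```

The cad and cmond logs use a different header, so this pattern would need adjusting before it could be applied to them.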


cdbd_log:
Thu Mar 20 11:49:36.184 qcai cdbd  - Checking quorum with 2 members for 
any unknown members.
Thu Mar 20 11:49:36.184 qcai cdbd  - All quorum member machines are 
known to us.
Thu Mar 20 11:49:54.346 qcai cdbd  - CDB on node webserver (102) marked 
current in fs2d_copy_cdb_to_machine
Thu Mar 20 11:49:55.100 qcai cdbd  - Successfully copied global CDB to 
node webserver (239)


cmond_log:
Thu Mar 20 11:49:38.716 <cmond 20830:1024> <cmond_cdb.c:600> No action 
required for this notification.
Thu Mar 20 11:49:38.716 <cmond 20830:1024> <cmond_cdb.c:186> Serving 
notification #0 key = 
#cluster#test#HA#ResourceGroups#webserver-group#FailoverPolicy.
Thu Mar 20 11:49:38.716 <cmond 20830:1024> <cmond_cdb.c:600> No action 
required for this notification.
Thu Mar 20 11:49:40.870 <cmond 20830:1024> <cmond_cdb.c:114> CDB 
notification(s) arrived.
Thu Mar 20 11:49:40.870 <cmond 20830:1024> <cmond_cdb.c:186> Serving 
notification #0 key = #cluster#test#HA#ResourceGroups#webserver-group.
Thu Mar 20 11:49:40.870 <cmond 20830:1024> <cmond_cdb.c:600> No action 
required for this notification.