[LinuxFailSafe] srmd script error while bringing up IP

Nemeth Lorant loci@crandon.sch.bme.hu
Wed, 8 May 2002 11:47:51 +0200 (CEST)


Hi!

After rebuilding the RPMs without STONITH enabled, things changed a
little. It looks like STONITH wasn't configured properly and that alone
made bringing up the resources hang (this is just a hypothesis). So, after
the rebuild:

Using scripts, I defined the following:
2 nodes, 1 cluster, 2 policies, 2 IP_addresses, 2 resource_groups

After that, the nodes and the cluster were in the INACTIVE state.
I ran haActivate, which returned:

HA services have been activated in cluster HA

(HA is the name of the cluster.)

The node status changed to UP and the cluster status to ACTIVE.
The resource groups were offline with no errors.
(Resource group 1, with the first IP, should be on node1 (labpc1), and the
other resource group, with the other IP, on the other node.)

After that:
[root@labpc1 root]# resgroupQueryStatus _CLUSTER=HA _RESOURCE_GROUP=RG1
State: Online
Error: srmd executable error
Owner: labpc2.inf.mit.bme.hu

[root@labpc1 root]# resourceStatus _RESOURCE=152.66.242.65 _RESOURCE_TYPE=IP_address _CLUSTER=HA
State: Online
Error: Resource executable failure
Owner: labpc2.inf.mit.bme.hu
Flags: Resource is not locally monitored
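
For anyone scripting around these status commands, here is a minimal sketch of turning their output into a dictionary. It assumes the output is one "Key: value" pair per line, as in the resgroupQueryStatus output above. The parse_status helper is hypothetical and not part of FailSafe.

```python
def parse_status(output):
    """Parse 'Key: value' lines into a dict; lines without a colon are skipped."""
    status = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            status[key.strip()] = value.strip()
    return status


# Sample text copied from the resgroupQueryStatus output above.
sample = """State: Online
Error: srmd executable error
Owner: labpc2.inf.mit.bme.hu
"""

status = parse_status(sample)
if status.get("Error"):
    print(f"{status['Owner']}: {status['State']} with error: {status['Error']}")
```

A check like this could be run after haActivate to flag resource groups that report Online but still carry an error.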


The logs:

--- cad_log ---

Wed May 8 11:03:42.268 <cam_svc 1431:1026> Received inventory for category 50 flags 1
Wed May 8 11:03:46.824 <cad 1438:7176> cfs_fs_connect: fs_cam_register failed with error FailSafe is not ready to accept admin requests.
Wed May 8 11:03:56.924 <cad 1438:7176> cfs_fs_connect: fs_cam_register failed with error FailSafe is not ready to accept admin requests.
Wed May 8 11:04:07.026 <cad 1438:7176> cfs_fs_connect: fs_cam_register failed with error FailSafe is not ready to accept admin requests.
Wed May 8 11:04:16.137 <cam_casmail 1439:8201> ccamail_cam_get_event() called
Wed May 8 11:04:16.138 <cam_casmail 1439:8201> exiting ccamail_cam_get_event(): event = 0x812bc70
Wed May 8 11:20:36.179 <cam_casmail 1439:8201> ccamail_cam_get_event() called
Wed May 8 11:20:36.180 <cam_casmail 1439:8201> exiting ccamail_cam_get_event(): event = 0x80e6c80
Wed May 8 11:20:36.205 <cam_casmail 1439:8201> ccamail_cam_get_event() called

--- ifd_labpc1 ---

                * * * L o g g i n g R e s t a r t e d * * * 
Wed May 8 11:03:12.028 <W ha_ifd ifd 2192:0 ifd_net.c:679> CI_FAILURE, lo is neither broadcast, point-to-point, nor loopback
Wed May 8 11:03:12.029 <I0 ha_ifd ifd 2192:0 ifd_main.c:346> ha_ifd monitoring network interfaces
Wed May 8 11:20:03.000 <W ha_ifd ifd 2192:0 ifd_net.c:848> CI_FAILURE, More than 256 aliases for interface
Wed May 8 11:20:13.164 <I0 ha_ifd ifd 2192:0 ifd_main.c:751> CI_FAILURE, get information ipaddress 152.66.242.65 failed
Wed May 8 11:38:27.269 <W ha_ifd ifd 2192:0 ifd_net.c:848> CI_FAILURE, More than 256 aliases for interface

--- ha_ifd_labpc1.inf.mit.bme.hu ---

Wed May 8 11:19:52 <N /usr/lib/failsafe/resource_types/IP_address/exclusive script 2473:0> resource 152.66.242.65 exclusive status: NOT RUNNING
Wed May 8 11:20:03 <N /usr/lib/failsafe/resource_types/IP_address/start script 2492:0> ip address 152.66.242.65 cannot be configured
Wed May 8 11:20:03 <N /usr/lib/failsafe/resource_types/IP_address/start script 2492:0> Check ha_ifd logs on this node for more information
Wed May 8 11:37:17 <N /usr/lib/failsafe/resource_types/IP_address/exclusive script 2635:0> resource 152.66.242.66 exclusive status: NOT RUNNING
Wed May 8 11:38:27 <N /usr/lib/failsafe/resource_types/IP_address/start script 2675:0> ip address 152.66.242.66 cannot be configured
Wed May 8 11:38:27 <N /usr/lib/failsafe/resource_types/IP_address/start script 2675:0> Check ha_ifd logs on this node for more information
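
To pick the failure entries out of logs like these, something along the following lines might help. It is only a sketch: it assumes every entry starts with a "Wed May 8 11:20:03" style timestamp and that any other line continues the previous entry, so it should be fed one log file at a time. The names join_entries and failures are made up for illustration.

```python
import re

# Assumed entry start: weekday, month, day, HH:MM:SS (e.g. "Wed May 8 11:20:03").
ENTRY_START = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}")


def join_entries(text):
    """Merge wrapped continuation lines into the entry they belong to."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if ENTRY_START.match(line) or not entries:
            entries.append(line)          # a new log entry begins here
        else:
            entries[-1] += " " + line     # continuation of the previous entry
    return entries


def failures(text, marker="CI_FAILURE"):
    """Return the joined entries that contain the given marker."""
    return [entry for entry in join_entries(text) if marker in entry]


# Two entries taken from the ha_ifd log above, wrapped across lines.
sample = """Wed May 8 11:03:12.029 <I0 ha_ifd ifd 2192:0 ifd_main.c:346> ha_ifd
monitoring network interfaces
Wed May 8 11:20:03.000 <W ha_ifd ifd 2192:0 ifd_net.c:848> CI_FAILURE,
More than 256 aliases for interface
"""

for entry in failures(sample):
    print(entry)
```

Matching on the timestamp rather than on line length keeps the sketch independent of where the log lines happen to wrap.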


Any ideas why the IP start script fails? Is there an easy way to test
these scripts? (What are the 3 files that have to be passed to it as
command-line arguments?)

Another problem:

/etc/init.d/fs_cluster stop doesn't stop all the processes.




				Thx:

						Lorant Nemeth (Loci)