[LinuxFailSafe] srmd script error while bringing up IP
Nemeth Lorant
loci@crandon.sch.bme.hu
Wed, 8 May 2002 11:47:51 +0200 (CEST)
Hi!
After rebuilding the RPMs without STONITH enabled, things changed a
little. It looks like STONITH wasn't configured properly and that alone
hung the bringing up of the resources (this is just a hypothesis). So,
after the rebuild:
I defined the following from scripts:
2 nodes, 1 cluster, 2 policies, 2 IP_addresses, 2 resource_groups
After that the nodes and the cluster were in the INACTIVE state.
I ran haActivate, which returned:
HA services have been activated in cluster HA (HA is the name of the
cluster)
The node status changed to UP and the cluster status to ACTIVE.
The resource groups were in the offline state with no errors.
(Resource group 1, with the first IP, should be on node1 (labpc1), and the
other resource group, with the other IP, on the other node.)
After that:
[root@labpc1 root]# resgroupQueryStatus _CLUSTER=HA _RESOURCE_GROUP=RG1
State: Online
Error: srmd executable error
Owner: labpc2.inf.mit.bme.hu
[root@labpc1 root]# resourceStatus _RESOURCE=152.66.242.65 _RESOURCE_TYPE=IP_address _CLUSTER=HA
State: Online
Error: Resource executable failure
Owner: labpc2.inf.mit.bme.hu
Flags: Resource is not locally monitored
The logs:
--- cad_log ---
Wed May 8 11:03:42.268 <cam_svc 1431:1026> Received inventory for category
50 flags 1
Wed May 8 11:03:46.824 <cad 1438:7176> cfs_fs_connect: fs_cam_register
failed with error FailSafe is not ready to accept admin
requests.
Wed May 8 11:03:56.924 <cad 1438:7176> cfs_fs_connect: fs_cam_register
failed with error FailSafe is not ready to accept admin
requests.
Wed May 8 11:04:07.026 <cad 1438:7176> cfs_fs_connect: fs_cam_register
failed with error FailSafe is not ready to accept admin
requests.
Wed May 8 11:04:16.137 <cam_casmail 1439:8201> ccamail_cam_get_event()
called
Wed May 8 11:04:16.138 <cam_casmail 1439:8201> exiting
ccamail_cam_get_event(): event = 0x812bc70
Wed May 8 11:20:36.179 <cam_casmail 1439:8201> ccamail_cam_get_event()
called
Wed May 8 11:20:36.180 <cam_casmail 1439:8201> exiting
ccamail_cam_get_event(): event = 0x80e6c80
Wed May 8 11:20:36.205 <cam_casmail 1439:8201> ccamail_cam_get_event()
called
--- ifd_labpc1 ---
* * * L o g g i n g R e s t a r t e d * * *
Wed May 8 11:03:12.028 <W ha_ifd ifd 2192:0 ifd_net.c:679> CI_FAILURE, lo
is neither broadcast, point-to-point, nor loopback
Wed May 8 11:03:12.029 <I0 ha_ifd ifd 2192:0 ifd_main.c:346> ha_ifd
monitoring network interfaces
Wed May 8 11:20:03.000 <W ha_ifd ifd 2192:0 ifd_net.c:848> CI_FAILURE,
More than 256 aliases for interface
Wed May 8 11:20:13.164 <I0 ha_ifd ifd 2192:0 ifd_main.c:751> CI_FAILURE,
get information ipaddress 152.66.242.65 failed
Wed May 8 11:38:27.269 <W ha_ifd ifd 2192:0 ifd_net.c:848> CI_FAILURE,
More than 256 aliases for interface
--- ha_ifd_labpc1.inf.mit.bme.hu ---
Wed May 8 11:19:52 <N
/usr/lib/failsafe/resource_types/IP_address/exclusive script 2473:0>
resource 152.66.242.65 exclusive status: NOT RUNNING
Wed May 8 11:20:03 <N /usr/lib/failsafe/resource_types/IP_address/start
script 2492:0> ip address 152.66.242.65 cannot be configured
Wed May 8 11:20:03 <N /usr/lib/failsafe/resource_types/IP_address/start
script 2492:0> Check ha_ifd logs on this node for more information
Wed May 8 11:37:17 <N
/usr/lib/failsafe/resource_types/IP_address/exclusive script 2635:0>
resource 152.66.242.66 exclusive status: NOT RUNNING
Wed May 8 11:38:27 <N /usr/lib/failsafe/resource_types/IP_address/start
script 2675:0> ip address 152.66.242.66 cannot be configured
Wed May 8 11:38:27 <N /usr/lib/failsafe/resource_types/IP_address/start
script 2675:0> Check ha_ifd logs on this node for more information
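To condense the ha_ifd log down to its failures, something like the sketch below can be used. The function name ci_failure_summary is made up for illustration, the log path in the usage comment is a placeholder, and it assumes the real log file has one entry per line (the excerpts above are wrapped by the mailer).

```shell
# Hypothetical helper (not part of FailSafe): reads ha_ifd log text on stdin
# and prints each distinct CI_FAILURE message with its number of occurrences,
# most frequent first.
ci_failure_summary() {
    # Keep only the text after "CI_FAILURE," then count identical messages.
    sed -n 's/.*CI_FAILURE, *//p' | sort | uniq -c | sort -rn
}

# Usage (the path is a placeholder, substitute the real ha_ifd log file):
#   ci_failure_summary < /path/to/ifd_labpc1
```

In the excerpt above, "More than 256 aliases for interface" shows up twice, at 11:20:03 and 11:38:27, which are the same times the start script reported that the two IP addresses could not be configured.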
Any ideas why the IP start script fails? Is there an easy way to test
these scripts by hand? (What are the 3 files that have to be passed to
them as command line arguments?)
Another problem:
/etc/init.d/fs_cluster stop doesn't stop all processes.
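To list what is left behind after the stop, a sketch like this could be used. The daemon names are an assumption: they are taken only from the process tags in the logs above (cad, cam_svc, cam_casmail, ha_ifd) plus srmd from the error message, so the list may be incomplete. The function name leftover_daemons is made up.

```shell
# Hypothetical helper (not part of FailSafe): reads "PID COMMAND" pairs on
# stdin and prints those whose command name is one of the assumed daemons.
leftover_daemons() {
    # Name list is an assumption based on the log tags; extend as needed.
    awk '$2 ~ /^(ha_ifd|srmd|cad|cam_svc|cam_casmail)$/ { print $1, $2 }'
}

# Usage, after running "/etc/init.d/fs_cluster stop":
#   ps -e -o pid= -o comm= | leftover_daemons
```

No output would mean that none of the listed daemons survived the stop.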
Thx:
Lorant Nemeth (Loci)