[LinuxFailSafe] problem resetting node
Tabitha Taylor
tabtaylor@excite.com
Tue, 24 Sep 2002 11:29:33 -0400 (EDT)
Could anyone help me out with these crsd errors? I have no
idea what they mean.
I am using the latest FailSafe build, version 1.0.4, from oss.sgi.com:
PACKAGE_NAME = FailSafe
PACKAGE_RELEASE = 1
PACKAGE_VERSION = 1.0.4
PACKAGE_DISTRIBUTION = SGI 1.0
I can reset the node without any problem by running the stonith command from the command line.
rpc.cfg file contents:
/dev/ttyS0 HA2 0
Running this command:
/usr/local/sbin/stonith -t rps10 HA2
resets the node HA2 every time.
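To double-check that the node name in rpc.cfg matches the name passed to stonith, the config line can be split into its fields. This is only a sketch: the field names (device, node, outlet) are my assumption about what the three columns mean for the rps10 device type, and parse_rps10_line is a hypothetical helper, not part of FailSafe or stonith.

```python
from collections import namedtuple

# Field names are assumptions for illustration; the config file itself
# does not label its columns.
Rps10Entry = namedtuple("Rps10Entry", ["device", "node", "outlet"])

def parse_rps10_line(line):
    """Split one rpc.cfg line into (device, node, outlet)."""
    parts = line.split()
    if len(parts) != 3:
        raise ValueError(
            "expected 3 whitespace-separated fields, got %d" % len(parts)
        )
    device, node, outlet = parts
    return Rps10Entry(device, node, int(outlet))

entry = parse_rps10_line("/dev/ttyS0 HA2 0")
print(entry.device, entry.node, entry.outlet)
```

If the node name printed here differed from the name FailSafe uses for the node, that would be one thing to look at, though I do not know whether crsd reads this file the same way.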
Here is the crsd_HA1 log after trying to reset HA2 from within FailSafe:
Tue Sep 24 11:01:07.210
Reset request from sysctlrKill:1 (local) for HA2:2.
Tue Sep 24 11:01:07.245
Running pre-reset power fail test.
Tue Sep 24 11:01:07.246
Powerfail (node HA2) called with status = CRSRS_NORESPONSE:6,
last heartbeat = 13835050774238853472, reset request time = 4617690025151166504.
Tue Sep 24 11:01:07.246
Port type = stonith status = Enabled CRSMS_LIVE:3, command fail time = 0
last response time = 0.
Tue Sep 24 11:01:07.246
Min net delay = 250 max net delay = 500.
Tue Sep 24 11:01:07.246
Maximum safe network failure detection interval = 300000.
Tue Sep 24 11:01:07.246
Maximum safe serial line failure detection interval = 15000.
Tue Sep 24 11:01:07.247
Power fail algorithm returning reset status = CRSRS_NORESPONSE:6.
Tue Sep 24 11:01:07.247
Pre-reset power fail test failed, issuing real reset.
Tue Sep 24 11:01:17.300
RESET FAILED.
Tue Sep 24 11:01:17.300
Reset returned with err = CI_CRSERR_NOTFOUND.
Tue Sep 24 11:01:17.301
Reset failed, running post-reset power fail test.
Tue Sep 24 11:01:17.301
Powerfail (node HA2) called with status = CRSRS_NORESPONSE:6,
last heartbeat = 13835050774238853472,
reset request time = 4617690025151166504.
Tue Sep 24 11:01:17.301
Port type = stonith status = Enabled CRSMS_LIVE:3, command fail time = 0
last response time = 0.
Tue Sep 24 11:01:17.301
Min net delay = 250 max net delay = 500.
Tue Sep 24 11:01:17.301
Maximum safe network failure detection interval = 300000.
Tue Sep 24 11:01:17.301
Maximum safe serial line failure detection interval = 15000.
Tue Sep 24 11:01:17.302
Power fail algorithm returning reset status = CRSRS_NORESPONSE:6.
Tue Sep 24 11:01:17.302
Post-reset power fail test failed.
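In case it helps anyone reading the log, here is a small sketch that pairs each timestamp line with the message lines that follow it and measures how long the real reset ran before it was reported as failed. It assumes the layout shown above (a timestamp line, then one or more message lines); parse_crsd_log and reset_duration are hypothetical helper names, not FailSafe tools.

```python
import re

# Matches timestamp lines such as "Tue Sep 24 11:01:07.210".
STAMP = re.compile(r"^\w{3} \w{3} +\d{1,2} (\d{2}):(\d{2}):(\d{2})\.(\d{3})$")

def parse_crsd_log(text):
    """Return a list of (seconds_since_midnight, message) entries."""
    entries = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        m = STAMP.match(line)
        if m:
            h, mi, s, ms = (int(g) for g in m.groups())
            current = [h * 3600 + mi * 60 + s + ms / 1000.0, []]
            entries.append(current)
        elif current is not None:
            # Continuation lines belong to the most recent timestamp.
            current[1].append(line)
    return [(t, " ".join(msg)) for t, msg in entries]

def reset_duration(entries):
    """Seconds between 'issuing real reset' and 'RESET FAILED', or None."""
    start = None
    for t, msg in entries:
        if "issuing real reset" in msg:
            start = t
        elif msg.startswith("RESET FAILED") and start is not None:
            return t - start
    return None

sample = """Tue Sep 24 11:01:07.247
Pre-reset power fail test failed, issuing real reset.
Tue Sep 24 11:01:17.300
RESET FAILED.
Tue Sep 24 11:01:17.300
Reset returned with err = CI_CRSERR_NOTFOUND.
"""

entries = parse_crsd_log(sample)
print(round(reset_duration(entries), 3))
```

On the excerpt above this shows the reset attempt ran for roughly ten seconds before CI_CRSERR_NOTFOUND was returned, which looks more like a timeout than an immediate rejection, but that is only my reading.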
Any idea what is going wrong?
Thanks,
Tabitha