[LinuxFailSafe] 2 Newbie questions...

Luke Alexander luke.alexander@qick.com
02 Jul 2002 10:12:28 +0100


Hi All,

Just a couple of issues that someone is bound to be able to help me
with.

I have installed FailSafe for Linux on a standard Red Hat 7.2 build using
the RPMs from SGI. Everything installed OK: I used the --nodeps switch to
install the tcpmux RPM, created an xinetd.d file to enable tcpmux as a
service, and added the necessary lines to /etc/services. I then restarted
xinetd and started the fs_cluster daemons, and they all seemed to start
OK.
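
For anyone retracing the /etc/services step, a minimal Python sketch of a
sanity check follows. It assumes the standard RFC 1078 assignment of tcpmux
to port 1/tcp. The helper name find_service is made up for illustration, and
it does not check any other entries the FailSafe packages may need.

```python
def find_service(services_text, name, proto="tcp"):
    """Return the port for name/proto in /etc/services-style text, or None."""
    for raw in services_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2 or "/" not in fields[1]:
            continue  # not a "name port/proto [aliases...]" line
        port, _, line_proto = fields[1].partition("/")
        names = [fields[0]] + fields[2:]  # official name plus aliases
        if name in names and line_proto == proto and port.isdigit():
            return int(port)
    return None


if __name__ == "__main__":
    try:
        with open("/etc/services") as fh:
            print("tcpmux/tcp ->", find_service(fh.read(), "tcpmux"))
    except OSError as exc:
        print("could not read /etc/services:", exc)
```

A result of None would suggest the tcpmux line is missing or malformed. A
port number only shows that the entry parses, not that xinetd is actually
serving it.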

When I try to connect with the GUI, it reports a fatal error: "Unable to
connect to CAM daemon on xxx". I tried running fs_cluster restart but got
the same error. I have now also repeated the same installation process
on a Red Hat 7.1 build (I was led to believe that the RPMs were built
for 7.1), and I get a different error when starting fs_cluster:

Starting Cluster Services:
Cluster Control processes could not be restarted

On the 7.1 install only two of the three log files have been created:

cdbd_log and cmond_log

On 7.2 all three logs have been created, although I cannot find anything
in them that explains why the GUI is unable to connect.

It seems to me like a possible Catch-22 situation: the GUI won't start
without the config database being defined, and I want to use the GUI to
define the config database...

Any help much appreciated,

Thanks.

Here are some parts of the log files:

7.2 cad_log:

Thu Jun 27 13:09:30.967 <cam_cascdb 17600:3076> cdb key not found
#global#machines. cdb error 7
Thu Jun 27 13:09:30.973 <cam_cicdb 17601:4101> cdb key not found
#local#HA#resources. cdb error 10
Thu Jun 27 13:09:36.135 <cad 17604:7176> cfs_fs_connect: fs_cam_register
failed with error FailSafe is not ready to accept admin requests.
Thu Jun 27 13:09:36.216 <cad 17604:7176> cdb key not found #cluster. cdb
error 7


7.1 cdbd_log:

Mon Jul  1 11:39:07.474 qluster3 cdbd  - fs2d: couldn't register as a
TCP service
Mon Jul  1 11:39:07.475 qluster3 cdbd  - fs2d_init: cannot open
database, error 3
Mon Jul  1 11:39:07.475 qluster3 cdbd  - Initialization failed
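
The log does not say why the TCP registration failed. One possibility (a
guess only, nothing above confirms it) is that the tcpmux service is not
actually listening on this host. A minimal Python sketch of that check
follows, assuming tcpmux is on its RFC 1078 port, 1/tcp, and that the check
is run on the node itself. The helper name port_is_open is hypothetical.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Port 1 is the RFC 1078 tcpmux port; "localhost" is an assumption.
    print("tcpmux reachable:", port_is_open("localhost", 1))
```

A False result would point at the xinetd side of the setup rather than at
FailSafe itself. A True result only shows that something accepts
connections on that port.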


7.1 cmond_log:

Mon Jul  1 11:17:37.863 <cmond 564:1024> Beginning reconfiguration.
Mon Jul  1 11:17:37.864 <cmond 564:1024> Reconfiguration done.
Mon Jul  1 11:17:38.004 <cmond 564:1024> Could not open configuration
database.
Mon Jul  1 11:17:38.004 <cmond 564:1024> New client request have
arrived.
Mon Jul  1 11:17:38.004 <cmond 564:1024> Serving request #1.
Mon Jul  1 11:17:38.004 <cmond 564:1024> Request = restart cluster_admin
REPLY|FORCE.
Mon Jul  1 11:17:38.004 <cmond 564:1024> Beginning restart cluster_admin
REPLY|FORCE.
Mon Jul  1 11:17:38.005 <cmond 564:1024> Killing cad:1243, sending
SIGTERM.
Mon Jul  1 11:17:38.109 <cmond 564:1024> Starting process cad.
Mon Jul  1 11:17:38.109 <cmond 564:1024> Going to fork/exec new process
"cad -l -lf /var/log/failsafe/cad_log --append_log".
Mon Jul  1 11:17:38.109 <cmond 564:1024> New process cad pid 2434
Mon Jul  1 11:17:38.124 <cmond 564:1024> Successfully finishing restart
cluster_admin REPLY|FORCE.
Mon Jul  1 11:17:38.124 <cmond 564:1024> Process group cluster_admin is
in running state.
Mon Jul  1 11:17:38.124 <cmond 564:1024> Request served successfully.
Mon Jul  1 11:17:38.124 <cmond 564:1024> Sending reply for request #1.
Mon Jul  1 11:17:38.124 <cmond 564:1024> 0 processes have exited.
Mon Jul  1 11:38:17.746 <cmond 2806:1024> Cmond restarted, using log
level info.
Mon Jul  1 11:38:17.849 <cmond 2806:1024> Creating process group table.
Mon Jul  1 11:38:17.849 <cmond 2806:1024> Enabling client requests.
Mon Jul  1 11:38:17.849 <cmond 2806:1024> Installing signal handlers.
Mon Jul  1 11:38:17.849 <cmond 2806:1024> Attempting cdb registration.
Mon Jul  1 11:38:17.978 <cmond 2806:1024> Could not open configuration
database.
Mon Jul  1 11:38:17.978 <cmond 2806:1024> Cdb registration failed, will
continueretyring until successful.
Mon Jul  1 11:38:17.978 <cmond 2806:1024> Initiating autoactions.
Mon Jul  1 11:38:17.979 <cmond 2806:1024> Reading configuration
information for process group cluster_admin.
Mon Jul  1 11:38:17.979 <cmond 2806:1024> Configuration for process
group cluster_admin.
Mon Jul  1 11:38:17.979 <cmond 2806:1024>         Type = cluster_admin
Mon Jul  1 11:38:17.979 <cmond 2806:1024>         Procs = cad 
Mon Jul  1 11:38:17.979 <cmond 2806:1024>         Actions = start stop
restart detach attach status    
Mon Jul  1 11:38:17.979 <cmond 2806:1024> Reading configuration
information for process group cluster_control.
Mon Jul  1 11:38:17.980 <cmond 2806:1024> Configuration for process
group cluster_control.
Mon Jul  1 11:38:17.980 <cmond 2806:1024>         Type = cluster_control
Mon Jul  1 11:38:17.980 <cmond 2806:1024>         Procs = crsd 
Mon Jul  1 11:38:17.980 <cmond 2806:1024>         Actions = start stop
restart detach attach status    
Mon Jul  1 11:38:17.980 <cmond 2806:1024> Reading configuration
information for process group cluster_hainfra.
Mon Jul  1 11:38:17.980 <cmond 2806:1024> Configuration for process
group cluster_hainfra.
Mon Jul  1 11:38:17.980 <cmond 2806:1024>         Type = cluster_hainfra
Mon Jul  1 11:38:17.980 <cmond 2806:1024>         Procs = ha_cmsd ha_gcd
ha_srmd 
Mon Jul  1 11:38:17.980 <cmond 2806:1024>         Actions = start stop
restart detach attach status    
Mon Jul  1 11:38:17.980 <cmond 2806:1024> Reading configuration
information for process group ip_addresses.
Mon Jul  1 11:38:17.981 <cmond 2806:1024> Configuration for process
group ip_addresses.
Mon Jul  1 11:38:17.981 <cmond 2806:1024>         Type = cluster_agent
Mon Jul  1 11:38:17.981 <cmond 2806:1024>         Procs = ha_ifd 
Mon Jul  1 11:38:17.981 <cmond 2806:1024>         Actions = start stop
restart detach attach status    
Mon Jul  1 11:38:17.981 <cmond 2806:1024> Reading configuration
information for process group cluster_failsafe.
Mon Jul  1 11:38:17.981 <cmond 2806:1024> Configuration for process
group cluster_failsafe.
Mon Jul  1 11:38:17.981 <cmond 2806:1024>         Type = cluster_ha
Mon Jul  1 11:38:17.981 <cmond 2806:1024>         Procs = ha_fsd 
Mon Jul  1 11:38:17.981 <cmond 2806:1024>         Actions = start stop
restart detach attach status    
Mon Jul  1 11:38:17.981 <cmond 2806:1024> Beginning autoaction
cluster_admin .
Mon Jul  1 11:38:17.981 <cmond 2806:1024> autoaction is start action.
Mon Jul  1 11:38:17.982 <cmond 2806:1024> Starting process cad.
Mon Jul  1 11:38:17.982 <cmond 2806:1024> Going to fork/exec new process
"cad -l -lf /var/log/failsafe/cad_log --append_log".
Mon Jul  1 11:38:17.983 <cmond 2806:1024> New process cad pid 2839
Mon Jul  1 11:38:17.997 <cmond 2806:1024> Successfully finishing
autoaction cluster_admin .
Mon Jul  1 11:38:17.997 <cmond 2806:1024> Beginning autoaction
cluster_control .
Mon Jul  1 11:38:17.997 <cmond 2806:1024> autoaction is attach action.
Mon Jul  1 11:38:17.997 <cmond 2806:1024> Looking for process crsd to
attach to.
Mon Jul  1 11:38:17.997 <cmond 2806:1024> autoaction cluster_control 
failed - could not access object.
Mon Jul  1 11:38:17.997 <cmond 2806:1024> Beginning autoaction
ip_addresses .
Mon Jul  1 11:38:17.997 <cmond 2806:1024> autoaction is attach action.
Mon Jul  1 11:38:17.997 <cmond 2806:1024> Looking for process ha_ifd to
attach to.
Mon Jul  1 11:38:17.997 <cmond 2806:1024> autoaction ip_addresses 
failed - could not access object.
Mon Jul  1 11:38:17.997 <cmond 2806:1024> Beginning autoaction
cluster_hainfra .
Mon Jul  1 11:38:17.997 <cmond 2806:1024> autoaction is attach action.
Mon Jul  1 11:38:17.997 <cmond 2806:1024> Looking for process ha_cmsd to
attach to.
Mon Jul  1 11:38:17.997 <cmond 2806:1024> Looking for process ha_gcd to
attach to.
Mon Jul  1 11:38:17.997 <cmond 2806:1024> Looking for process ha_srmd to
attach to.
Mon Jul  1 11:38:17.998 <cmond 2806:1024> autoaction cluster_hainfra 
failed - could not access object.
Mon Jul  1 11:38:17.998 <cmond 2806:1024> Beginning autoaction
cluster_failsafe .
Mon Jul  1 11:38:17.998 <cmond 2806:1024> autoaction is attach action.
Mon Jul  1 11:38:17.998 <cmond 2806:1024> Looking for process ha_fsd to
attach to.
Mon Jul  1 11:38:17.998 <cmond 2806:1024> autoaction cluster_failsafe 
failed - could not access object.
Mon Jul  1 11:38:17.998 <cmond 2806:1024> Autoactions done.
Mon Jul  1 11:38:27.989 <cmond 2806:1024> 0 processes have exited.
Mon Jul  1 11:38:28.125 <cmond 2806:1024> Could not open configuration
database.
Mon Jul  1 11:38:38.119 <cmond 2806:1024> 0 processes have exited.
Mon Jul  1 11:38:38.256 <cmond 2806:1024> Could not open configuration
database.
Mon Jul  1 11:38:48.249 <cmond 2806:1024> 0 processes have exited.
Mon Jul  1 11:38:48.388 <cmond 2806:1024> Could not open configuration
database.
Mon Jul  1 11:38:49.709 <cmond 2806:1024> Process with pid 2839 has
exited with status 256
Mon Jul  1 11:38:49.709 <cmond 2806:1024> 1 processes have exited.
Mon Jul  1 11:38:49.709 <cmond 2806:1024> Process cad:2839 of group
cluster_admin exited, status = 1.
Mon Jul  1 11:38:49.709 <cmond 2806:1024> Initiating recovery for
process group cluster_admin.
Mon Jul  1 11:38:51.719 <cmond 2806:1024> Starting process cad.
Mon Jul  1 11:38:51.719 <cmond 2806:1024> Going to fork/exec new process
"cad -l -lf /var/log/failsafe/cad_log --append_log".
Mon Jul  1 11:38:51.719 <cmond 2806:1024> New process cad pid 3132
Mon Jul  1 11:38:51.720 <cmond 2806:1024> Recovery for process group
cluster_admin complete.
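
Since the mail client has wrapped the log lines above, a small Python sketch
for re-joining wrapped entries and counting repeated messages may make the
logs easier to skim. The timestamp pattern is inferred from the excerpts in
this post only, and the function names are made up for illustration.

```python
import re
from collections import Counter

# An entry starts with e.g. "Mon Jul  1 11:38:17.978 "; any other line is
# treated as the wrapped tail of the previous entry.
ENTRY_START = re.compile(
    r"^[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}\.\d{3} "
)


def join_wrapped(lines):
    """Merge wrapped continuation lines back into single log entries."""
    entries = []
    for raw in lines:
        line = raw.rstrip()
        if not line:
            continue
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line.lstrip()
    return entries


def count_messages(entries):
    """Count message bodies, ignoring timestamps and "<name pid:tid>" tags."""
    counts = Counter()
    for entry in entries:
        m = ENTRY_START.match(entry)
        body = entry[m.end():] if m else entry
        body = re.sub(r"^<[^>]*>\s*", "", body)
        counts[body] += 1
    return counts


if __name__ == "__main__":
    import sys
    for message, n in count_messages(join_wrapped(sys.stdin)).most_common(5):
        print(n, message)
```

Fed the cmond_log excerpt on standard input, this prints the five most
frequent message bodies with their counts.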