[LinuxFailSafe] 2 Newbie questions...

Luke Alexander luke.alexander@qick.com
02 Jul 2002 12:32:54 +0100


Yes, the portmap daemon is running now, although I usually have it disabled.

And would you believe it - I can now connect. I had a suspicion that it
might have something to do with portmap and tried this yesterday, but
with no luck. I have literally just started it now, and fstask goes
straight in on the 7.2 box. It would seem that portmap needs to be
restarted after a restart of fs_cluster - is this true?

I've also now managed to do the same on the Red Hat 7.1 box and on a new
Red Hat 7.3 box.

I also found that adding LD_ASSUME_KERNEL=2.2.5 to the environment
running fstask stopped the JRE from hanging and eating all the CPU.
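
For anyone who wants to script this, here is a minimal sketch that sets
the variable only for the fstask process rather than system-wide. It
assumes fstask is on the PATH; the helper names are my own and not part
of FailSafe.

```python
import os
import subprocess


def fstask_env(base=None):
    """Return a copy of the environment with LD_ASSUME_KERNEL set."""
    env = dict(os.environ if base is None else base)
    env["LD_ASSUME_KERNEL"] = "2.2.5"
    return env


def run_fstask(command=("fstask",)):
    """Run fstask (assumed to be on PATH) with the adjusted environment."""
    return subprocess.call(list(command), env=fstask_env())


# Usage on the admin box (not executed here):
#   run_fstask()
```

Exporting the variable in the shell before starting fstask should have
the same effect; the wrapper just keeps it away from other programs.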

Thanks for the assistance - if anyone else is new to linuxfailsafe this
is a list of steps that I took to get it working...


RPM install sequence as follows:

1) install IBM's JRE rpm
2) install sysadm_base-client.rpm
3) install sysadm_base-lib.rpm
4) install sysadm_base-dev*.rpm
5) install sysadm_base-tcpmux.rpm, using the --nodeps switch
6) install sysadm_base-server.rpm
7) install cluster_admin*rpm
8) install cluster_services*rpm
9) install failsafe*rpm
10) install sysadm-failsafe-client*rpm
11) install sysadm-failsafe-server*rpm
12) install sysadm-failsafe-web*rpm
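
The same order can be written down as a small script that prints the rpm
commands without running them. This is only a sketch: the use of
"rpm -ivh" is my assumption (any install invocation would do), and the
IBM JRE is left out because its file name depends on the version you
downloaded.

```python
# (package pattern, extra rpm flags), in the order listed above.
# The IBM JRE rpm goes first but is omitted: its file name is not given.
INSTALL_ORDER = [
    ("sysadm_base-client.rpm", []),
    ("sysadm_base-lib.rpm", []),
    ("sysadm_base-dev*.rpm", []),
    ("sysadm_base-tcpmux.rpm", ["--nodeps"]),
    ("sysadm_base-server.rpm", []),
    ("cluster_admin*rpm", []),
    ("cluster_services*rpm", []),
    ("failsafe*rpm", []),
    ("sysadm-failsafe-client*rpm", []),
    ("sysadm-failsafe-server*rpm", []),
    ("sysadm-failsafe-web*rpm", []),
]


def rpm_commands(order=INSTALL_ORDER):
    """Build one 'rpm -ivh' command line per package, preserving the order."""
    return [" ".join(["rpm", "-ivh"] + flags + [pattern])
            for pattern, flags in order]


for command in rpm_commands():
    print(command)
```

Printing rather than executing lets you check the list against the rpms
you actually have before pasting the lines into a root shell.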

Add LD_ASSUME_KERNEL=2.2.5 to the environment that runs fstask

Add the following lines to /etc/services:
(On Red Hat I needed to comment out another service entry that was
already using port 7000/udp)

sgi-cmsd	7000/udp		# Cluster membership services daemon
sgi-crsd	17002/udp		# Cluster reset services daemon
sgi-gcd		17003/udp		# SGI Group membership daemon
sgi-cad		17004/tcp		# Cluster Admin daemon
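
To spot a clash like the one I had on 7000/udp before editing the file,
something along these lines could be used. It is a rough sketch: the
function names are mine, and it only looks for entries that already use
one of the four port/protocol pairs above under a different name.

```python
FAILSAFE_SERVICES = {
    "sgi-cmsd": (7000, "udp"),
    "sgi-crsd": (17002, "udp"),
    "sgi-gcd": (17003, "udp"),
    "sgi-cad": (17004, "tcp"),
}


def parse_services(text):
    """Yield (name, port, proto) for each active line of /etc/services text."""
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        fields = line.split()
        if len(fields) < 2 or "/" not in fields[1]:
            continue
        port, proto = fields[1].split("/", 1)
        if port.isdigit():
            yield fields[0], int(port), proto.lower()


def conflicts(text):
    """Return entries using a FailSafe port/protocol under another name."""
    wanted = {pair: name for name, pair in FAILSAFE_SERVICES.items()}
    return [(name, port, proto)
            for name, port, proto in parse_services(text)
            if (port, proto) in wanted and name != wanted[(port, proto)]]


# Usage (not executed here):
#   with open("/etc/services") as handle:
#       print(conflicts(handle.read()))
```

Any entry it reports is a candidate for commenting out before the sgi-*
lines are added.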

Restart xinetd

Run cdbreinit to stop the daemons, re-initialise the database, and then
restart the cluster daemons

Start or restart the portmap daemon
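
After those steps, a quick way to confirm that portmap is answering is
to try a TCP connection to port 111. The sketch below does that, and
also tries 17004/tcp on the assumption that cad listens on the port
registered for sgi-cad in /etc/services; I have not confirmed that from
the FailSafe documentation. The function names are my own.

```python
import socket


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(host="127.0.0.1"):
    """Check portmap (111/tcp) and sgi-cad (17004/tcp, an assumption)."""
    checks = {"portmap": 111, "sgi-cad": 17004}
    return {name: tcp_port_open(host, port) for name, port in checks.items()}


# Usage (not executed here):
#   print(report())
```

If portmap shows as closed, start it and try the GUI again.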


Thanks again.

Luke.




On Tue, 2002-07-02 at 10:35, Giulius wrote:
> Hi.
> 
> Have you the portmap daemon?
> 
> 
> 
> 
> Luke Alexander wrote:
> 
> >Hi All,
> >
> >Just a couple of issues that someone is bound to be able to help me
> >with.
> >
> >I have installed FailSafe for Linux on a standard Red Hat 7.2 build using
> >the rpms from SGI. Everything installed OK: I used the --nodeps switch to
> >install the tcpmux rpm, then created an xinetd.d file to enable tcpmux
> >as a service, added the necessary lines to /etc/services, restarted
> >xinetd and then started the fs_cluster daemons. They all seemed to start
> >OK.
> >
> >When I try to connect with the GUI, it reports a fatal error: Unable to
> >connect to CAM daemon on xxx. I tried running fs_cluster restart but got
> >the same error. I have now also repeated the same installation process
> >on a Red Hat 7.1 build (I was led to believe that the rpms were built
> >for 7.1), and get different errors on starting fs_cluster:
> >
> >Starting Cluster Services:
> >Cluster Control processes could not be restarted
> >
> >On the 7.1 install only two of the three log files have been created:
> >
> >cdbd_log and cmond_log
> >
> >On 7.2 all three logs have been created although I cannot find anything
> >to suggest an answer as to why the GUI is unable to connect.
> >
> >It seems to me like a possible Catch 22 situation: the GUI won't start
> >without the config database being defined, and I want to use the GUI to
> >define the config database...
> >
> >Any help much appreciated,
> >
> >Thanks - 
> >
> >here are some parts of the log files:
> >
> >7.2 cad_log:
> >
> >Thu Jun 27 13:09:30.967 <cam_cascdb 17600:3076> cdb key not found
> >#global#machines. cdb error 7
> >Thu Jun 27 13:09:30.973 <cam_cicdb 17601:4101> cdb key not found
> >#local#HA#resources. cdb error 10
> >Thu Jun 27 13:09:36.135 <cad 17604:7176> cfs_fs_connect: fs_cam_register
> >failed with error FailSafe is not ready to accept admin requests.
> >Thu Jun 27 13:09:36.216 <cad 17604:7176> cdb key not found #cluster. cdb
> >error 7
> >
> >
> >7.1 cdbd_log:
> >
> >Mon Jul  1 11:39:07.474 qluster3 cdbd  - fs2d: couldn't register as a
> >TCP service
> >Mon Jul  1 11:39:07.475 qluster3 cdbd  - fs2d_init: cannot open
> >database, error 3
> >Mon Jul  1 11:39:07.475 qluster3 cdbd  - Initialization failed
> >
> >
> >7.1 cmond_log:
> >
> >Mon Jul  1 11:17:37.863 <cmond 564:1024> Beginning reconfiguration.
> >Mon Jul  1 11:17:37.864 <cmond 564:1024> Reconfiguration done.
> >Mon Jul  1 11:17:38.004 <cmond 564:1024> Could not open configuration
> >database.
> >Mon Jul  1 11:17:38.004 <cmond 564:1024> New client request have
> >arrived.
> >Mon Jul  1 11:17:38.004 <cmond 564:1024> Serving request #1.
> >Mon Jul  1 11:17:38.004 <cmond 564:1024> Request = restart cluster_admin
> >REPLY|FORCE.
> >Mon Jul  1 11:17:38.004 <cmond 564:1024> Beginning restart cluster_admin
> >REPLY|FORCE.
> >Mon Jul  1 11:17:38.005 <cmond 564:1024> Killing cad:1243, sending
> >SIGTERM.
> >Mon Jul  1 11:17:38.109 <cmond 564:1024> Starting process cad.
> >Mon Jul  1 11:17:38.109 <cmond 564:1024> Going to fork/exec new process
> >"cad -l -lf /var/log/failsafe/cad_log --append_log".
> >Mon Jul  1 11:17:38.109 <cmond 564:1024> New process cad pid 2434
> >Mon Jul  1 11:17:38.124 <cmond 564:1024> Successfully finishing restart
> >cluster_admin REPLY|FORCE.
> >Mon Jul  1 11:17:38.124 <cmond 564:1024> Process group cluster_admin is
> >in running state.
> >Mon Jul  1 11:17:38.124 <cmond 564:1024> Request served successfully.
> >Mon Jul  1 11:17:38.124 <cmond 564:1024> Sending reply for request #1.
> >Mon Jul  1 11:17:38.124 <cmond 564:1024> 0 processes have exited.
> >Mon Jul  1 11:38:17.746 <cmond 2806:1024> Cmond restarted, using log
> >level info.
> >Mon Jul  1 11:38:17.849 <cmond 2806:1024> Creating process group table.
> >Mon Jul  1 11:38:17.849 <cmond 2806:1024> Enabling client requests.
> >Mon Jul  1 11:38:17.849 <cmond 2806:1024> Installing signal handlers.
> >Mon Jul  1 11:38:17.849 <cmond 2806:1024> Attempting cdb registration.
> >Mon Jul  1 11:38:17.978 <cmond 2806:1024> Could not open configuration
> >database.
> >Mon Jul  1 11:38:17.978 <cmond 2806:1024> Cdb registration failed, will
> >continueretyring until successful.
> >Mon Jul  1 11:38:17.978 <cmond 2806:1024> Initiating autoactions.
> >Mon Jul  1 11:38:17.979 <cmond 2806:1024> Reading configuration
> >information for process group cluster_admin.
> >Mon Jul  1 11:38:17.979 <cmond 2806:1024> Configuration for process
> >group cluster_admin.
> >Mon Jul  1 11:38:17.979 <cmond 2806:1024>         Type = cluster_admin
> >Mon Jul  1 11:38:17.979 <cmond 2806:1024>         Procs = cad 
> >Mon Jul  1 11:38:17.979 <cmond 2806:1024>         Actions = start stop
> >restart detach attach status    
> >Mon Jul  1 11:38:17.979 <cmond 2806:1024> Reading configuration
> >information for process group cluster_control.
> >Mon Jul  1 11:38:17.980 <cmond 2806:1024> Configuration for process
> >group cluster_control.
> >Mon Jul  1 11:38:17.980 <cmond 2806:1024>         Type = cluster_control
> >Mon Jul  1 11:38:17.980 <cmond 2806:1024>         Procs = crsd 
> >Mon Jul  1 11:38:17.980 <cmond 2806:1024>         Actions = start stop
> >restart detach attach status    
> >Mon Jul  1 11:38:17.980 <cmond 2806:1024> Reading configuration
> >information for process group cluster_hainfra.
> >Mon Jul  1 11:38:17.980 <cmond 2806:1024> Configuration for process
> >group cluster_hainfra.
> >Mon Jul  1 11:38:17.980 <cmond 2806:1024>         Type = cluster_hainfra
> >Mon Jul  1 11:38:17.980 <cmond 2806:1024>         Procs = ha_cmsd ha_gcd
> >ha_srmd 
> >Mon Jul  1 11:38:17.980 <cmond 2806:1024>         Actions = start stop
> >restart detach attach status    
> >Mon Jul  1 11:38:17.980 <cmond 2806:1024> Reading configuration
> >information for process group ip_addresses.
> >Mon Jul  1 11:38:17.981 <cmond 2806:1024> Configuration for process
> >group ip_addresses.
> >Mon Jul  1 11:38:17.981 <cmond 2806:1024>         Type = cluster_agent
> >Mon Jul  1 11:38:17.981 <cmond 2806:1024>         Procs = ha_ifd 
> >Mon Jul  1 11:38:17.981 <cmond 2806:1024>         Actions = start stop
> >restart detach attach status    
> >Mon Jul  1 11:38:17.981 <cmond 2806:1024> Reading configuration
> >information for process group cluster_failsafe.
> >Mon Jul  1 11:38:17.981 <cmond 2806:1024> Configuration for process
> >group cluster_failsafe.
> >Mon Jul  1 11:38:17.981 <cmond 2806:1024>         Type = cluster_ha
> >Mon Jul  1 11:38:17.981 <cmond 2806:1024>         Procs = ha_fsd 
> >Mon Jul  1 11:38:17.981 <cmond 2806:1024>         Actions = start stop
> >restart detach attach status    
> >Mon Jul  1 11:38:17.981 <cmond 2806:1024> Beginning autoaction
> >cluster_admin .
> >Mon Jul  1 11:38:17.981 <cmond 2806:1024> autoaction is start action.
> >Mon Jul  1 11:38:17.982 <cmond 2806:1024> Starting process cad.
> >Mon Jul  1 11:38:17.982 <cmond 2806:1024> Going to fork/exec new process
> >"cad -l -lf /var/log/failsafe/cad_log --append_log".
> >Mon Jul  1 11:38:17.983 <cmond 2806:1024> New process cad pid 2839
> >Mon Jul  1 11:38:17.997 <cmond 2806:1024> Successfully finishing
> >autoaction cluster_admin .
> >Mon Jul  1 11:38:17.997 <cmond 2806:1024> Beginning autoaction
> >cluster_control .
> >Mon Jul  1 11:38:17.997 <cmond 2806:1024> autoaction is attach action.
> >Mon Jul  1 11:38:17.997 <cmond 2806:1024> Looking for process crsd to
> >attach to.
> >Mon Jul  1 11:38:17.997 <cmond 2806:1024> autoaction cluster_control 
> >failed - could not access object.
> >Mon Jul  1 11:38:17.997 <cmond 2806:1024> Beginning autoaction
> >ip_addresses .
> >Mon Jul  1 11:38:17.997 <cmond 2806:1024> autoaction is attach action.
> >Mon Jul  1 11:38:17.997 <cmond 2806:1024> Looking for process ha_ifd to
> >attach to.
> >Mon Jul  1 11:38:17.997 <cmond 2806:1024> autoaction ip_addresses 
> >failed - could not access object.
> >Mon Jul  1 11:38:17.997 <cmond 2806:1024> Beginning autoaction
> >cluster_hainfra .
> >Mon Jul  1 11:38:17.997 <cmond 2806:1024> autoaction is attach action.
> >Mon Jul  1 11:38:17.997 <cmond 2806:1024> Looking for process ha_cmsd to
> >attach to.
> >Mon Jul  1 11:38:17.997 <cmond 2806:1024> Looking for process ha_gcd to
> >attach to.
> >Mon Jul  1 11:38:17.997 <cmond 2806:1024> Looking for process ha_srmd to
> >attach to.
> >Mon Jul  1 11:38:17.998 <cmond 2806:1024> autoaction cluster_hainfra 
> >failed - could not access object.
> >Mon Jul  1 11:38:17.998 <cmond 2806:1024> Beginning autoaction
> >cluster_failsafe .
> >Mon Jul  1 11:38:17.998 <cmond 2806:1024> autoaction is attach action.
> >Mon Jul  1 11:38:17.998 <cmond 2806:1024> Looking for process ha_fsd to
> >attach to.
> >Mon Jul  1 11:38:17.998 <cmond 2806:1024> autoaction cluster_failsafe 
> >failed - could not access object.
> >Mon Jul  1 11:38:17.998 <cmond 2806:1024> Autoactions done.
> >Mon Jul  1 11:38:27.989 <cmond 2806:1024> 0 processes have exited.
> >Mon Jul  1 11:38:28.125 <cmond 2806:1024> Could not open configuration
> >database.
> >Mon Jul  1 11:38:38.119 <cmond 2806:1024> 0 processes have exited.
> >Mon Jul  1 11:38:38.256 <cmond 2806:1024> Could not open configuration
> >database.
> >Mon Jul  1 11:38:48.249 <cmond 2806:1024> 0 processes have exited.
> >Mon Jul  1 11:38:48.388 <cmond 2806:1024> Could not open configuration
> >database.
> >Mon Jul  1 11:38:49.709 <cmond 2806:1024> Process with pid 2839 has
> >exited with status 256
> >Mon Jul  1 11:38:49.709 <cmond 2806:1024> 1 processes have exited.
> >Mon Jul  1 11:38:49.709 <cmond 2806:1024> Process cad:2839 of group
> >cluster_admin exited, status = 1.
> >Mon Jul  1 11:38:49.709 <cmond 2806:1024> Initiating recovery for
> >process group cluster_admin.
> >Mon Jul  1 11:38:51.719 <cmond 2806:1024> Starting process cad.
> >Mon Jul  1 11:38:51.719 <cmond 2806:1024> Going to fork/exec new process
> >"cad -l -lf /var/log/failsafe/cad_log --append_log".
> >Mon Jul  1 11:38:51.719 <cmond 2806:1024> New process cad pid 3132
> >Mon Jul  1 11:38:51.720 <cmond 2806:1024> Recovery for process group
> >cluster_admin complete.
> >
> >
> >
> >_______________________________________________
> >LinuxFailSafe mailing list
> >LinuxFailSafe@lists.community.tummy.com
> >http://lists.community.tummy.com/mailman/listinfo/linuxfailsafe
> >