[LinuxFailSafe] Linux Failsafe Problem on Poweroff

Bráulio Gergull gergull@getnet.com.br
Thu, 19 Dec 2002 09:51:52 -0300



Hi,

I'm having some problems with an SGI FailSafe implementation on SLES7 and
wonder if you could help.

The implementation is a basic two-node cluster with two NICs per node.

The resource group has been created, and I can transfer it from one machine
to the other and bring it back again without any problems. Also, if I shut one
machine down (init 0), the resource group is automatically moved to the other
node.

The problem happens when I simulate a power-off condition. The surviving node
appears to detect that the other node went down, but the resource group is not
transferred. According to the logs, the database seems to get locked and stops
accepting administrative commands, and the failed node still appears with
state "UP" in the GUI.

Attached you'll find an excerpt of the log files.

Hope that you can help me.

Kindest wishes,



-- 
Bráulio W. Gergull
GetNet - SuSE Brasil

Attachment: failsafelogs.txt

###############################################################
## cdbd_log
###############################################################

Thu Dec 19 09:34:31.774 db1 cdbd  - Null RPC (heartbeat) to db2: error : RPC: Timed out

Thu Dec 19 09:34:31.774 db1 cdbd  - Pool machine db2 (2) not responding
Thu Dec 19 09:34:31.774 db1 cdbd  - Need new quorum: machine 2 (db2) dropped out
Thu Dec 19 09:34:31.774 db1 cdbd  - Need replacement for old quorum: cluster id: 0x00000000.0x3dff738e233bb6e0, master: 1, sequence: 54, member count: 2, members:  2,  1
Thu Dec 19 09:34:31.774 db1 cdbd  - Proposed new quorum: cluster id: 0x00000000.0x3dff738e233bb6e0, master: 1, sequence: 55, member count: 1, members:  1
Thu Dec 19 09:34:31.816 db1 cdbd  - New quorum: cluster id: 0x00000000.0x3dff738e233bb6e0, master: 1, sequence: 55, member count: 1, members:  1
Thu Dec 19 09:34:31.816 db1 cdbd  - New quorum: cluster id: 0x00000000.0x3dff738e233bb6e0, master: 1, sequence: 55, member count: 1, members:  1
Thu Dec 19 09:34:31.816 db1 cdbd  - Ready and valid new quorum: cluster id: 0x00000000.0x3dff738e233bb6e0, master: 1, sequence: 55, member count: 1, members:  1
Thu Dec 19 09:35:48.854 db1 cdbd  - Machine db2.gilat.com.br has stopped polling.


###############################################################
## failsafe_db1
###############################################################

Thu Dec 19 09:35:50.324 <W ha_fsd gcs 15685:0 gcc.c:668> CI_IPCERR_NOCONN, gcs_pulse(): Pulse for instid = 0x10001 failed with error = CI_IPCERR_NOCONN
Thu Dec 19 09:35:50.324 <E ha_fsd fsd 15685:0 fs_misc.c:405> CI_IPCERR_NOCONN, Failed to pulse, GCD probably died
Thu Dec 19 09:35:50.324 <E ha_fsd fsd 15685:0 fs_misc.c:411> CI_IPCERR_NOCONN, (curr_time: 55145230) (passed_time: 0) (fs_time2pulse: 9970) (fs_globaltime: 0) (time2pulse: 30) (fs_pulsetime: 0) (real

                * * * L o g g i n g    R e s t a r t e d * * *

Thu Dec 19 09:35:54.431 <N ha_fsd fsd 16802:0 fs_main.c:178> /usr/lib/failsafe/bin/ha_fsd is running as foreground process
Thu Dec 19 09:35:54.432 <D0 ha_fsd fsd 16802:0 fs_main.c:468> Successfully marked this FSD as a realtime process
Thu Dec 19 09:35:54.438 <D0 ha_fsd fsd 16802:0 fs_config.c:900> Registering for notification
Thu Dec 19 09:35:54.438 <D0 ha_fsd fsd 16802:0 fs_config.c:917>   FOP delete msgs: #global#HA#FailoverPolicies
Thu Dec 19 09:35:54.438 <D0 ha_fsd fsd 16802:0 fs_config.c:923>   FOP update msgs: #global#HA#FailoverPolicies
Thu Dec 19 09:35:54.438 <D0 ha_fsd fsd 16802:0 fs_config.c:942>   cluster/node delete msgs: #cluster#db#machines
Thu Dec 19 09:35:54.438 <D0 ha_fsd fsd 16802:0 fs_config.c:948>   cluster/node update msgs: #cluster#db#machines
Thu Dec 19 09:35:54.439 <D0 ha_fsd fsd 16802:0 fs_config.c:965>   log params update msgs: #cluster#db#logging
Thu Dec 19 09:35:54.439 <D0 ha_fsd fsd 16802:0 fs_config.c:984>   Timeouts update msgs: #cluster#db#HA#services#failsafe
Thu Dec 19 09:35:54.440 <D1 ha_fsd fsd 16802:0 fs_config.c:188> BootSleep: 30000
Thu Dec 19 09:35:54.440 <D1 ha_fsd fsd 16802:0 fs_config.c:212> PulseTime: 10000
Thu Dec 19 09:35:54.441 <D1 ha_fsd fsd 16802:0 fs_config.c:239> Initial PulseTime: 20000
Thu Dec 19 09:35:54.441 <D1 ha_fsd fsd 16802:0 fs_config.c:264> GCS_GroupID: 11
Thu Dec 19 09:35:54.441 <D1 ha_fsd fsd 16802:0 fs_config.c:289> GCS_MessageSize: 64000
Thu Dec 19 09:35:54.441 <D1 ha_fsd fsd 16802:0 fs_config.c:314> ResetTimeout: 15
Thu Dec 19 09:35:54.441 <D0 ha_fsd config 16802:0 ci_cdbconfig_nodes.c:144> (Re)reading configuration.
Thu Dec 19 09:35:54.442 <D0 ha_fsd config 16802:0 ci_cdbconfig_nodes.c:282> Reading network interfaces for machine db1.
Thu Dec 19 09:35:54.443 <D0 ha_fsd config 16802:0 ci_cdbconfig_nodes.c:401>     Interface[0] = net0, IP address 192.168.2.103, priority 1.
Thu Dec 19 09:35:54.444 <D0 ha_fsd config 16802:0 ci_cdbconfig_nodes.c:341> Ignoring interface net1 formachine db1, not a heartbeat interface.
Thu Dec 19 09:35:54.444 <D0 ha_fsd config 16802:0 ci_cdbconfig_nodes.c:411> Finished reading 1 interfaces for node db1.
Thu Dec 19 09:35:54.444 <D0 ha_fsd config 16802:0 ci_cdbconfig_nodes.c:282> Reading network interfaces for machine db2.
Thu Dec 19 09:35:54.445 <D0 ha_fsd config 16802:0 ci_cdbconfig_nodes.c:401>     Interface[0] = net0, IP address 192.168.2.104, priority 1.
Thu Dec 19 09:35:54.445 <D0 ha_fsd config 16802:0 ci_cdbconfig_nodes.c:341> Ignoring interface net1 formachine db2, not a heartbeat interface.
Thu Dec 19 09:35:54.445 <D0 ha_fsd config 16802:0 ci_cdbconfig_nodes.c:411> Finished reading 1 interfaces for node db2.
Thu Dec 19 09:35:54.445 <D0 ha_fsd config 16802:0 ci_cdbconfig_nodes.c:218> Configuration read successful.
Thu Dec 19 09:35:54.450 <W ha_fsd fsd 16802:0 fs_resources.c:186>
Resource Group: db
   bind status (if error): success
   bind state: Undefined
   owner: ERROR (id: 0xfffffffe)
   new owner: ERROR (id: 0xfffffffe)
   flags: Controlled_Failback InPlace_Recovery enable
   policy: db_fail
     version: 1
     script: ordered
     attr(0): Controlled_Failback
     attr(1): InPlace_Recovery
     AFD(0): db1
     AFD(1): db2
   res state: Uninitialized
   res error: No error
   res action: SRMACT_RELEASE
   res flags: 0x0
   send update: NO
Thu Dec 19 09:35:54.450 <W ha_fsd fsd 16802:0 fs_resources.c:2462>
        Resource: 192.168.0.2  (type: IP_address)
        Resource: /ArrayB  (type: Filesystem)
        Resource: pgsql_loc  (type: Generic)
        Resource: ArrayB  (type: ServeRaid)

Thu Dec 19 09:35:54.450 <D0 ha_fsd fsd 16802:0 fs_main.c:228> Initial sleep time 60 seconds
Thu Dec 19 09:36:54.984 <D1 ha_fsd fsd 16802:0 fs_crs.c:82> Waiting to register with CRS
Thu Dec 19 09:36:55.994 <D0 ha_fsd crs 16802:0 crsl_register.c:254> Registration successful.
Thu Dec 19 09:36:55.994 <W ha_fsd fsd 16802:0 fs_crs.c:93> CRS Registration successful
Thu Dec 19 09:36:55.994 <W ha_fsd ipc 16802:0 ipc_clnt.c:295> CI_IPCERR_NOSERVER, Connection file /var/run/failsafe/comm/srm_db1 not present.
Thu Dec 19 09:36:55.995 <D0 ha_fsd fsd 16802:0 fs_srm.c:191> Waiting for good handle from SRM register
Thu Dec 19 09:37:55.584 <N ha_fsd fsd 16802:0 fs_srm.c:187> LAST MESSAGE IN THE fsd SUBSYSTEM REPEATED 58 TIMES
Thu Dec 19 09:37:55.585 <E ha_fsd fsd 16802:0 fs_srm.c:187> CI_IPCERR_NOSERVER, Waiting 1 minutes for SRM (ha_srmd) process
Thu Dec 19 09:37:56.594 <D0 ha_fsd fsd 16802:0 fs_srm.c:191> Waiting for good handle from SRM register
Thu Dec 19 09:38:56.184 <N ha_fsd fsd 16802:0 fs_srm.c:187> LAST MESSAGE IN THE fsd SUBSYSTEM REPEATED 58 TIMES
Thu Dec 19 09:38:56.184 <E ha_fsd fsd 16802:0 fs_srm.c:187> CI_IPCERR_NOSERVER, Waiting 2 minutes for SRM (ha_srmd) process
Thu Dec 19 09:38:57.194 <D0 ha_fsd fsd 16802:0 fs_srm.c:191> Waiting for good handle from SRM register
Thu Dec 19 09:40:57.384 <N ha_fsd fsd 16802:0 fs_srm.c:187> LAST MESSAGE IN THE fsd SUBSYSTEM REPEATED 118 TIMES
Thu Dec 19 09:40:57.385 <E ha_fsd fsd 16802:0 fs_srm.c:187> CI_IPCERR_NOSERVER, Waiting 4 minutes for SRM (ha_srmd) process
Thu Dec 19 09:40:58.394 <D0 ha_fsd fsd 16802:0 fs_srm.c:191> Waiting for good handle from SRM register
Thu Dec 19 09:44:59.784 <N ha_fsd fsd 16802:0 fs_srm.c:187> LAST MESSAGE IN THE fsd SUBSYSTEM REPEATED 238 TIMES
Thu Dec 19 09:44:59.784 <E ha_fsd fsd 16802:0 fs_srm.c:187> CI_IPCERR_NOSERVER, Waiting 8 minutes for SRM (ha_srmd) process
Thu Dec 19 09:45:00.794 <D0 ha_fsd fsd 16802:0 fs_srm.c:191> Waiting for good handle from SRM register
Thu Dec 19 09:46:18.064 <D0 ha_fsd fsd 16802:0 fs_srm.c:204> LAST MESSAGE IN THE fsd SUBSYSTEM REPEATED 75 TIMES


###############################################################
## cad_log
###############################################################

Thu Dec 19 09:34:11.311 <cam_casmail 600:1024> Notification cmd exited with 32512.
Thu Dec 19 09:34:41.311 <cam_casmail 600:1024> Notification cmd exited with 32512.
Thu Dec 19 09:35:11.311 <cam_casmail 600:1024> Notification cmd exited with 32512.
Thu Dec 19 09:35:41.316 <cam_cms 680:6151> ccms_poll_cmsd: cms_poll failed with error CI_CMSERR_LONELY
Thu Dec 19 09:35:41.320 <cam_casmail 600:1024> Notification cmd exited with 32512.
Thu Dec 19 09:35:50.328 <cad 675:1026> cas_msg_write<TRANSPORT>: transport currently in use to write previous message
Thu Dec 19 09:35:50.329 <cam_svc 675:1026> error 3 sending event notification to client 0x0000000240520bf0
Thu Dec 19 09:35:50.329 <cad 675:1026> cas_msg_write<TRANSPORT>: transport currently in use to write previous message
Thu Dec 19 09:35:50.329 <cam_svc 675:1026> error 3 sending event notification to client 0x0000000a4051fba0
Thu Dec 19 09:35:50.329 <cam_casmail 682:8201> ccamail_cam_get_event() called
Thu Dec 19 09:35:50.329 <cad 675:1026> cas_msg_write<TRANSPORT>: transport currently in use to write previous message
Thu Dec 19 09:35:50.329 <cam_svc 675:1026> error 3 sending event notification to client 0x000000114051a278
Thu Dec 19 09:35:50.329 <cam_casmail 682:8201> exiting ccamail_cam_get_event(): event = 0x405844a8
Thu Dec 19 09:35:51.314 <cam_srm 676:2051> csrm_poll_srmd: srm_poll failed with error CI_IPCERR_NOCONN
Thu Dec 19 09:35:51.316 <cad 681:7176> cfs_fs_connect: fs_cam_register failed with error FailSafe is not ready to accept admin requests.
Thu Dec 19 09:35:51.317 <cam_svc 675:1026> Received inventory for category 50 flags 1
Thu Dec 19 09:35:51.317 <cad 675:1026> cas_msg_write<TRANSPORT>: transport currently in use to write previous message
Thu Dec 19 09:35:51.317 <cam_svc 675:1026> error 3 sending event notification to client 0x0000000340521050
Thu Dec 19 09:35:51.317 <cad 675:1026> cas_msg_write<TRANSPORT>: transport currently in use to write previous message
Thu Dec 19 09:35:51.317 <cam_svc 675:1026> error 3 sending event notification to client 0x0000000b4051b4b8
Thu Dec 19 09:35:51.317 <cad 675:1026> cas_msg_write<TRANSPORT>: transport currently in use to write previous message
Thu Dec 19 09:35:51.317 <cam_svc 675:1026> error 3 sending event notification to client 0x0000001240504770
Thu Dec 19 09:35:51.317 <cam_srm 676:2051> csrm_srm_shutdown: srm_unregister failed with error CI_IPCERR_NOCONN
Thu Dec 19 09:36:01.315 <cad 681:7176> cfs_fs_connect: fs_cam_register failed with error FailSafe is not ready to accept admin requests.
Thu Dec 19 09:36:11.314 <cad 681:7176> cfs_fs_connect: fs_cam_register failed with error FailSafe is not ready to accept admin requests.
Thu Dec 19 09:36:11.321 <cam_casmail 600:1024> Notification cmd exited with 32512.
Thu Dec 19 09:36:21.324 <cad 681:7176> cfs_fs_connect: fs_cam_register failed with error FailSafe is not ready to accept admin requests.
Thu Dec 19 09:36:31.334 <cad 681:7176> cfs_fs_connect: fs_cam_register failed with error FailSafe is not ready to accept admin requests.
Thu Dec 19 09:36:41.337 <cad 681:7176> cfs_fs_connect: fs_cam_register failed with error FailSafe is not ready to accept admin requests.
Thu Dec 19 09:36:41.341 <cam_casmail 600:1024> Notification cmd exited with 32512.


###############################################################
## cmond_log
###############################################################

Thu Dec 19 09:35:41.235 <cmond 591:1024> <cmond_sig.c:271> Process with pid 15746 has exited with status 256
Thu Dec 19 09:35:41.235 <cmond 591:1024> <cmond_sig.c:275> 1 processes have exited.
Thu Dec 19 09:35:41.235 <cmond 591:1024> <cmond_pg.c:687> Process ha_gcd:15746 of group cluster_hainfra exited, status = 1.
Thu Dec 19 09:35:41.235 <cmond 591:1024> <cmond_pg.c:702> Initiating recovery for process group cluster_hainfra.
Thu Dec 19 09:35:43.244 <cmond 591:1024> <cmond_proc.c:178> Starting process ha_gcd.
Thu Dec 19 09:35:43.244 <cmond 591:1024> <cmond_proc.c:98> Going to fork/exec new process "ha_gcd -l ".
Thu Dec 19 09:35:43.244 <cmond 591:1024> <cmond_proc.c:141> New process ha_gcd pid 16776
Thu Dec 19 09:35:43.244 <cmond 591:1024> <cmond_pg.c:768> Recovery for process group cluster_hainfra complete.
Thu Dec 19 09:35:44.191 <cmond 591:1024> <cmond_sig.c:271> Process with pid 15738 has exited with status 256
Thu Dec 19 09:35:44.191 <cmond 591:1024> <cmond_sig.c:275> 1 processes have exited.
Thu Dec 19 09:35:44.191 <cmond 591:1024> <cmond_pg.c:687> Process ha_cmsd:15738 of group cluster_hainfra exited, status = 1.
Thu Dec 19 09:35:44.191 <cmond 591:1024> <cmond_pg.c:702> Initiating recovery for process group cluster_hainfra.
Thu Dec 19 09:35:46.194 <cmond 591:1024> <cmond_proc.c:178> Starting process ha_cmsd.
Thu Dec 19 09:35:46.194 <cmond 591:1024> <cmond_proc.c:98> Going to fork/exec new process "ha_cmsd -l ".
Thu Dec 19 09:35:46.194 <cmond 591:1024> <cmond_proc.c:141> New process ha_cmsd pid 16784
Thu Dec 19 09:35:46.194 <cmond 591:1024> <cmond_pg.c:768> Recovery for process group cluster_hainfra complete.
Thu Dec 19 09:35:50.326 <cmond 591:1024> <cmond_sig.c:271> Process with pid 16776 has exited with status 256
Thu Dec 19 09:35:50.326 <cmond 591:1024> <cmond_sig.c:271> Process with pid 15685 has exited with status 256
Thu Dec 19 09:35:50.326 <cmond 591:1024> <cmond_sig.c:275> 2 processes have exited.
Thu Dec 19 09:35:50.326 <cmond 591:1024> <cmond_pg.c:687> Process ha_gcd:16776 of group cluster_hainfra exited, status = 1.
Thu Dec 19 09:35:50.326 <cmond 591:1024> <cmond_pg.c:702> Initiating recovery for process group cluster_hainfra.
Thu Dec 19 09:35:52.334 <cmond 591:1024> <cmond_proc.c:178> Starting process ha_gcd.
Thu Dec 19 09:35:52.334 <cmond 591:1024> <cmond_proc.c:98> Going to fork/exec new process "ha_gcd -l ".
Thu Dec 19 09:35:52.334 <cmond 591:1024> <cmond_proc.c:141> New process ha_gcd pid 16798
Thu Dec 19 09:35:52.335 <cmond 591:1024> <cmond_pg.c:768> Recovery for process group cluster_hainfra complete.
Thu Dec 19 09:35:52.335 <cmond 591:1024> <cmond_pg.c:687> Process ha_fsd:15685 of group cluster_failsafe exited, status = 1.
Thu Dec 19 09:35:52.335 <cmond 591:1024> <cmond_pg.c:702> Initiating recovery for process group cluster_failsafe.
Thu Dec 19 09:35:54.344 <cmond 591:1024> <cmond_proc.c:178> Starting process ha_fsd.
Thu Dec 19 09:35:54.344 <cmond 591:1024> <cmond_proc.c:98> Going to fork/exec new process "ha_fsd -l ".
Thu Dec 19 09:35:54.344 <cmond 591:1024> <cmond_proc.c:141> New process ha_fsd pid 16802
Thu Dec 19 09:35:54.344 <cmond 591:1024> <cmond_pg.c:768> Recovery for process group cluster_failsafe complete.
Thu Dec 19 09:35:57.185 <cmond 591:1024> <cmond_sig.c:271> Process with pid 15698 has exited with status 256
Thu Dec 19 09:35:57.185 <cmond 591:1024> <cmond_sig.c:275> 1 processes have exited.
Thu Dec 19 09:35:57.185 <cmond 591:1024> <cmond_pg.c:687> Process ha_srmd:15698 of group cluster_hainfra exited, status = 1.
Thu Dec 19 09:35:57.185 <cmond 591:1024> <cmond_pg.c:702> Initiating recovery for process group cluster_hainfra.
Thu Dec 19 09:35:59.194 <cmond 591:1024> <cmond_proc.c:178> Starting process ha_srmd.
Thu Dec 19 09:35:59.194 <cmond 591:1024> <cmond_proc.c:98> Going to fork/exec new process "ha_srmd -l ".
Thu Dec 19 09:35:59.194 <cmond 591:1024> <cmond_proc.c:141> New process ha_srmd pid 16809
Thu Dec 19 09:35:59.194 <cmond 591:1024> <cmond_pg.c:768> Recovery for process group cluster_hainfra complete.


###############################################################
## cmsd_db1
###############################################################

Thu Dec 19 09:34:37.324 <W ha_cmsd cms 15738:0 cmsd_recv_timeout.c:201> Trying to remove node db2 from membership (node timeout). Check the node timeout for the cluster (current time = 0:55072230 recvtime = 0:55056270 timeout = 15000)
Thu Dec 19 09:34:37.324 <I0 ha_cmsd cms 15738:0 cmsd_state.c:628> cmsd state change from monitor to leader
Thu Dec 19 09:34:38.334 <I0 ha_cmsd cms 15738:0 cmsd_memb.c:795> Proposed Membership: sqn 2 G_sqn = 4, ack false
node db1 [1] :  UP       incarnation 118          age 2:0
node db2 [2] :  DOWN     incarnation 167          age 0:0
Thu Dec 19 09:34:38.334 <I0 ha_cmsd cms 15738:0 cmsd_state.c:628> cmsd state change from deliver to monitor
Thu Dec 19 09:34:38.334 <N ha_cmsd cms 15738:0 cmsd_fstop.c:380> Need to reset db2:167, sending request.
Thu Dec 19 09:34:38.334 <I0 ha_cmsd cms 15738:0 cmsd_reset.c:201> Reset request for node db2 enqueued.
Thu Dec 19 09:34:38.334 <I0 ha_cmsd cms 15738:0 cmsd_reset.c:229> Reset request for node: db2 sent.
Thu Dec 19 09:34:39.344 <I0 ha_cmsd cms 15738:0 cmsd_reset.c:299> Node db2 could NOT be RESET through local crsd, no owner. The node could still be reset through other cmsd/crsd.
Thu Dec 19 09:35:38.934 <N ha_cmsd cms 15738:0 cmsd_fstop.c:199> Reset request for db2:167 has timed out.
Thu Dec 19 09:35:40.954 <I0 ha_cmsd cms 15738:0 cmsd_memb.c:744> Node db2 has unknown status
Thu Dec 19 09:35:40.954 <N ha_cmsd cms 15738:0 cmsd_memb.c:795> Delivered Membership: sqn 2 G_sqn = 4, ack true
Tied membership could not be confirmed.
node db1 [1] :  UP       incarnation 118          age 2:0
node db2 [2] :  UNKNOWN  incarnation 167          age 0:0
Thu Dec 19 09:35:40.954 <W ha_cmsd cms 15738:0 cmsd_service.c:297> CI_CMSERR_LONELY, client error: name gcd, id 15746 command 5 error 0x200d
Thu Dec 19 09:35:40.954 <W ha_cmsd cms 15738:0 cmsd_service.c:297> CI_CMSERR_LONELY, client error: name cad, id 680 command 5 error 0x200d
Thu Dec 19 09:35:40.954 <N ha_cmsd cms 15738:0 cmsd.c:851> Cmsd is out of membership, will restart after notifying clients.

Thu Dec 19 09:35:41.214 <I0 ha_cmsd cms 15738:0 cmsd_client.c:105> client un-registration: gcd, id 15746
Thu Dec 19 09:35:44.188 <E ha_cmsd crs 15738:0 cmsd_reset.c:159> CI_FAILURE, Crs_unregister failed.
Thu Dec 19 09:35:44.189 <E ha_cmsd misc 15738:0 ci_restart.c:208> CI_FAILURE, Exiting, monitoring agent should revive me.

                * * * L o g g i n g    R e s t a r t e d * * *

Thu Dec 19 09:35:46.384 <N anonymous cms 16784:0 cmsd.c:301> ha_cmsd restarted.
Thu Dec 19 09:35:46.398 <I0 ha_cmsd cms 16788:1 cmsd_config.c:689> Config thread started.
Thu Dec 19 09:35:46.547 <I0 ha_cmsd cms 16788:1 cmsd_incarnation.c:126> Read new incarnation number type 1 data 119 size 4
Thu Dec 19 09:35:46.561 <I0 ha_cmsd cms 16788:1 cmsd_config.c:306> Reading CMS configuration.
Thu Dec 19 09:35:46.577 <I0 ha_cmsd cms 16788:1 cmsd_config.c:545> Completed reading CMS configuration.
Thu Dec 19 09:35:46.614 <I0 ha_cmsd cms 16784:0 cmsd_config.c:646> Begin configuration.
Thu Dec 19 09:35:46.614 <I0 ha_cmsd cms 16784:0 cmsd_config.c:652> Node db1 with nodeid 1 is enabled in cluster.
Thu Dec 19 09:35:46.614 <I0 ha_cmsd cms 16784:0 cmsd_config.c:652> Node db2 with nodeid 2 is enabled in cluster.
Thu Dec 19 09:35:46.614 <I0 ha_cmsd cms 16784:0 cmsd_config.c:656> The tie breaker node is db1.
Thu Dec 19 09:35:46.614 <I0 ha_cmsd cms 16784:0 cmsd_config.c:658> Node timeout is 15000 msecs.
Thu Dec 19 09:35:46.614 <I0 ha_cmsd cms 16784:0 cmsd_config.c:660> Heartbeat period is 1000 msecs.
Thu Dec 19 09:35:46.614 <I0 ha_cmsd cms 16784:0 cmsd_config.c:663> Cluster is in normal mode.
Thu Dec 19 09:35:46.614 <I0 ha_cmsd cms 16784:0 cmsd_config.c:665> End configuration.
Thu Dec 19 09:35:49.144 <I0 ha_cmsd crs 16784:0 cmsd_reset.c:119> Attempted to start reset line monitoring for the following (0) node(s).
Thu Dec 19 09:35:52.174 <I0 ha_cmsd cms 16784:0 cmsd_state.c:628> cmsd state change from init to leader
Thu Dec 19 09:35:53.184 <N ha_cmsd cms 16784:0 cmsd_state.c:454> Waiting (30000 msecs) for all (2) nodes to come up, right now I have 1 node(s) (0x1).
Thu Dec 19 09:35:53.184 <I0 ha_cmsd cms 16784:0 cmsd_client.c:93> client registration: cad, id 680
Thu Dec 19 09:35:54.194 <W ha_cmsd cms 16784:0 cmsd_service.c:297> CI_CMSERR_NOMEMB, client error: name cad, id 680 command 5 error 0x200c
Thu Dec 19 09:35:54.194 <I0 ha_cmsd cms 16784:0 cmsd_client.c:93> client registration: gcd, id 16798
Thu Dec 19 09:35:55.204 <W ha_cmsd cms 16784:0 cmsd_service.c:297> CI_CMSERR_NOMEMB, client error: name gcd, id 16798 command 5 error 0x200c
Thu Dec 19 09:36:11.364 <W ha_cmsd cms 16784:0 cmsd_service.c:297> LAST MESSAGE IN THE cms SUBSYSTEM REPEATED 15 TIMES
Thu Dec 19 09:36:11.364 <W ha_cmsd cms 16784:0 cmsd_service.c:297> CI_CMSERR_NOMEMB, client error: name cad, id 680 command 5 error 0x200c
Thu Dec 19 09:36:11.364 <W ha_cmsd cms 16784:0 cmsd_service.c:297> CI_CMSERR_NOMEMB, client error: name gcd, id 16798 command 5 error 0x200c
Thu Dec 19 09:36:17.424 <N ha_cmsd cms 16784:0 cmsd_state.c:467> LAST MESSAGE IN THE cms SUBSYSTEM REPEATED 5 TIMES
Thu Dec 19 09:36:17.424 <N ha_cmsd cms 16784:0 cmsd_state.c:467> Could not contact all (2) nodes in 30000 msecs, procceding with 1 node(s) (0x1).
Thu Dec 19 09:36:17.424 <I0 ha_cmsd cms 16784:0 cmsd_memb.c:795> Proposed Membership: sqn 1 G_sqn = 1, ack false
node db1 [1] :  UP       incarnation 119          age 1:0
node db2 [2] :  DOWN     incarnation 0    age 0:0
Thu Dec 19 09:36:17.424 <I0 ha_cmsd cms 16784:0 cmsd_state.c:628> cmsd state change from deliver to monitor
Thu Dec 19 09:36:17.424 <N ha_cmsd cms 16784:0 cmsd_fstop.c:380> Need to reset db2:0, sending request.
Thu Dec 19 09:36:17.424 <I0 ha_cmsd cms 16784:0 cmsd_reset.c:201> Reset request for node db2 enqueued.
Thu Dec 19 09:36:17.424 <I0 ha_cmsd cms 16784:0 cmsd_reset.c:229> Reset request for node: db2 sent.
Thu Dec 19 09:36:17.424 <W ha_cmsd cms 16784:0 cmsd_service.c:297> CI_CMSERR_NOMEMB, client error: name gcd, id 16798 command 5 error 0x200c
Thu Dec 19 09:36:18.434 <I0 ha_cmsd cms 16784:0 cmsd_reset.c:299> Node db2 could NOT be RESET through local crsd, no owner. The node could still be reset through other cmsd/crsd.
Thu Dec 19 09:36:18.434 <W ha_cmsd cms 16784:0 cmsd_service.c:297> CI_CMSERR_NOMEMB, client error: name gcd, id 16798 command 5 error 0x200c
Thu Dec 19 09:36:31.564 <W ha_cmsd cms 16784:0 cmsd_service.c:297> LAST MESSAGE IN THE cms SUBSYSTEM REPEATED 12 TIMES
Thu Dec 19 09:36:31.564 <W ha_cmsd cms 16784:0 cmsd_service.c:297> CI_CMSERR_NOMEMB, client error: name cad, id 680 command 5 error 0x200c
Thu Dec 19 09:36:31.564 <W ha_cmsd cms 16784:0 cmsd_service.c:297> CI_CMSERR_NOMEMB, client error: name gcd, id 16798 command 5 error 0x200c
Thu Dec 19 09:36:51.764 <W ha_cmsd cms 16784:0 cmsd_service.c:297> LAST MESSAGE IN THE cms SUBSYSTEM REPEATED 19 TIMES
Thu Dec 19 09:36:51.764 <W ha_cmsd cms 16784:0 cmsd_service.c:297> CI_CMSERR_NOMEMB, client error: name cad, id 680 command 5 error 0x200c
Thu Dec 19 09:36:51.764 <W ha_cmsd cms 16784:0 cmsd_service.c:297> CI_CMSERR_NOMEMB, client error: name gcd, id 16798 command 5 error 0x200c
Thu Dec 19 09:37:11.964 <W ha_cmsd cms 16784:0 cmsd_service.c:297> LAST MESSAGE IN THE cms SUBSYSTEM REPEATED 19 TIMES
Thu Dec 19 09:37:11.964 <W ha_cmsd cms 16784:0 cmsd_service.c:297> CI_CMSERR_NOMEMB, client error: name cad, id 680 command 5 error 0x200c
Thu Dec 19 09:37:11.964 <W ha_cmsd cms 16784:0 cmsd_service.c:297> CI_CMSERR_NOMEMB, client error: name gcd, id 16798 command 5 error 0x200c
Thu Dec 19 09:37:18.024 <N ha_cmsd cms 16784:0 cmsd_fstop.c:199> LAST MESSAGE IN THE cms SUBSYSTEM REPEATED 5 TIMES
Thu Dec 19 09:37:18.024 <N ha_cmsd cms 16784:0 cmsd_fstop.c:199> Reset request for db2:0 has timed out.
Thu Dec 19 09:37:18.024 <W ha_cmsd cms 16784:0 cmsd_service.c:297> CI_CMSERR_NOMEMB, client error: name gcd, id 16798 command 5 error 0x200c
Thu Dec 19 09:37:20.044 <I0 ha_cmsd cms 16784:0 cmsd_memb.c:744> LAST MESSAGE IN THE cms SUBSYSTEM REPEATED ONCE
Thu Dec 19 09:37:20.044 <I0 ha_cmsd cms 16784:0 cmsd_memb.c:744> Node db2 has unknown status
Thu Dec 19 09:37:20.044 <N ha_cmsd cms 16784:0 cmsd_memb.c:795> Delivered Membership: sqn 1 G_sqn = 1, ack true
Tied membership could not be confirmed.
node db1 [1] :  UP       incarnation 119          age 1:0
node db2 [2] :  UNKNOWN  incarnation 0    age 0:0
Thu Dec 19 09:37:20.044 <W ha_cmsd cms 16784:0 cmsd_service.c:297> CI_CMSERR_LONELY, client error: name cad, id 680 command 5 error 0x200d
Thu Dec 19 09:37:20.044 <W ha_cmsd cms 16784:0 cmsd_service.c:297> CI_CMSERR_NOMEMB, client error: name gcd, id 16798 command 5 error 0x200c
Thu Dec 19 09:37:20.045 <W ha_cmsd cms 16784:0 cmsd_service.c:297> CI_CMSERR_LONELY, client error: name gcd, id 16798 command 5 error 0x200d
Thu Dec 19 09:37:20.064 <N ha_cmsd cms 16784:0 cmsd.c:851> Cmsd is out of membership, will restart after notifying clients.

Thu Dec 19 09:37:20.324 <I0 ha_cmsd cms 16784:0 cmsd_client.c:105> client un-registration: gcd, id 16798
Thu Dec 19 09:37:23.298 <E ha_cmsd crs 16784:0 cmsd_reset.c:159> CI_FAILURE, Crs_unregister failed.
Thu Dec 19 09:37:23.299 <E ha_cmsd misc 16784:0 ci_restart.c:208> CI_FAILURE, Exiting, monitoring agent should revive me.



###############################################################
## crsd_db1
###############################################################

Thu Dec 19 09:34:38.884 <I0 crsd crs 607:0 crsd_pending.c:841> Reset request from cmsd:1 (local) for db2:2.
Thu Dec 19 09:34:38.884 <W crsd crs 607:0 crsd_pending.c:851> CI_CRSERR_NOTFOUND, Reset request 0x8061ab8 received for node 2, but its owner node does not exist.
Thu Dec 19 09:36:17.924 <I0 crsd crs 607:0 crsd_pending.c:841> Reset request from cmsd:1 (local) for db2:2.
Thu Dec 19 09:36:17.924 <W crsd crs 607:0 crsd_pending.c:851> CI_CRSERR_NOTFOUND, Reset request 0x8066850 received for node 2, but its owner node does not exist.
Thu Dec 19 09:37:57.034 <I0 crsd crs 607:0 crsd_pending.c:841> Reset request from cmsd:1 (local) for db2:2.
Thu Dec 19 09:37:57.034 <W crsd crs 607:0 crsd_pending.c:851> CI_CRSERR_NOTFOUND, Reset request 0x8062d08 received for node 2, but its owner node does not exist.
Thu Dec 19 09:39:36.084 <I0 crsd crs 607:0 crsd_pending.c:841> Reset request from cmsd:1 (local) for db2:2.
Thu Dec 19 09:39:36.084 <W crsd crs 607:0 crsd_pending.c:851> CI_CRSERR_NOTFOUND, Reset request 0x8062dc0 received for node 2, but its owner node does not exist.
Thu Dec 19 09:41:15.124 <I0 crsd crs 607:0 crsd_pending.c:841> Reset request from cmsd:1 (local) for db2:2.
Thu Dec 19 09:41:15.124 <W crsd crs 607:0 crsd_pending.c:851> CI_CRSERR_NOTFOUND, Reset request 0x8066fb0 received for node 2, but its owner node does not exist.
Thu Dec 19 09:42:54.194 <I0 crsd crs 607:0 crsd_pending.c:841> Reset request from cmsd:1 (local) for db2:2.
Thu Dec 19 09:42:54.194 <W crsd crs 607:0 crsd_pending.c:851> CI_CRSERR_NOTFOUND, Reset request 0x80675c8 received for node 2, but its owner node does not exist.
Thu Dec 19 09:44:33.274 <I0 crsd crs 607:0 crsd_pending.c:841> Reset request from cmsd:1 (local) for db2:2.
Thu Dec 19 09:44:33.274 <W crsd crs 607:0 crsd_pending.c:851> CI_CRSERR_NOTFOUND, Reset request 0x8067640 received for node 2, but its owner node does not exist.
Thu Dec 19 09:51:20.795 <W crsd crs 607:0 crs_config.c:667> CI_ERR_NOTFOUND, SystemController information for node db1 not found, requests will be ignored.
Thu Dec 19 09:51:20.795 <W crsd crs 607:0 crs_config.c:667> CI_ERR_NOTFOUND, SystemController information for node db2 not found, requests will be ignored.


###############################################################
## gcd_db1
###############################################################

Thu Dec 19 09:34:16.024 <I0 ha_gcd gcd 15746:0 gcd_cms.c:209> Calling cms_new_info(), iter = 125.
Thu Dec 19 09:34:22.054 <I0 ha_gcd gcd 15746:0 gcd_cms.c:209> Calling cms_new_info(), iter = 126.
Thu Dec 19 09:34:28.084 <I0 ha_gcd gcd 15746:0 gcd_cms.c:209> Calling cms_new_info(), iter = 127.
Thu Dec 19 09:34:28.234 <I0 ha_gcd gcd 15746:0 gcd_cms.c:209> Calling cms_new_info(), iter = 128.
Thu Dec 19 09:34:34.264 <I0 ha_gcd gcd 15746:0 gcd_cms.c:209> Calling cms_new_info(), iter = 129.
Thu Dec 19 09:34:40.294 <I0 ha_gcd gcd 15746:0 gcd_cms.c:209> Calling cms_new_info(), iter = 130.
Thu Dec 19 09:34:46.324 <I0 ha_gcd gcd 15746:0 gcd_cms.c:209> Calling cms_new_info(), iter = 131.
Thu Dec 19 09:34:52.354 <I0 ha_gcd gcd 15746:0 gcd_cms.c:209> Calling cms_new_info(), iter = 132.
Thu Dec 19 09:34:58.384 <I0 ha_gcd gcd 15746:0 gcd_cms.c:209> Calling cms_new_info(), iter = 133.
Thu Dec 19 09:34:58.534 <I0 ha_gcd gcd 15746:0 gcd_cms.c:209> Calling cms_new_info(), iter = 134.
Thu Dec 19 09:35:04.564 <I0 ha_gcd gcd 15746:0 gcd_cms.c:209> Calling cms_new_info(), iter = 135.
Thu Dec 19 09:35:10.594 <I0 ha_gcd gcd 15746:0 gcd_cms.c:209> Calling cms_new_info(), iter = 136.
Thu Dec 19 09:35:16.624 <I0 ha_gcd gcd 15746:0 gcd_cms.c:209> Calling cms_new_info(), iter = 137.
Thu Dec 19 09:35:22.654 <I0 ha_gcd gcd 15746:0 gcd_cms.c:209> Calling cms_new_info(), iter = 138.
Thu Dec 19 09:35:28.684 <I0 ha_gcd gcd 15746:0 gcd_cms.c:209> Calling cms_new_info(), iter = 139.
Thu Dec 19 09:35:28.834 <I0 ha_gcd gcd 15746:0 gcd_cms.c:209> Calling cms_new_info(), iter = 140.
Thu Dec 19 09:35:34.864 <I0 ha_gcd gcd 15746:0 gcd_cms.c:209> Calling cms_new_info(), iter = 141.
Thu Dec 19 09:35:40.894 <I0 ha_gcd gcd 15746:0 gcd_cms.c:209> Calling cms_new_info(), iter = 142.
Thu Dec 19 09:35:40.954 <I0 ha_gcd gcd 15746:0 gcd_cms.c:209> Calling cms_new_info(), iter = 143.
Thu Dec 19 09:35:40.954 <E ha_gcd gcd 15746:0 gcd_cms.c:240> CI_CMSERR_LONELY, CMS is telling me that I am lonely.
Cleaning up and restarting.
Thu Dec 19 09:35:40.954 <E ha_gcd gcd 15746:0 gcd_loop.c:96> CI_CMSERR_LONELY, The CMS daemon has entered the lonely state.
Cleaning up and restarting. Bye for now!

                * * * L o g g i n g    R e s t a r t e d * * *

Thu Dec 19 09:35:43.411 <I0 ha_gcd gcd 16776:0 gcd_options.c:721> Value of gcd_incno = 213.
Thu Dec 19 09:35:43.413 <N ha_gcd gcd 16776:0 gcd_init.c:206> My node name = db1.
Thu Dec 19 09:35:44.204 <E ha_gcd gcd 16776:0 gcd_cms.c:140> CI_IPCERR_NOCONN, cms_register() failed.
Thu Dec 19 09:35:44.204 <E ha_gcd gcd 16776:0 gcd_init.c:237> CI_IPCERR_NOCONN, No IPC connection to CMSD. Cleaning up and restarting. Bye for now!

                * * * L o g g i n g    R e s t a r t e d * * *

Thu Dec 19 09:35:52.503 <I0 ha_gcd gcd 16798:0 gcd_options.c:721> Value of gcd_incno = 214.
Thu Dec 19 09:35:52.505 <N ha_gcd gcd 16798:0 gcd_init.c:206> My node name = db1.
Thu Dec 19 09:35:54.204 <N ha_gcd gcd 16798:0 gcd_cms.c:164> My nodeid = 1 [0x1].
Thu Dec 19 09:35:54.204 <I0 ha_gcd gcd 16798:0 gcd_cms.c:209> Calling cms_new_info(), iter = 0.
Thu Dec 19 09:37:20.064 <E ha_gcd gcd 16798:0 gcd_cms.c:240> CI_CMSERR_LONELY, CMS is telling me that I am lonely.
Cleaning up and restarting.
Thu Dec 19 09:37:20.064 <E ha_gcd gcd 16798:0 gcd_cms.c:190> CI_CMSERR_LONELY, get_cms_membership() failed.
Thu Dec 19 09:37:20.064 <E ha_gcd gcd 16798:0 gcd_init.c:231> CI_CMSERR_LONELY, The CMS daemon has entered the lonely state.
Cleaning up and restarting. Bye for now!


###############################################################
## srmd_db1
###############################################################

Thu Dec 19 09:25:49.604 <I0 ha_srmd srm 15754:2 sc_reply.c:134> Poll (responses without requests) request reply done
Thu Dec 19 09:35:50.154 <W ha_srmd gcs 15698:0 gcc.c:668> LAST MESSAGE IN THE gcs SUBSYSTEM REPEATED 80 TIMES
Thu Dec 19 09:35:50.154 <W ha_srmd gcs 15698:0 gcc.c:668> CI_IPCERR_NOCONN, gcs_pulse(): Pulse for instid = 0x10001 failed with error = CI_IPCERR_NOCONN
Thu Dec 19 09:35:50.154 <N ha_srmd srm 15698:0 sr_main.c:165> LAST MESSAGE IN THE srm SUBSYSTEM REPEATED ONCE
Thu Dec 19 09:35:50.154 <E ha_srmd srm 15698:0 sr_main.c:165> CI_IPCERR_NOCONN, SRM gcs pulse failed
Thu Dec 19 09:35:50.154 <E ha_srmd srm 15698:0 srm_main.c:269> CI_IPCERR_NOCONN, Main thread exited
Thu Dec 19 09:35:50.157 <W ha_srmd gcs 15698:0 gcc.c:450> CI_IPCERR_NOCONN, gcs_unregister(): ipcclnt_send failed
Thu Dec 19 09:35:50.157 <E ha_srmd srm 15698:0 srm_gcs.c:172> CI_IPCERR_NOCONN, gcs_unregister failed

                * * * L o g g i n g    R e s t a r t e d * * *

Thu Dec 19 09:35:59.278 <I0 ha_srmd srm 16809:0 srm_main.c:145> ha_srmd is running as foreground process
Thu Dec 19 09:35:59.291 <W ha_srmd srm 16809:0 srm_config.c:1058> CI_CONFERR_NOTFOUND, Could not read local SRM parameters
Thu Dec 19 09:35:59.436 <W ha_srmd srm 16809:0 srm_config.c:1272> local resource type definitions not present
Thu Dec 19 09:36:19.445 <W ha_srmd ipc 16809:0 ipc_clnt.c:295> CI_IPCERR_NOSERVER, Connection file /var/run/failsafe/comm/gcs_db1 not present.
Thu Dec 19 09:36:19.445 <W ha_srmd gcs 16809:0 gcc.c:203> CI_IPCERR_NOSERVER, gcs_register(): ipcclnt_connect failed
Thu Dec 19 09:37:19.035 <W ha_srmd srm 16809:0 srm_gcs.c:111> CI_IPCERR_NOSERVER, Waited 1 minutes for GCD (ha_gcd) process to start
Thu Dec 19 09:38:19.635 <W ha_srmd srm 16809:0 srm_gcs.c:111> CI_IPCERR_NOSERVER, Waited 2 minutes for GCD (ha_gcd) process to start
Thu Dec 19 09:40:20.835 <W ha_srmd srm 16809:0 srm_gcs.c:111> CI_IPCERR_NOSERVER, Waited 4 minutes for GCD (ha_gcd) process to start
Thu Dec 19 09:44:23.235 <W ha_srmd srm 16809:0 srm_gcs.c:111> CI_IPCERR_NOSERVER, Waited 8 minutes for GCD (ha_gcd) process to start
Thu Dec 19 09:45:58.684 <W ha_srmd gcs 16809:0 gcc.c:309> LAST MESSAGE IN THE gcs SUBSYSTEM REPEATED 571 TIMES
Thu Dec 19 09:45:58.684 <W ha_srmd gcs 16809:0 gcc.c:309> gcs_register(): registration failed
Thu Dec 19 09:46:00.714 <W ha_srmd gcs 16809:0 gcc.c:309> gcs_register(): registration failed





                * * * L o g g i n g    R e s t a r t e d * * *

Thu Dec 19 09:35:59.278 <I0 ha_srmd srm 16809:0 srm_main.c:145> ha_srmd is running as foreground process
Thu Dec 19 09:35:59.291 <W ha_srmd srm 16809:0 srm_config.c:1058> CI_CONFERR_NOTFOUND, Could not read local SRM parameters
Thu Dec 19 09:35:59.436 <W ha_srmd srm 16809:0 srm_config.c:1272> local resource type definitions not present
Thu Dec 19 09:36:19.445 <W ha_srmd ipc 16809:0 ipc_clnt.c:295> CI_IPCERR_NOSERVER, Connection file /var/run/failsafe/comm/gcs_db1 not present.
Thu Dec 19 09:36:19.445 <W ha_srmd gcs 16809:0 gcc.c:203> CI_IPCERR_NOSERVER, gcs_register(): ipcclnt_connect failed
Thu Dec 19 09:37:19.035 <W ha_srmd srm 16809:0 srm_gcs.c:111> CI_IPCERR_NOSERVER, Waited 1 minutes for GCD (ha_gcd) process to start
Thu Dec 19 09:38:19.635 <W ha_srmd srm 16809:0 srm_gcs.c:111> CI_IPCERR_NOSERVER, Waited 2 minutes for GCD (ha_gcd) process to start
Thu Dec 19 09:40:20.835 <W ha_srmd srm 16809:0 srm_gcs.c:111> CI_IPCERR_NOSERVER, Waited 4 minutes for GCD (ha_gcd) process to start
Thu Dec 19 09:44:23.235 <W ha_srmd srm 16809:0 srm_gcs.c:111> CI_IPCERR_NOSERVER, Waited 8 minutes for GCD (ha_gcd) process to start
Thu Dec 19 09:45:58.684 <W ha_srmd gcs 16809:0 gcc.c:309> LAST MESSAGE IN THE gcs SUBSYSTEM REPEATED 571 TIMES
Thu Dec 19 09:45:58.684 <W ha_srmd gcs 16809:0 gcc.c:309> gcs_register(): registration failed
Thu Dec 19 09:46:00.714 <W ha_srmd gcs 16809:0 gcc.c:309> gcs_register(): registration failed




---------MDLINK-NGMime-5773-1040302312.744128---------