[Linux-HA] Split-Brain activity in 2 node Cluster
smsvvarez at aol.com
Wed Aug 17 14:19:09 MDT 2005
Skipped content of type multipart/alternative
-------------- next part --------------
============
Current DC: alioth (aa643165-91b2-4f22-acb6-04f723b17f38)
2 Nodes configured.
1 Resources configured.
============
Node: alioth (aa643165-91b2-4f22-acb6-04f723b17f38): online
Node: alioth-bak (d4d6078b-9485-4e22-bd89-0193b72fc45a): OFFLINE
IPaddr_1 (heartbeat::heartbeat:IPaddr): alioth (aa643165-91b2-4f22-acb6-04f723b17f38)
-------------- next part --------------
============
Current DC: alioth-bak (d4d6078b-9485-4e22-bd89-0193b72fc45a)
2 Nodes configured.
1 Resources configured.
============
Node: alioth (aa643165-91b2-4f22-acb6-04f723b17f38): OFFLINE
Node: alioth-bak (d4d6078b-9485-4e22-bd89-0193b72fc45a): online
IPaddr_1 (heartbeat::heartbeat:IPaddr): alioth-bak (d4d6078b-9485-4e22-bd89-0193b72fc45a)
-------------- next part --------------
# Use the new resource manager with v2.0.0
crm yes
# Setup logfile stuff
logfacility daemon
logfile /var/log/ha-log
debugfile /var/log/ha-debug
# List our cluster members
node alioth
node alioth-bak
# Send heartbeats over interface eth1 (cross-over cable) every second
# and declare nodes dead after 10 seconds.
bcast eth1
keepalive 1
deadtime 10
# on | yes | true | 1 -> resources move back to their preferred node when it rejoins
# off | no | false | 0 -> resources stay on the node currently running them
auto_failback off
-------------- next part --------------
heartbeat[10445]: 2005/08/16_09:35:24 WARN: Logging daemon is disabled --enabling logging daemon is recommended
heartbeat[10445]: 2005/08/16_09:35:24 info: **************************
heartbeat[10445]: 2005/08/16_09:35:24 info: Configuration validated. Starting heartbeat 2.0.0
heartbeat[10446]: 2005/08/16_09:35:24 info: heartbeat: version 2.0.0
heartbeat[10446]: 2005/08/16_09:35:24 info: Heartbeat generation: 82
heartbeat[10446]: 2005/08/16_09:35:24 info: Removing /var/lib/heartbeat/rsctmp failed, recreating.
heartbeat[10446]: 2005/08/16_09:35:24 info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth1
heartbeat[10446]: 2005/08/16_09:35:24 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat[10446]: 2005/08/16_09:35:24 info: pid 10446 locked in memory.
heartbeat[10446]: 2005/08/16_09:35:24 info: Local status now set to: 'up'
heartbeat[10449]: 2005/08/16_09:35:25 info: pid 10449 locked in memory.
heartbeat[10450]: 2005/08/16_09:35:25 info: pid 10450 locked in memory.
heartbeat[10451]: 2005/08/16_09:35:25 info: pid 10451 locked in memory.
heartbeat[10446]: 2005/08/16_09:35:44 WARN: node alioth-bak: is dead
heartbeat[10446]: 2005/08/16_09:35:44 info: Local status now set to: 'active'
heartbeat[10446]: 2005/08/16_09:35:44 info: Starting child client "/usr/lib/heartbeat/ccm" (500,500)
heartbeat[10446]: 2005/08/16_09:35:44 info: Starting child client "/usr/lib/heartbeat/cib" (500,500)
heartbeat[10446]: 2005/08/16_09:35:44 info: Starting child client "/usr/lib/heartbeat/stonithd" (0,0)
heartbeat[10446]: 2005/08/16_09:35:44 info: Starting child client "/usr/lib/heartbeat/lrmd" (0,0)
heartbeat[10446]: 2005/08/16_09:35:44 info: Starting child client "/usr/lib/heartbeat/crmd" (500,500)
heartbeat[10456]: 2005/08/16_09:35:44 info: Starting "/usr/lib/heartbeat/ccm" as uid 500 gid 500 (pid 10456)
heartbeat[10457]: 2005/08/16_09:35:44 info: Starting "/usr/lib/heartbeat/cib" as uid 500 gid 500 (pid 10457)
cib[10457]: 2005/08/16_09:35:44 info: G_main_add_SignalHandler: Added signal handler for signal 15
heartbeat[10458]: 2005/08/16_09:35:44 info: Starting "/usr/lib/heartbeat/stonithd" as uid 0 gid 0 (pid 10458)
cib[10457]: 2005/08/16_09:35:44 info: mask(main.c:cib_register_ha): Signing in with Heartbeat
heartbeat[10459]: 2005/08/16_09:35:44 info: Starting "/usr/lib/heartbeat/lrmd" as uid 0 gid 0 (pid 10459)
cib[10457]: 2005/08/16_09:35:44 info: mask(main.c:cib_register_ha): FSA Hostname: alioth
cib[10457]: 2005/08/16_09:35:44 WARN: mask(io.c:initializeCib): Option suppress_cib_writes not set
cib[10457]: 2005/08/16_09:35:44 info: mask(main.c:startCib): CIB Initialization completed successfully
cib[10457]: 2005/08/16_09:35:44 info: mask(main.c:init_start): Starting cib mainloop
stonithd[10458]: 2005/08/16_09:35:44 info: G_main_add_SignalHandler: Added signal handler for signal 10
stonithd[10458]: 2005/08/16_09:35:44 info: G_main_add_SignalHandler: Added signal handler for signal 12
stonithd[10458]: 2005/08/16_09:35:44 info: pid 10458 locked in memory.
heartbeat[10460]: 2005/08/16_09:35:44 info: Starting "/usr/lib/heartbeat/crmd" as uid 500 gid 500 (pid 10460)
crmd[10460]: 2005/08/16_09:35:44 info: mask(main.c:init_start): Starting crmd
lrmd[10459]: 2005/08/16_09:35:44 info: G_main_add_SignalHandler: Added signal handler for signal 15
stonithd[10458]: 2005/08/16_09:35:44 info: Signing in with heartbeat.
lrmd[10459]: 2005/08/16_09:35:44 info: G_main_add_SignalHandler: Added signal handler for signal 17
stonithd[10458]: 2005/08/16_09:35:44 notice: /usr/lib/heartbeat/stonithd start up successfully.
stonithd[10458]: 2005/08/16_09:35:44 info: G_main_add_SignalHandler: Added signal handler for signal 17
lrmd[10459]: 2005/08/16_09:35:44 info: G_main_add_SignalHandler: Added signal handler for signal 10
crmd[10460]: 2005/08/16_09:35:44 info: mask(control.c:register_with_ha): FSA Hostname: alioth
crmd[10460]: 2005/08/16_09:35:44 info: mask(control.c:do_startup): Register Signal Handler
crmd[10460]: 2005/08/16_09:35:44 info: G_main_add_SignalHandler: Added signal handler for signal 15
crmd[10460]: 2005/08/16_09:35:44 info: G_main_add_TriggerHandler: Added signal manual handler
crmd[10460]: 2005/08/16_09:35:44 info: mask(control.c:do_startup): Init server comms
crmd[10460]: 2005/08/16_09:35:44 info: mask(control.c:do_startup): Creating CIB object
crmd[10460]: 2005/08/16_09:35:44 info: G_main_add_SignalHandler: Added signal handler for signal 17
crmd[10460]: 2005/08/16_09:35:44 info: mask(cib_native.c:cib_native_signon): Connection to CIB successful
lrmd[10459]: 2005/08/16_09:35:44 info: G_main_add_SignalHandler: Added signal handler for signal 12
lrmd[10459]: 2005/08/16_09:35:44 info: Started.
crmd[10460]: 2005/08/16_09:35:44 info: mask(ccm.c:do_ccm_control): CCM Activation passed... all set to go!
crmd[10460]: 2005/08/16_09:35:44 info: mask(control.c:do_started): Delaying start, CCM (0000000000100000) not connected
crmd[10460]: 2005/08/16_09:35:44 info: mask(main.c:init_start): Starting crmd's mainloop
crmd[10460]: 2005/08/16_09:35:44 info: mask(control.c:do_started): Delaying start, CCM (0000000000100000) not connected
crmd[10460]: 2005/08/16_09:35:45 info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
crmd[10460]: 2005/08/16_09:35:45 info: mem_handle_event: instance=1, nodes=1, new=1, lost=0, n_idx=0, new_idx=0, old_idx=3
crmd[10460]: 2005/08/16_09:35:45 info: mask(callbacks.c:crmd_ccm_msg_callback): Quorum (re)attained after event=NEW MEMBERSHIP (id=1)
crmd[10460]: 2005/08/16_09:35:45 info: mask(ccm.c:ccm_event_detail): NEW MEMBERSHIP: trans=1, nodes=1, new=1, lost=0 n_idx=0, new_idx=0, old_idx=3
crmd[10460]: 2005/08/16_09:35:45 info: mask(ccm.c:ccm_event_detail): NEW: alioth [nodeid=0, born=1]
crmd[10460]: 2005/08/16_09:35:45 info: mask(control.c:do_started): The local CRM is operational
crmd[10460]: 2005/08/16_09:35:45 info: mask(fsa.c:do_state_transition): State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_CCM_CALLBACK origin=do_started ]
cib[10457]: 2005/08/16_09:35:45 info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
cib[10457]: 2005/08/16_09:35:45 info: mem_handle_event: instance=1, nodes=1, new=1, lost=0, n_idx=0, new_idx=0, old_idx=3
cib[10457]: 2005/08/16_09:35:45 info: mask(callbacks.c:cib_ccm_msg_callback): Process CCM event=NEW MEMBERSHIP (id=1)
cib[10457]: 2005/08/16_09:35:45 info: mask(callbacks.c:cib_ccm_msg_callback): Quorum (re)attained after event=NEW MEMBERSHIP (id=1)
crmd[10460]: 2005/08/16_09:35:57 info: mask(utils.c:crm_timer_popped): Election Trigger (I_DC_TIMEOUT) just popped!
crmd[10460]: 2005/08/16_09:35:57 WARN: mask(misc.c:do_log): [[FSA]] Input I_DC_TIMEOUT from crm_timer_popped() received in state (S_PENDING)
crmd[10460]: 2005/08/16_09:35:57 info: mask(fsa.c:do_state_transition): State transition S_PENDING -> S_ELECTION [ input=I_DC_TIMEOUT cause=C_TIMER_POPPED origin=crm_timer_popped ]
crmd[10460]: 2005/08/16_09:36:03 info: mask(utils.c:crm_timer_popped): Election Timeout (I_ELECTION_DC) just popped!
crmd[10460]: 2005/08/16_09:36:03 info: mask(fsa.c:do_state_transition): State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_TIMER_POPPED origin=crm_timer_popped ]
crmd[10460]: 2005/08/16_09:36:03 info: mask(subsystems.c:start_subsystem): Starting sub-system "tengine"
tengine[10463]: 2005/08/16_09:36:03 info: G_main_add_SignalHandler: Added signal handler for signal 15
crmd[10460]: 2005/08/16_09:36:03 info: mask(subsystems.c:start_subsystem): Starting sub-system "pengine"
pengine[10464]: 2005/08/16_09:36:03 info: G_main_add_SignalHandler: Added signal handler for signal 15
pengine[10464]: 2005/08/16_09:36:03 info: mask(main.c:init_start): Starting pengine
crmd[10460]: 2005/08/16_09:36:03 info: mask(election.c:do_dc_takeover): Taking over DC status for this partition
crmd[10460]: 2005/08/16_09:36:03 info: mask(join_dc.c:do_dc_join_offer_all): 0) Offering membership to 1 clients
crmd[10460]: 2005/08/16_09:36:03 notice: mask(callbacks.c:crmd_client_status_callback): Status update: Client alioth/crmd now has status [online]
cib[10457]: 2005/08/16_09:36:03 info: mask(messages.c:cib_process_readwrite): We are now in R/W mode
cib[10457]: 2005/08/16_09:36:03 WARN: mask(io.c:initializeCib): Option suppress_cib_writes not set
cib[10457]: 2005/08/16_09:36:03 WARN: mask(io.c:initializeCib): Option suppress_cib_writes not set
crmd[10460]: 2005/08/16_09:36:03 info: mask(fsa.c:do_state_transition): State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
crmd[10460]: 2005/08/16_09:36:03 info: mask(fsa.c:do_state_transition): All 1 cluster nodes responded to the join offer.
tengine[10463]: 2005/08/16_09:36:03 info: mask(cib_native.c:cib_native_signon): Connection to CIB successful
cib[10457]: 2005/08/16_09:36:03 WARN: mask(io.c:initializeCib): Option suppress_cib_writes not set
crmd[10460]: 2005/08/16_09:36:03 info: mask(join_dc.c:process_join_ack_msg): 4) Updating node state to member for alioth
cib[10457]: 2005/08/16_09:36:03 WARN: mask(io.c:initializeCib): Option suppress_cib_writes not set
cib[10457]: 2005/08/16_09:36:03 WARN: mask(io.c:initializeCib): Option suppress_cib_writes not set
tengine[10463]: 2005/08/16_09:36:03 info: mask(main.c:init_start): Starting tengine
tengine[10463]: 2005/08/16_09:36:03 info: mask(tengine.c:initialize_graph): Registering TE UUID: 9eb28e0e-f86b-44f4-83ae-9bdf6c07610b
cib[10457]: 2005/08/16_09:36:03 WARN: mask(io.c:initializeCib): Option suppress_cib_writes not set
heartbeat[10446]: 2005/08/16_09:36:03 WARN: Performed 1 more non-realtime malloc calls.
cib[10457]: 2005/08/16_09:36:03 info: mask(callbacks.c:cib_null_callback): Setting cib_diff_notify callbacks for tengine: on
crmd[10460]: 2005/08/16_09:36:03 info: mask(fsa.c:do_state_transition): State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED cause=C_FSA_INTERNAL origin=check_join_state ]
heartbeat[10446]: 2005/08/16_09:36:03 info: Total non-realtime malloc bytes: 135168
crmd[10460]: 2005/08/16_09:36:03 info: mask(fsa.c:do_state_transition): All 1 cluster nodes are eligable to run resources.
tengine[10463]: 2005/08/16_09:36:03 info: mask(utils.c:send_complete): 1 - Transition status: Stopped: te_abort_confirmed
pengine[10464]: 2005/08/16_09:36:03 info: mask(process_pe_message): [generation] <cib generated="true" admin_epoch="0" epoch="13" num_updates="89" have_quorum="true" num_peers="1" origin="alioth" cib_feature_revision="1" last_written="Tue Aug 16 09:36:03 2005" dc_uuid="aa643165-91b2-4f22-acb6-04f723b17f38" debug_source="finalize_join" ccm_transition="1"/>
pengine[10464]: 2005/08/16_09:36:03 WARN: mask(unpack.c:param_value): Option default_resource_stickiness not set
pengine[10464]: 2005/08/16_09:36:03 WARN: mask(unpack.c:param_value): Option stonith_enabled not set
pengine[10464]: 2005/08/16_09:36:03 info: mask(unpack.c:unpack_config): STONITH of failed nodes is disabled
pengine[10464]: 2005/08/16_09:36:03 info: mask(unpack.c:unpack_config): Cluster is symmetric - resources can run anywhere by default
pengine[10464]: 2005/08/16_09:36:03 info: mask(unpack.c:unpack_config): On loss of CCM Quorum: Stop ALL resources
pengine[10464]: 2005/08/16_09:36:03 info: mask(native.c:native_create_actions): Start resource IPaddr_1 (alioth)
pengine[10464]: 2005/08/16_09:36:03 info: mask(stages.c:stage8): Creating transition graph 0.
crmd[10460]: 2005/08/16_09:36:03 info: mask(fsa.c:do_state_transition): State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=do_msg_route ]
tengine[10463]: 2005/08/16_09:36:03 info: mask(unpack.c:unpack_graph): Beginning transition 0 : timeout set to 120000ms
tengine[10463]: 2005/08/16_09:36:03 info: mask(unpack.c:unpack_graph): Unpacked 1 actions in 1 synapses
tengine[10463]: 2005/08/16_09:36:03 info: mask(tengine.c:initiate_transition): Initating transition
cib[10457]: 2005/08/16_09:36:03 WARN: mask(io.c:initializeCib): Option suppress_cib_writes not set
tengine[10463]: 2005/08/16_09:36:03 info: mask(tengine.c:cib_action_updated): Initiating action 1: start IPaddr_1 on alioth
crmd[10460]: 2005/08/16_09:36:03 WARN: lrm_get_rsc(653): got a return code HA_FAIL from a reply message of getrsc with function get_ret_from_msg.
crmd[10460]: 2005/08/16_09:36:03 WARN: lrm_get_rsc(653): got a return code HA_FAIL from a reply message of getrsc with function get_ret_from_msg.
crmd[10460]: 2005/08/16_09:36:03 info: mask(lrm.c:do_lrm_rsc_op): Performing op start on IPaddr_1
IPaddr[10465]: 2005/08/16_09:36:03 info: /sbin/ifconfig eth0:0 192.168.0.200 netmask 255.255.255.0 broadcast 192.168.0.2
IPaddr[10465]: 2005/08/16_09:36:03 info: Sending Gratuitous Arp for 192.168.0.200 on eth0:0 [eth0]
IPaddr[10465]: 2005/08/16_09:36:03 /usr/lib/heartbeat/send_arp -i 500 -r 10 -p /var/lib/heartbeat/rsctmp/send_arp/send_arp-192.168.0.200 eth0 192.168.0.200 auto 192.168.0.200 ffffffffffff
cib[10457]: 2005/08/16_09:36:03 WARN: mask(io.c:initializeCib): Option suppress_cib_writes not set
tengine[10463]: 2005/08/16_09:36:03 info: mask(tengine.c:match_graph_event): Action 1 confirmed
tengine[10463]: 2005/08/16_09:36:03 info: mask(tengine.c:check_for_completion): Transition complete
tengine[10463]: 2005/08/16_09:36:03 info: mask(utils.c:send_complete): 0 - Transition status: Complete: complete
crmd[10460]: 2005/08/16_09:36:03 info: mask(fsa.c:do_state_transition): State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_IPC_MESSAGE origin=do_msg_route ]
cib[10457]: 2005/08/16_09:36:41 info: mask(callbacks.c:cib_process_disconnect): Cleaning up after cib_ro channel disconnect from client (0x83e6bc8) 170326a0-4618-421d-abbf-a4acb8beec2/<null>
cib[10457]: 2005/08/16_09:36:56 info: mask(callbacks.c:cib_process_disconnect): Cleaning up after cib_ro channel disconnect from client (0x83e6bc8) 434f0398-7bb2-4c7a-8a5d-ffaa0879d54/<null>
cib[10457]: 2005/08/16_09:37:11 info: mask(callbacks.c:cib_process_disconnect): Cleaning up after cib_ro channel disconnect from client (0x83e6bc8) 155f5202-115b-4090-b810-b87f2c86d50/<null>
cib[10457]: 2005/08/16_09:37:26 info: mask(callbacks.c:cib_process_disconnect): Cleaning up after cib_ro channel disconnect from client (0x83e6bc8) 71041f48-cbbb-4115-b2b2-7b84aecdc43/<null>
cib[10457]: 2005/08/16_09:37:41 info: mask(callbacks.c:cib_process_disconnect): Cleaning up after cib_ro channel disconnect from client (0x83e6bc8) ec145ce5-0daa-44f8-a796-03b49a6ca63/<null>
cib[10457]: 2005/08/16_09:37:57 info: mask(callbacks.c:cib_process_disconnect): Cleaning up after cib_ro channel disconnect from client (0x83e6bc8) 607ff9d4-76d2-415d-87aa-a68fec5bb87/<null>
cib[10457]: 2005/08/16_09:38:12 info: mask(callbacks.c:cib_process_disconnect): Cleaning up after cib_ro channel disconnect from client (0x83e6bc8) f635ddab-6d5f-4f98-8c13-fc829de13d2/<null>
cib[10457]: 2005/08/16_09:38:27 info: mask(callbacks.c:cib_process_disconnect): Cleaning up after cib_ro channel disconnect from client (0x83e6bc8) 1e6ff306-e68d-4bea-9437-6849484030e/<null>
cib[10457]: 2005/08/16_09:38:42 info: mask(callbacks.c:cib_process_disconnect): Cleaning up after cib_ro channel disconnect from client (0x83e6bc8) f4016890-8a87-4c60-81d2-e7f07110dcc/<null>
cib[10457]: 2005/08/16_09:38:57 info: mask(callbacks.c:cib_process_disconnect): Cleaning up after cib_ro channel disconnect from client (0x83e6bc8) d3858166-1409-4b4f-8e69-a2463663682/<null>
cib[10457]: 2005/08/16_09:39:12 info: mask(callbacks.c:cib_process_disconnect): Cleaning up after cib_ro channel disconnect from client (0x83e6bc8) e624c132-ca2a-41d6-a3c4-7a6bf8a30a5/<null>
cib[10457]: 2005/08/16_09:39:27 info: mask(callbacks.c:cib_process_disconnect): Cleaning up after cib_ro channel disconnect from client (0x83e6bc8) 77067735-b39c-4f7a-acb3-f822aa07baf/<null>
cib[10457]: 2005/08/16_09:39:42 info: mask(callbacks.c:cib_process_disconnect): Cleaning up after cib_ro channel disconnect from client (0x83e6bc8) 4ddac646-b5e5-40e0-ba16-eb62e3980ea/<null>
cib[10457]: 2005/08/16_09:39:57 info: mask(callbacks.c:cib_process_disconnect): Cleaning up after cib_ro channel disconnect from client (0x83e6bc8) 7258ed1d-f97d-4f63-9a98-2d903eac952/<null>
cib[10457]: 2005/08/16_09:40:12 info: mask(callbacks.c:cib_process_disconnect): Cleaning up after cib_ro channel disconnect from client (0x83e6bc8) e7fdea99-b02c-4e40-bcda-c268fe43aec/<null>
...
heartbeat[10446]: 2005/08/16_09:47:03 info: killing /usr/lib/heartbeat/crmd process group 10460 with signal 15
crmd[10460]: 2005/08/16_09:47:03 info: mask(fsa.c:do_state_transition): State transition S_IDLE -> S_RELEASE_DC [ input=I_SHUTDOWN cause=C_SHUTDOWN origin=crm_shutdown ]
crmd[10460]: 2005/08/16_09:47:03 WARN: mask(election.c:do_election_vote): Not voting in election, we're shutting down
crmd[10460]: 2005/08/16_09:47:03 info: mask(subsystems.c:stop_subsystem): Sending quit message to pengine.
cib[10457]: 2005/08/16_09:47:03 info: mask(messages.c:cib_process_readwrite): We are now in R/O mode
crmd[10460]: 2005/08/16_09:47:03 info: mask(subsystems.c:stop_subsystem): Sending quit message to tengine.
crmd[10460]: 2005/08/16_09:47:03 info: mask(fsa.c:do_state_transition): State transition S_RELEASE_DC -> S_PENDING [ input=I_RELEASE_SUCCESS cause=C_FSA_INTERNAL origin=do_dc_release ]
tengine[10463]: 2005/08/16_09:47:03 info: mask(callbacks.c:process_te_message): Received quit message, terminating
cib[10457]: 2005/08/16_09:47:03 info: mask(callbacks.c:cib_process_disconnect): Cleaning up after cib_callback channel disconnect from client (0x83e08d0) df240d90-0b1b-4b66-81af-195fea7a998/tengine
...
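
The two status dumps at the top of this message disagree with each other: each node names itself as DC, marks its peer OFFLINE, and runs IPaddr_1 locally. A minimal sketch of how that comparison could be automated follows. It assumes the status text looks exactly like the two dumps above; the function names parse_status and split_brain_symptoms are hypothetical helpers written for this illustration and are not part of heartbeat.

```python
import re

# The two status dumps from this message, one per node.
VIEW_A = """\
Current DC: alioth (aa643165-91b2-4f22-acb6-04f723b17f38)
Node: alioth (aa643165-91b2-4f22-acb6-04f723b17f38): online
Node: alioth-bak (d4d6078b-9485-4e22-bd89-0193b72fc45a): OFFLINE
IPaddr_1 (heartbeat::heartbeat:IPaddr): alioth (aa643165-91b2-4f22-acb6-04f723b17f38)
"""
VIEW_B = """\
Current DC: alioth-bak (d4d6078b-9485-4e22-bd89-0193b72fc45a)
Node: alioth (aa643165-91b2-4f22-acb6-04f723b17f38): OFFLINE
Node: alioth-bak (d4d6078b-9485-4e22-bd89-0193b72fc45a): online
IPaddr_1 (heartbeat::heartbeat:IPaddr): alioth-bak (d4d6078b-9485-4e22-bd89-0193b72fc45a)
"""


def parse_status(text):
    """Extract the DC, node states, and resource placement from one dump.

    Hypothetical helper: the line formats are assumed from the dumps above.
    """
    dc, nodes, resources = None, {}, {}
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"Current DC: (\S+)", line)
        if m:
            dc = m.group(1)
            continue
        m = re.match(r"Node: (\S+) \([0-9a-f-]+\): (\S+)", line)
        if m:
            nodes[m.group(1)] = m.group(2).lower()
            continue
        # Resource lines: "<id> (<class/type>): <node> (<uuid>)"
        m = re.match(r"(\S+)\s+\(\S+\):\s+(\S+)", line)
        if m:
            resources[m.group(1)] = m.group(2)
    return {"dc": dc, "nodes": nodes, "resources": resources}


def split_brain_symptoms(a, b):
    """Compare two parsed views and list the ways they contradict each other."""
    symptoms = []
    if a["dc"] and b["dc"] and a["dc"] != b["dc"]:
        symptoms.append(f"two DCs: {a['dc']} and {b['dc']}")
    for rsc in sorted(set(a["resources"]) & set(b["resources"])):
        if a["resources"][rsc] != b["resources"][rsc]:
            symptoms.append(
                f"{rsc} active on both {a['resources'][rsc]} and {b['resources'][rsc]}"
            )
    if a["nodes"].get(b["dc"]) == "offline" and b["nodes"].get(a["dc"]) == "offline":
        symptoms.append("each partition sees the other's DC as offline")
    return symptoms


symptoms = split_brain_symptoms(parse_status(VIEW_A), parse_status(VIEW_B))
for s in symptoms:
    print(s)
```

Run against the two dumps, the sketch reports all three contradictions. It only compares what each node says about the cluster; it does not explain why the heartbeat link on eth1 failed to carry traffic between the nodes.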