[Linux-HA] pengine process killed by signal 11 (SIGSEGV)

Daniel van Ham Colchete daniel.colchete at gmail.com
Sat Dec 9 06:49:41 MST 2006


Hi,

I'm trying to set up a 2-node Heartbeat 2.0 system here. I'm using
version 2.0.7 on a Gentoo system with kernel 2.6.18.

When I start one of the nodes (mail0) first and then the second,
everything works fine. My problem is that when I start both at the
same time, nothing works.

Doing some digging, I found that pengine is crashing with a
segmentation fault (signal 11).

First, the important logs:

Dec  9 11:43:05 www0 crmd: [24314]: info: crm_timer_popped:utils.c
Election Trigger (I_DC_TIMEOUT) just popped!
Dec  9 11:43:05 www0 crmd: [24314]: info: update_dc:utils.c Set DC to
<null> (<null>)
Dec  9 11:43:05 www0 crmd: [24314]: info: start_subsystem:subsystems.c
Starting sub-system "pengine"
Dec  9 11:43:05 www0 crmd: [24314]: info: do_dc_takeover:election.c
Taking over DC status for this partition
Dec  9 11:43:05 www0 cib: [13106]: info:
cib_process_readwrite:messages.c We are now in R/W mode
Dec  9 11:43:05 www0 pengine: [24321]: info: init_start:main.c Starting pengine
Dec  9 11:43:05 www0 crmd: [24314]: info: update_dc:utils.c Set DC to
www0 (1.0.6)
Dec  9 11:43:06 www0 crmd: [24314]: info: do_state_transition:fsa.c
All 2 cluster nodes responded to the join offer.
Dec  9 11:43:06 www0 cib: [13106]: info: sync_our_cib:messages.c
Syncing CIB to all peers
Dec  9 11:43:06 www0 crmd: [24314]: info: update_dc:utils.c Set DC to
www0 (1.0.6)
Dec  9 11:43:07 www0 crmd: [24314]: info: do_state_transition:fsa.c
www0: State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [
input=I_FINALIZED cause=C_FSA_INTERNAL origin=check_join_state ]
Dec  9 11:43:07 www0 crmd: [24314]: info: do_state_transition:fsa.c
All 2 cluster nodes are eligable to run resources.
Dec  9 11:43:07 www0 crmd: [24314]: info:
crmd_ipc_msg_callback:callbacks.c pengine: no message this time
Dec  9 11:43:07 www0 crmd: [24314]: info:
process_client_disconnect:utils.c Received HUP from pengine:[24321]
Dec  9 11:43:07 www0 crmd: [24314]: WARN: Exiting pengine process
24321 killed by signal 11.
Dec  9 11:43:07 www0 crmd: [24314]: info:
crmdManagedChildDied:subsystems.c Process pengine:[24321] exited
(signal=11, exitcode=0)
Dec  9 11:43:07 www0 crmd: [24314]: ERROR:
crmdManagedChildDied:subsystems.c The pengine subsystem terminated
unexpectedly
Dec  9 11:43:07 www0 crmd: [24314]: ERROR: do_log:misc.c [[FSA]] Input
I_ERROR from crmdManagedChildDied() received in state
(S_TRANSITION_ENGINE)
Dec  9 11:43:07 www0 crmd: [24314]: info: do_dc_release:election.c DC
role released

And it repeats indefinitely.
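
To see how often the crash recurs, the "exited (signal=..., exitcode=...)" messages from crmd can be counted. The following is only a minimal sketch, not part of Heartbeat: the function name count_signal_deaths and the regular expression are my own, and it assumes each log entry sits on a single line (unlike the mail-wrapped excerpt above).

```python
import re
from collections import Counter

# Matches crmd's report of a managed child exiting, e.g.
# "... Process pengine:[24321] exited (signal=11, exitcode=0)"
# The pattern is an assumption based on the log lines quoted above.
CHILD_DIED = re.compile(
    r"Process (?P<name>\w+):\[(?P<pid>\d+)\] exited "
    r"\(signal=(?P<signal>\d+), exitcode=(?P<exitcode>\d+)\)"
)


def count_signal_deaths(lines):
    """Count (process name, signal) pairs for children killed by a signal."""
    deaths = Counter()
    for line in lines:
        match = CHILD_DIED.search(line)
        # signal=0 means a normal exit, so only nonzero signals are counted.
        if match and int(match.group("signal")) != 0:
            deaths[(match.group("name"), int(match.group("signal")))] += 1
    return deaths


# Example with one unwrapped line taken from the log above.
sample = [
    "Dec  9 11:43:07 www0 crmd: [24314]: info: "
    "crmdManagedChildDied:subsystems.c Process pengine:[24321] exited "
    "(signal=11, exitcode=0)",
]
for (name, sig), count in sorted(count_signal_deaths(sample).items()):
    print(f"{name}: killed by signal {sig}, {count} time(s)")
```

Passing an open log file instead of the sample list works the same way, since the function only iterates over lines.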

You can access my cib.xml at http://pastebin.ca/272979.

Thanks for any help.

Best regards,
Daniel Colchete

