[Linux-ha-dev] Re: confused about the bully election algorithm implementation in crmd

home_king home_king at 163.com
Thu Jan 4 19:30:16 MST 2007


Hi, Andrew.

Nice patch. :)

However, there exist other problems.

1. do_election_check() compares the size of the 'voted' hashtbl with the
size of the fsa_membership_copy->members hashtbl. However, if 'voted'
contains obsolete items (NOVOTEs with an old election_id), they will be
counted in error.

That is, once a new election is launched, your patch can refuse an old
NOVOTE that arrives late, but it has no chance to purge the old NOVOTEs
already recorded in the 'voted' hashtbl.

Suppose there are 3 nodes: A, B, C.
A_bornon > B_bornon > C_bornon
B launches the first election, and A sends NOVOTE to B.
C launches the second election, and both A & B make C master.
Then C goes down and B launches the third election. B immediately becomes
master, no matter whether A votes for it or not, because the 'voted'
hashtbl of B still contains the old NOVOTE that came from A!
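
The scenario above can be sketched in a few lines of C. This is only an
illustration under my own assumptions, not the crmd code: a small array
stands in for the 'voted' hashtbl, and record_vote(),
election_done_by_size() and election_done_by_id() are hypothetical names.
I also assume B records its own vote in the table. The point is that a
size-only check counts A's stale entry, while a check that compares the
election id does not.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_VOTES 8

/* Hypothetical stand-in for one entry of the 'voted' hashtbl. */
struct vote {
    char node[16];
    int election_id;
};

struct voted_table {
    struct vote entries[MAX_VOTES];
    int count;
};

/* Record the vote of a node, overwriting any older entry for that node. */
static void record_vote(struct voted_table *t, const char *node, int election_id)
{
    for (int i = 0; i < t->count; i++) {
        if (strcmp(t->entries[i].node, node) == 0) {
            t->entries[i].election_id = election_id;
            return;
        }
    }
    assert(t->count < MAX_VOTES);
    snprintf(t->entries[t->count].node, sizeof t->entries[t->count].node,
             "%s", node);
    t->entries[t->count].election_id = election_id;
    t->count++;
}

/* Size-only check: stale entries from old elections count as votes. */
static int election_done_by_size(const struct voted_table *t, int members)
{
    return t->count >= members;
}

/* Stricter check: only entries tagged with the current election count. */
static int election_done_by_id(const struct voted_table *t, int members,
                               int current_id)
{
    int fresh = 0;
    for (int i = 0; i < t->count; i++) {
        if (t->entries[i].election_id == current_id) {
            fresh++;
        }
    }
    return fresh >= members;
}
```

With B's table holding (B, 1) and (A, 1) from the first election, recording
only (B, 3) for the third election makes the size-only check succeed with
two members, while the id-aware check still waits for A. Clearing the table
whenever a new election starts would be another way to get the same effect.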

2. CIB will not work under some special scenarios.
What happens if the admin launches a CIB operation, for example add_src,
while an election is in progress? At that time no CIB master exists, so the
operation will never be handled or replied to! An admin client such as
mgmtd will block forever.
The same problem occurs when the DC exits, or while the JOIN protocol is
in progress (before the full sync has been launched).
CIB could provide a "lock" mechanism to its users, with which crmd can
freeze the CIB & keep data safe while crmd is in an unstable state.
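
To make the "lock" idea concrete, here is a minimal sketch. Everything in
it is hypothetical (struct cib_sketch, cib_set_lock(), cib_submit() and the
return codes are not part of the real CIB API). It only shows the intended
behaviour: while the lock is held, a request gets an immediate "locked"
reply instead of waiting forever for a master that does not exist.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes, not the real CIB ones. */
enum cib_rc {
    CIB_RC_OK = 0,
    CIB_RC_LOCKED = 1
};

/* Hypothetical CIB handle: a lock flag and a counter of applied operations. */
struct cib_sketch {
    int locked;
    int applied_ops;
};

/* crmd would call this when it enters or leaves an unstable state. */
static void cib_set_lock(struct cib_sketch *cib, int locked)
{
    assert(cib != NULL);
    cib->locked = locked ? 1 : 0;
}

/* A client request: refused at once while locked, applied otherwise. */
static enum cib_rc cib_submit(struct cib_sketch *cib)
{
    assert(cib != NULL);
    if (cib->locked) {
        return CIB_RC_LOCKED;
    }
    cib->applied_ops++;
    return CIB_RC_OK;
}
```

A client such as mgmtd could then retry later or report the error to the
admin, rather than blocking.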

3. The wrong use of "stall" in the fsa will cause a deadlock.
R_CCM_DATA must be set before the fsa enters the PENDING state, or
do_started() will stall the fsa, which prepends itself to the fsa queue.
Once the fsa is stalled, other fsa_jobs cannot be processed. However, the
R_CCM_DATA flag is only set in a fsa_job -- do_ccm_update_cache(), which is
registered by the I_CCM_EVENT handler -- crmd_ccm_msg_callback():
        register_fsa_input_adv(
            C_CCM_CALLBACK, I_CCM_EVENT, event_data,
            trigger_transition?A_TE_CANCEL:A_NOTHING,
            FALSE, __FUNCTION__);

You see, do_ccm_update_cache() has no chance to run because the fsa is
stalled, while the stalled fn do_started() waits for the result of
do_ccm_update_cache().
Here is the deadlock.
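
The deadlock can be reproduced with a toy queue. This is a simplified
model, not the real fsa: struct fsa_sketch, the two job names and run_fsa()
are made up for illustration, and the step limit exists only so the stalled
case terminates. When the stalled job is put back at the head of the queue,
the job that would set the flag is never reached; putting it at the tail is
shown purely for contrast, not as a proposed fix for crmd.

```c
#include <assert.h>
#include <string.h>

#define QCAP 8

/* Two hypothetical jobs standing in for do_started() and
 * do_ccm_update_cache(). */
enum job { JOB_STARTED, JOB_CCM_UPDATE };

struct fsa_sketch {
    enum job q[QCAP];
    int len;
    int r_ccm_data;   /* stands in for the R_CCM_DATA flag */
    int started;
};

static void push_back(struct fsa_sketch *f, enum job j)
{
    assert(f->len < QCAP);
    f->q[f->len++] = j;
}

static void push_front(struct fsa_sketch *f, enum job j)
{
    assert(f->len < QCAP);
    memmove(&f->q[1], &f->q[0], (size_t)f->len * sizeof f->q[0]);
    f->q[0] = j;
    f->len++;
}

static enum job pop_front(struct fsa_sketch *f)
{
    assert(f->len > 0);
    enum job j = f->q[0];
    f->len--;
    memmove(&f->q[0], &f->q[1], (size_t)f->len * sizeof f->q[0]);
    return j;
}

/* Process at most max_steps jobs. A JOB_STARTED that finds the flag unset
 * is requeued at the front (the stall described above) or at the back.
 * Returns 1 if the start-up job eventually completed, 0 otherwise. */
static int run_fsa(struct fsa_sketch *f, int requeue_at_front, int max_steps)
{
    for (int step = 0; step < max_steps && f->len > 0; step++) {
        enum job j = pop_front(f);
        if (j == JOB_CCM_UPDATE) {
            f->r_ccm_data = 1;
        } else if (f->r_ccm_data) {
            f->started = 1;
        } else if (requeue_at_front) {
            push_front(f, j);
        } else {
            push_back(f, j);
        }
    }
    return f->started;
}
```

Queue JOB_STARTED followed by JOB_CCM_UPDATE: with requeue_at_front set,
run_fsa() spins on the first job until the step limit and the flag is never
set; with it cleared, the update job runs second and the start-up job
completes on the third step.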

BTW, I have some questions about the code implementation:

1. Why not use libxml2 & a glib n-ary tree to construct the internal XML
representation?
libxml2 can be used to retrieve the skeleton from the xml file, and then we
can convert this base into a glib n-ary tree whose nodes are our internal
structures. When sending the xml data, we just traverse this tree into a
ha_msg structure; when writing the xml file, we can just use libxml2
directly.
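
A rough sketch of the traversal step, so the idea is easier to discuss. To
keep it compilable without libxml2 or glib, a hand-rolled
first-child/next-sibling tree replaces the glib n-ary tree and a plain text
buffer replaces the ha_msg structure; struct xml_node, add_child() and
serialize() are hypothetical names, and attributes are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical internal node: first-child/next-sibling n-ary tree. */
struct xml_node {
    const char *tag;
    struct xml_node *first_child;
    struct xml_node *next_sibling;
};

/* Append child as the last child of parent. */
static void add_child(struct xml_node *parent, struct xml_node *child)
{
    child->next_sibling = NULL;
    if (parent->first_child == NULL) {
        parent->first_child = child;
        return;
    }
    struct xml_node *c = parent->first_child;
    while (c->next_sibling != NULL) {
        c = c->next_sibling;
    }
    c->next_sibling = child;
}

/* Append s to the NUL-terminated buffer, truncating if it is full. */
static void append(char *buf, size_t cap, const char *s)
{
    size_t used = strlen(buf);
    assert(used < cap);
    strncat(buf, s, cap - used - 1);
}

/* Pre-order traversal: open tag, children in order, close tag. */
static void serialize(const struct xml_node *n, char *buf, size_t cap)
{
    append(buf, cap, "<");
    append(buf, cap, n->tag);
    append(buf, cap, ">");
    for (const struct xml_node *c = n->first_child; c != NULL;
         c = c->next_sibling) {
        serialize(c, buf, cap);
    }
    append(buf, cap, "</");
    append(buf, cap, n->tag);
    append(buf, cap, ">");
}
```

The same pre-order walk could fill a ha_msg instead of a text buffer; the
buffer must start out as an empty string before serialize() is called.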

2. Can we slim down the election & join protocols? Can we slim down the
state machine? I have always found them complex & hard to understand, and
the code is huge.
Maybe a write-up of the full design & implementation of the crmd family
would be meaningful to linux-ha fans and even the developers. Thanks. :)



