[Linux-HA] Quorum problem with 3-node cluster
mark at netexpo.nl
Fri Oct 2 01:34:18 MDT 2009
Mark Hunting wrote:
> Dejan Muhamedagic wrote:
>> On Thu, Oct 01, 2009 at 04:45:45PM +0200, Mark Hunting wrote:
>>> Sorry for not mentioning that I use Heartbeat 2.1.3 from the Debian Lenny
>>> repository, in crm configuration.
>>> Mark Hunting wrote:
>>>> I have set up a 3-node cluster. Works perfectly, but when I shut one
>>>> node down the other two lose quorum, and shut down their resources (!)
>>>> because no-quorum-policy is set to 'stop', as it should be.
>>>> I have no idea why the quorum is lost, this really should not happen as
>>>> the remaining two nodes are still the majority. crm_mon shows them
>>>> online and they can talk to each other. Only quorum is lost:
>>>> have_quorum is "false" until the third node comes up again.
>>>> Can anybody tell me how this is possible, or give me some command that
>>>> can help me investigate this?
>> ccm_tool (or similar, can't recall the name exactly) can show you
>> what a node thinks its partition looks like. Otherwise, look at
>> the ccm lines in the logs, though they may be really hard to
>> figure out.
> Thanks a lot! It just came to my mind that I changed the three node
> names today in ha.cf, and this problem started to occur afterwards. I
> think the cluster still remembers the three old names next to the new
> ones. I guess it now 'thinks' it has six nodes instead of three, and
> that may be an explanation for this behaviour I'm seeing (although then
> with 3 of the 6 nodes online it also shouldn't have quorum imo, but it
> does). crmadmin shows only 3 nodes, however, which is a bit strange. I
> can't access the cluster right now, but I'll try to figure out more
> tomorrow. There should be a way to force the removal of the old node
> names (ideas anyone?)
I know a bit more now. The cluster thinks it has 4 nodes instead of 3. I
see this in my logs:
ccm: : debug: total_node_count=4, total_quorum_votes=400
But there are really only 3 nodes. crmadmin, ccm_tool and the XML output
from cibadmin all show only my 3 existing nodes. So I have no idea
where this total_node_count of 4 comes from. How can I make Heartbeat
stop thinking it has 4 nodes?
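For what it's worth, a phantom 4th node would explain the behaviour exactly. Below is a rough sketch of the vote arithmetic. It assumes (this is not confirmed anywhere above) that CCM grants quorum only when the online nodes hold strictly more than half of total_quorum_votes, and that every node carries 100 votes, which matches total_quorum_votes=400 for 4 nodes. The function name has_quorum is hypothetical, not part of Heartbeat.

```python
# Hypothetical sketch of strict-majority quorum, ASSUMING each node carries
# 100 votes (consistent with total_quorum_votes=400 for total_node_count=4).
VOTES_PER_NODE = 100


def has_quorum(online_nodes: int, total_nodes: int,
               votes_per_node: int = VOTES_PER_NODE) -> bool:
    """Return True if the online nodes hold strictly more than half the votes."""
    online_votes = online_nodes * votes_per_node
    total_votes = total_nodes * votes_per_node
    # Strict majority: exactly half of the votes is NOT enough.
    return 2 * online_votes > total_votes


if __name__ == "__main__":
    scenarios = [
        (3, 3),  # all three real nodes up, membership of 3 as intended
        (2, 3),  # one node down, membership of 3 as intended
        (3, 4),  # all three up, but a phantom 4th node is counted
        (2, 4),  # one node down with the phantom node still counted
        (3, 6),  # the earlier six-node guess
    ]
    for online, total in scenarios:
        print(f"{online} of {total} online -> quorum: {has_quorum(online, total)}")
```

Under those assumptions, 2 of 4 nodes hold 200 of 400 votes, which is exactly half and therefore not a majority, while 3 of 4 still have quorum. That matches what I see. A 6-node membership would not fit, since 3 of 6 would also lack quorum.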