[Linux-HA] Failover not working as I expected

Jerome Yanga jyanga at esri.com
Fri Jan 16 11:05:49 MST 2009


Dominik,

Thank you very much.  Adding "resource-stickiness" and getting rid of the constraint helped a lot.  The resources no longer go back to Nomen when its heartbeat is started again (they stay with Rubric).  However, the resources still get bounced once Nomen joins the cluster.  Is there any way to keep the resources from bouncing when Nomen rejoins the cluster?

I have also observed another issue.  As you have seen in my cib.xml, I have created a group called Directory_Server.  This group contains three resources:  VIP, ECAS and FDS_Admin.  If I manually turn off any of these resources, I would like the whole group, Directory_Server, to fail over to the other node.  Is there a configuration that will do this?  Currently, if one of the three resources goes down, it stays down and the rest continue running.  All three resources need to be up and running for our applications to work properly.

To answer your question...

"Also due to your rsc_location. The resource is where you configured it
(on nomen), so why move it around?"

I added rsc_location in the configuration as I was trying to follow the sample ActivePassive configuration.

http://linux-ha.org/GettingStartedV2/OneIPAddress

I have been moving resources around because I am testing HA thoroughly before I implement it in our production environment.

Regards,
Jerome



-----Original Message-----
From: linux-ha-bounces at lists.linux-ha.org [mailto:linux-ha-bounces at lists.linux-ha.org] On Behalf Of Dominik Klein
Sent: Thursday, January 15, 2009 11:16 PM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Failover not working as I expected

Hi Jerome

> The names of the servers are as follows:  Nomen and Rubric.
> 
> Let us start when Nomen owns all resources and its status states "running(dc)".  When I stop heartbeat on Nomen, Rubric takes over all the resources and its status turns into "running(dc)".  This is good, as it is what I had hoped it would do.
> 
> When I start heartbeat back on Nomen, it takes all the resources away from Rubric.  However, it leaves Rubric in "running(dc)" status and Nomen's status just states "running".  There are two issues here that I see.
> 
> 1)  I do not want Nomen to take the resources as this means that the resources will be bounced.

This happens because of your rsc_location constraint. It tells the
cluster that you prefer the resource to run on "nomen", so whenever the
cluster can, it will run it there.

<rsc_location id="fdstest" rsc="Directory_Server">
  <rule id="prefered_fdstest" score="100" boolean_op="or">
    <expression attribute="#uname" id="9e5698e0-8b07-43aa-b852-398fbe6bb909"
                operation="eq" value="nomen.esri.com"/>
  </rule>
</rsc_location>

If you want the resource to stick to its current location even when the
preferred node comes back, look into the meta-attribute
"resource-stickiness". Read http://www.linux-ha.org/ScoreCalculation
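
As a rough illustration of why stickiness changes the outcome, here is a
toy model in Python. It is only a sketch and not the cluster's actual
algorithm (the ScoreCalculation page describes the real one, which takes
more factors into account). The location score of 100 for nomen comes
from the constraint above; the stickiness value of 200, the short node
names and the function names are made up for the example.

```python
# Toy placement model: NOT the real Heartbeat score calculation.
# Location preference taken from the rsc_location rule (score="100" for nomen).
LOCATION_SCORE = {"nomen": 100, "rubric": 0}


def node_scores(current_node, stickiness):
    """Return per-node scores: location score, plus stickiness on the
    node where the resource is currently running (if any)."""
    scores = dict(LOCATION_SCORE)
    if current_node is not None:
        scores[current_node] += stickiness
    return scores


def choose_node(current_node, stickiness, online):
    """Pick the online node with the highest score.
    On a tie, prefer the node the resource is already on."""
    scores = node_scores(current_node, stickiness)
    return max(online, key=lambda n: (scores[n], n == current_node))


both = ["nomen", "rubric"]

# Resource is on rubric and nomen has just rejoined the cluster.
print(choose_node("rubric", 0, both))    # no stickiness
print(choose_node("rubric", 200, both))  # hypothetical stickiness of 200
```

With no stickiness the location score wins and the resource moves back
to nomen; with a stickiness larger than the location score it stays on
rubric.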

> 2)  I would like to have the Quorum or "running(dc)" where the resources are.

You can't move the DC role manually, and you do not need to care which
machine is the DC. It is totally fine to have resources on a node which
is not the DC.

The current DC stays DC until it is shut down or separated from the
cluster in some manner.

> To continue, when I stop heartbeat on Rubric, the "running(dc)" status goes over to Nomen. I then start heartbeat on Rubric and all resources as well as the "running(dc)" status stay with Nomen.  Moreover, the resources are not bounced at all.

Also due to your rsc_location. The resource is where you configured it
(on nomen), so why move it around?

Regards
Dominik