[Linux-HA] Failover not working as I expected
Jerome Yanga
jyanga at esri.com
Tue Jan 20 13:48:13 MST 2009
Dominik,
Per your request, attached is my current configuration.
To reiterate, the following are still concerns:
01) Resources get bounced when Nomen rejoins the cluster.
02) Group failover will not work as hoped.
As for resource monitoring, I believe that the customized init scripts are working properly, although as a newcomer I may well be missing something. I have tested the init scripts and confirmed that when a resource fails, the service is restarted. After seeing that the init scripts work, I set the "On Fail" value to "stop" instead of "restart".
Moreover, I have tried varying the group scores by changing the resource_stickiness and resource_failure_stickiness values. However, I have not been able to make the group fail over consistently by stopping one of the resources. During the testing, I tried using the equation below from the site you provided in your previous email.
node = (constraint-score) + (num_group_resources * resource_stickiness) + (failcount * (resource_failure_stickiness) )
Unfortunately, the scores do not seem to follow this equation when I verify them using showscores.sh. The following values were assigned to the Directory_Server group during this testing.
resource_stickiness=100
resource_failure_stickiness=-500
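
For reference, here is a rough sketch of what the equation above would predict for these values. It assumes three resources in the Directory_Server group (VIP, ECAS and FDS_Admin), a constraint score of 0 because the location constraint was removed, and a score of 0 on the other node. The function name group_score is made up for illustration; it is not part of Heartbeat.

```python
def group_score(constraint_score, num_group_resources, resource_stickiness,
                failcount, resource_failure_stickiness):
    """Node score according to the equation quoted from the ScoreCalculation page."""
    return (constraint_score
            + num_group_resources * resource_stickiness
            + failcount * resource_failure_stickiness)


# Assumed inputs: no constraint (0), 3 group resources, values from this test.
for failcount in range(3):
    score = group_score(0, 3, 100, failcount, -500)
    print(f"failcount={failcount}: predicted score={score}")
```

If the equation held, the score on the active node would be 300 with no failures and -200 after a single failure, which is below the assumed 0 on the other node, so one failure should be enough to move the group. This is only the prediction; the scores reported by showscores.sh are what actually count.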
I have also attempted to use the crm_failcount command to make sure that the scores are reset before I fail any resource, but showscores.sh seems to show that the command is not working.
I have also tried to change the cib.xml file manually to assign the values above to default-resource-stickiness and default-resource-failure-stickiness respectively, but after doing so, all the resources seem to disappear. (Good thing I had created a copy of the cib.xml file.)
By the way, I have changed the values back to the following:
resource_stickiness=100
resource_failure_stickiness=-100
Help.
Regards,
Jerome
-----Original Message-----
From: linux-ha-bounces at lists.linux-ha.org [mailto:linux-ha-bounces at lists.linux-ha.org] On Behalf Of Dominik Klein
Sent: Monday, January 19, 2009 11:31 PM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Failover not working as I expected
Jerome Yanga wrote:
> Dominik,
>
> Thank you very much. Adding "resource-stickiness" and getting rid of the constraint helped a lot. The resources do not go back to Nomen anymore when its heartbeat is started again (the resources stay with Rubric). However, the resources still get bounced once Nomen joins the cluster. Is there any way to keep the resources from bouncing when Nomen rejoins the cluster?
Please share your current configuration.
> I have also observed another issue. As you have seen in my cib.xml, I have created a group called Directory_Server. In this group, there are three resources, namely VIP, ECAS and FDS_Admin. If I manually turn off any of these resources, I would like the group resource, Directory_Server, to fail over to the other node. Is there a configuration that will do this? Currently, if one of the three resources goes down, it stays down and the rest continue running. All three resources need to be up and running for our applications to work properly.
Sounds like you're not doing any resource monitoring. Read up on that
and configure it. The ScoreCalculation page might be handy to understand
how things work: http://www.linux-ha.org/ScoreCalculation
Regards
Dominik
_______________________________________________
Linux-HA mailing list
Linux-HA at lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
-------------- next part --------------
A non-text attachment was scrubbed...
Name: my_config.zip
Type: application/x-zip-compressed
Size: 1472 bytes
Desc: my_config.zip
Url : http://lists.community.tummy.com/pipermail/linux-ha/attachments/20090120/994f33dd/my_config.bin