[Linux-HA] How stonith works
fengyandong
fengyandong at nrchpc.ac.cn
Fri Oct 16 19:35:13 MDT 2009
Any help?
Thanks.
On Fri, Oct 16, 2009 at 9:21 AM, fengyandong <fengyandong at nrchpc.ac.cn> wrote:
> Before I pulled the power cables out of mds26, crm_mon -1 displayed:
> ============
> Last updated: Fri Oct 16 09:00:33 2009
> Current DC: mds26 (81a224e3-6d9f-4cd1-adf3-99876541089f)
> 2 Nodes configured.
> 3 Resources configured.
> ============
>
> Node: mds25 (9e935fdb-ceda-479d-bfdb-11b94db72895): online
> Node: mds26 (81a224e3-6d9f-4cd1-adf3-99876541089f): online
>
> Resource Group: group_1
> bwfsmmd_1_9af183f9-369e-41be-8bfc-7a45886e7aed (ocf::heartbeat:bwfsmmd): Started mds26
> bwfs_2_fs (ocf::heartbeat:bwfs): Started mds26
> IPaddr_10_10_170_35 (ocf::heartbeat:IPaddr): Started mds26
> stonith_5_mds25 (stonith:external/ipmi): Started mds26
> stonith_6_mds26 (stonith:external/ipmi): Started mds25
>
> After 2 minutes, I plugged the power cables back into mds26, and crm_mon -1
> displayed:
> ============
> Last updated: Fri Oct 16 09:11:35 2009
> Current DC: mds25 (9e935fdb-ceda-479d-bfdb-11b94db72895)
> 2 Nodes configured.
> 3 Resources configured.
> ============
>
> Node: mds25 (9e935fdb-ceda-479d-bfdb-11b94db72895): online
> Node: mds26 (81a224e3-6d9f-4cd1-adf3-99876541089f): online
>
> Resource Group: group_1
> bwfsmmd_1_9af183f9-369e-41be-8bfc-7a45886e7aed (ocf::heartbeat:bwfsmmd): Started mds26
> bwfs_2_fs (ocf::heartbeat:bwfs): Started mds26
> IPaddr_10_10_170_35 (ocf::heartbeat:IPaddr): Started mds26
> stonith_5_mds25 (stonith:external/ipmi): Started mds26
>
> Failed actions:
> stonith_6_mds26_start_0 (node=mds25, call=10, rc=1): complete
>
> You can see that group_1 is still running on mds26.
>
> The attachment contains cib.xml and ha-log from the two nodes.
> ha-log.mds25 is from mds25 and ha-log.mds26 is from mds26.
>
>
> On Thu, Oct 15, 2009 at 11:18 PM, fengyandong <fengyandong at nrchpc.ac.cn> wrote:
>
>> Thanks.
>> I will do it tomorrow morning.
>>
>>
>> On Thu, Oct 15, 2009 at 10:49 PM, Dejan Muhamedagic <dejanmm at fastmail.fm> wrote:
>>
>>> Hi,
>>>
>>> On Thu, Oct 15, 2009 at 10:08:46PM +0800, fengyandong wrote:
>>> > I have enabled CRM.
>>>
>>> Good.
>>>
>>> > I see from the ha-log that the passive node attempts to stonith the
>>> > active node, but if it fails to do so within 2 minutes, it asks another
>>> > node to stonith the active node.
>>>
>>> It doesn't ask the other node specifically, but broadcasts the
>>> fencing request in hope that another node from the cluster may
>>> stonith the offending node.
>>>
>>> > When the power cables of the active node are plugged back in and the
>>> > active node starts successfully, the stonith service on the passive
>>> > node stops and never stonithes the active node.
>>>
>>> That's strange; I can't recall seeing that. If it happens, then
>>> please file a bugzilla and attach an hb_report.
>>>
>>> Thanks,
>>>
>>> Dejan
>>>
>>> > On Thu, Oct 15, 2009 at 9:42 PM, Alex Dean <adean at meteostar.com> wrote:
>>> >
>>> > >
>>> > > On Oct 15, 2009, at 8:29 AM, fengyandong wrote:
>>> > >
>>> > > > Yes, if the node's power is pulled, the stonith device cannot
>>> > > > work.
>>> > >
>>> > > I'm using iLO2 as my stonith device, which relies on the power of the
>>> > > host machine. When I pull the power cables of the primary node, I've
>>> > > observed that the secondary node attempts to stonith the primary
>>> > > node. This fails because iLO2 also loses power when the host's power
>>> > > is cut. The secondary node retries indefinitely until stonith
>>> > > succeeds, and does not become primary until then. When I restore
>>> > > power to the old primary, it is reset by the secondary, and the
>>> > > secondary then proceeds to start resources.
>>> > >
>>> > > alex
>>> > > _______________________________________________
>>> > > Linux-HA mailing list
>>> > > Linux-HA at lists.linux-ha.org
>>> > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>> > > See also: http://linux-ha.org/ReportingProblems
>>> > >
>>>
>>
>>
>