[Linux-HA] Fencing prevents resource from failing over
Dejan Muhamedagic
dejanmm at fastmail.fm
Mon Nov 26 10:50:55 MST 2007
Hi,
On Mon, Nov 26, 2007 at 04:14:07PM +0100, Andrew Beekhof wrote:
>
> On Nov 26, 2007, at 2:38 PM, <abhishek.bagchi at wipro.com> wrote:
>
> >
> >Hi Andrew,
> >I just modified my stonith device to work in both online and offline
> >mode. The stonith operation (standby -> active) is successful with the
> >active node's cable unplugged, and it seems the standby node tries to
> >start the resource but fails. The log is attached, but there aren't
> >enough log messages to find out what's going on. It just prints:
> >
> >pengine[15900]: 2007/11/26_18:11:27 WARN: unpack_rsc_op: Processing failed op (Proxy_10_114_31_238_start_0) for Proxy_10_114_31_238 on standby
> >pengine[15900]: 2007/11/26_18:11:27 WARN: unpack_rsc_op: Handling failed start for Proxy_10_114_31_238 on standby
> >
> >Is there a way to enable more log messages in HA at runtime? The debug
> >log and the regular log seem to contain the same number of messages.
>
> Add this to ha.cf:
>
> debug 1
>
I think that sending some signals should also help:
- USR1 to increase the debug level
- USR2 to decrease it
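A minimal Python sketch of the signal approach, assuming you already know the PID of the heartbeat process you want to adjust (how you look it up is left open and is an assumption here). `signal_for` and `adjust_debug` are hypothetical helper names, and each signal is assumed to move the debug level by one step, as described above.

```python
import os
import signal

# Mapping described in the thread: SIGUSR1 raises the debug level,
# SIGUSR2 lowers it.
DEBUG_SIGNALS = {"up": signal.SIGUSR1, "down": signal.SIGUSR2}


def signal_for(direction: str) -> signal.Signals:
    """Return the signal that moves the debug level in `direction`."""
    try:
        return DEBUG_SIGNALS[direction]
    except KeyError:
        raise ValueError(
            f"direction must be 'up' or 'down', not {direction!r}"
        ) from None


def adjust_debug(pid: int, direction: str, steps: int = 1) -> None:
    """Send one signal per step to `pid` (assumed to be a heartbeat process)."""
    sig = signal_for(direction)  # validate before sending anything
    for _ in range(steps):
        os.kill(pid, sig)
```

For example, `adjust_debug(pid, "up")` would send a single SIGUSR1. Ordinary POSIX signals are not queued, so several identical signals sent back-to-back may be merged into one; sending them one at a time and checking the log in between is the safer habit.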
Thanks,
Dejan
> Beyond that, I can't help much, as I haven't had much to do with stonith.
>
> >
> >
> >Thanks again,
> >Abhi
> >
> >-----Original Message-----
> >From: linux-ha-bounces at lists.linux-ha.org
> >[mailto:linux-ha-bounces at lists.linux-ha.org] On Behalf Of Andrew Beekhof
> >Sent: Monday, November 26, 2007 2:59 PM
> >To: General Linux-HA mailing list
> >Subject: Re: [Linux-HA] Fencing prevents resource from failing over
> >
> >
> >On Nov 26, 2007, at 9:56 AM, <abhishek.bagchi at wipro.com> wrote:
> >
> >>
> >>Thanks Andrew,
> >>My comments are inline...
> >>-----Original Message-----
> >>From: linux-ha-bounces at lists.linux-ha.org
> >>[mailto:linux-ha-bounces at lists.linux-ha.org] On Behalf Of Andrew Beekhof
> >>Sent: Monday, November 26, 2007 1:44 PM
> >>To: General Linux-HA mailing list
> >>Subject: Re: [Linux-HA] Fencing prevents resource from failing over
> >>
> >>
> >>On Nov 26, 2007, at 6:25 AM, <abhishek.bagchi at wipro.com> wrote:
> >>
> >>>
> >>>Hi,
> >>>I have a 2-node active/passive cluster (active node => active, passive
> >>>node => standby) using heartbeat 2.0.8. I recently enabled stonith.
> >>>The stonith device is an rsh device that tries to restart the cluster
> >>>node. However, something that used to work with stonith disabled has
> >>>stopped working now: node failover on network cable disconnection. I
> >>>believe that since the stonith device uses the network, the stonith
> >>>fails, and hence the resource is left wherever it was running.
> >>
> >>Correct. The cluster will not start anything until it can verify that
> >>the node is truly dead (with a successful stonith operation). This is
> >>how a stonith-enabled cluster is supposed to work, and it is why
> >>IP-based stonith modules are not a great idea.
> >>
> >>
> >>
> >>>Can anyone please help resolve this problem (this is probably not a
> >>>problem, and this is how stonith is expected to work)? I would like to
> >>>know if there's any way to tell the passive node (currently the active
> >>>one) to give up trying to stonith and then start the resource.
> >>
> >>By design, no.
> >>
> >>>I've attached my CIB file and the logs from the passive node when the
> >>>cable is disconnected. I have no problem with both nodes running the
> >>>resource, as the active node is cut off from the network anyway and
> >>>can't do any damage.
> >>
> >>If that's truly the case, then you may not need stonith.
> >>
> >>ABHI: But if the active node comes online again, it's a very bad thing
> >>for both nodes to be running the resources.
> >
> >The CRM will detect that and stop one of them. However, there will
> >always be a period of time (even with your proposal below) where they
> >are both active and both connected to the network.
> >
> >>Can we configure two stonith devices and make the node consider stonith
> >>successful if either of the stonith operations returns success? Is
> >>there some kind of resource constraint that I can use in this case?
> >>1. Online stonith device: uses IP to reset the other node.
> >>2. Offline stonith device: just a dummy that always returns success on
> >>reset.
> >
> >If you're lucky, this might work 9 times out of 10, but it's likely
> >that when it doesn't work, it's going to _really_ hurt you.
> >
> >"Tricking" the cluster almost always leads to pain.
> >
> >
> >My advice: get a real stonith device.
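To make the failure mode concrete, here is a toy Python model (not heartbeat's actual logic) of the two-device proposal: fencing counts as successful if any configured device reports success, and the standby starts the resource only after that. All names (`Node`, `rsh_reset`, `dummy_reset`, `fail_over`, `both_running`) are hypothetical. With only the IP-based device and the cable pulled, fencing fails and nothing moves, matching the behaviour reported at the start of the thread; adding the always-successful dummy makes the standby start the resource while the old active node is still alive and running it.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    name: str
    alive: bool = True
    running_resource: bool = False


# A fencing device takes the target node and whether the network path to it
# works, and returns True if it reports a successful reset.
FenceDevice = Callable[[Node, bool], bool]


def rsh_reset(target: Node, network_up: bool) -> bool:
    """IP-based device: can only reset the target over a working network."""
    if not network_up:
        return False
    target.alive = False
    target.running_resource = False
    return True


def dummy_reset(target: Node, network_up: bool) -> bool:
    """'Offline' dummy device: always reports success, resets nothing."""
    return True


def fail_over(standby: Node, active: Node,
              devices: List[FenceDevice], network_up: bool) -> None:
    """Start the resource on `standby` only after fencing 'succeeds'."""
    fenced = any(device(active, network_up) for device in devices)
    if fenced:
        standby.running_resource = True


def both_running(devices: List[FenceDevice], network_up: bool) -> bool:
    """Run one failover and report whether both nodes now run the resource."""
    active = Node("active", running_resource=True)
    standby = Node("standby")
    fail_over(standby, active, devices, network_up)
    return active.running_resource and standby.running_resource
```

With the cable pulled, `both_running([rsh_reset], False)` is `False` (the resource stays put), while `both_running([rsh_reset, dummy_reset], False)` is `True`: the cluster has been told the peer is dead when it is not, which is exactly the situation a real power or management-board stonith device avoids.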
> >
> >>>The standby log seems to say it has quorum
> >>
> >>2-node clusters always have quorum, so the value is meaningless...
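Why the flag carries no information can be illustrated with a simplified vote-counting model (an assumption, not heartbeat's CCM code, with one vote per node). Under a plain strict-majority rule, a lone survivor of a two-node cluster would never have quorum, so the model treats two-node clusters as a special case in which a single node keeps it, consistent with the statement above. Both halves of a split then report quorum, so the flag cannot decide which side should run resources. `strict_majority` and `has_quorum` are hypothetical helpers.

```python
def strict_majority(votes_present: int, total_votes: int) -> bool:
    """Plain majority rule: strictly more than half of all votes."""
    return 2 * votes_present > total_votes


def has_quorum(votes_present: int, total_votes: int) -> bool:
    """Toy quorum check with a two-node special case (one vote per node)."""
    if total_votes == 2:
        # Special case modelled on the statement in the thread: in a
        # two-node cluster a single surviving node keeps quorum.
        return votes_present >= 1
    return strict_majority(votes_present, total_votes)
```

In a three-node cluster the same rule gives quorum to the two-node side of a split (2 of 3 votes) and not to the isolated node, which is what makes the flag meaningful there.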
> >>
> >>>but it makes me wonder why it doesn't start the resources, in spite of
> >>>the following, which is evident from the logs:
> >>>
> >>>1. Standby marks active unclean
> >>>2. Standby has quorum
> >>>3. Standby tries to move resources back to standby
> >>>
> >>>
> >>>Thanks in advance,
> >>>Abhi.
> >>>
> >>>The information contained in this electronic message and any
> >>>attachments to this message are intended for the exclusive use of the
> >>>addressee(s) and may contain proprietary, confidential or privileged
> >>>information. If you are not the intended recipient, you should not
> >>>disseminate, distribute or copy this e-mail. Please notify the sender
> >>>immediately and destroy all copies of this message and any
> >>>attachments.
> >>>
> >>>WARNING: Computer viruses can be transmitted via email. The recipient
> >>>should check this email and any attachments for the presence of
> >>>viruses. The company accepts no liability for any damage caused by any
> >>>virus transmitted by this email.
> >>>
> >>>www.wipro.com
> >>><ha-log-standby.txt>
> >>><cib.xml>
> >>>_______________________________________________
> >>>Linux-HA mailing list
> >>>Linux-HA at lists.linux-ha.org
> >>>http://lists.linux-ha.org/mailman/listinfo/linux-ha
> >>>See also: http://linux-ha.org/ReportingProblems
> ><standby-log.txt>