[Linux-HA] issues with stonithd and apcmastersnmp
dejanmm at fastmail.fm
Fri Mar 5 05:07:22 MST 2010
On Fri, Mar 05, 2010 at 10:38:24AM +0100, Andreas Kurz wrote:
> On Thursday 04 March 2010 20:00:58 Brian Wolfe wrote:
> > I replaced the strncmp() calls in the stonithd.c function for matching
> > the node_name to the device hosts controlled list with the
> > case-insensitive version strncasecmp() and it's working like a champ
> > now.
> don't forget to post the patch, thx
That's already been done in December and should be available in
1.0.7. See http://developerbugs.linux-foundation.org/show_bug.cgi?id=2292
> > Are the node names case sensitive or insensitive? If they are
> > insensitive then it might be a good idea to do all node name
> > comparisons with the strncasecmp() call instead just to thwart any
> > future case issues. :)
I doubt that's going to happen for anything else apart from when
getting input from outside, such as with stonith devices.
Heartbeat explicitly converts the node name to lowercase.
Corosync seems to care only about IP addresses/node ids, so it
shouldn't matter there. But it could matter in Pacemaker. At any
rate, just keep your hostnames lowercase.
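
For illustration only, here is a minimal sketch of the kind of
case-insensitive host list match discussed above. It is not the code
from stonithd.c or the patch in the bug report; the helper name
hostlist_contains() and the whitespace-separated list format are
assumptions made for the example. It relies on the POSIX strncasecmp()
call and compares token lengths first, so that a name which is only a
prefix of a listed host does not match.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strncasecmp() (POSIX) */

/* Hypothetical helper, not taken from stonithd.c: returns true if `node`
 * appears in the whitespace-separated `hostlist`, ignoring case. */
static bool hostlist_contains(const char *hostlist, const char *node)
{
    size_t node_len = strlen(node);
    const char *p = hostlist;

    while (*p != '\0') {
        /* Skip separators between host names. */
        p += strspn(p, " \t");

        /* Length of the current host name token. */
        size_t tok_len = strcspn(p, " \t");
        if (tok_len == 0)
            break;

        /* Require equal length so "node1" does not match "node10". */
        if (tok_len == node_len && strncasecmp(p, node, node_len) == 0)
            return true;

        p += tok_len;
    }
    return false;
}
```

With the host list from this thread, "tpc-dal-prlores3 tpc-dal-tcfs2",
a lookup for TPC-DAL-PRLORES3 would succeed with this helper, whereas a
plain strncmp() comparison would not.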
> maybe someone volunteers for that? ;-)
> > On Thu, Mar 4, 2010 at 4:02 AM, Andreas Kurz <andreas.kurz at linbit.com>
> > > On Wednesday 03 March 2010 20:40:18 Brian Wolfe wrote:
> > >> I have a cluster setup with 2 dell servers, dual ethernet heartbeats,
> > >> and a single 8-port APCMaster PDU switch. The cluster works except
> > >> for one issue. The cloned stonithd interface refuses to make a call to
> > >> the apcmaster to power down the node that's "dead". Reading through
> > >> the logs I can see that during setup the stonithd asks the
> > >> apcmastersnmp module to check its hosts list and it returns the
> > >> correct hostnames "tpc-dal-prlores3 tpc-dal-tcfs2". However when the
> > >> time comes for it to actually use the device I get the following
> > >> message from stonithd refusing to actually kill the other node.
> > >
> > > hmm .... the outlet names of the PDU are also uppercase?
> > >
> > > Regards,
> > > Andreas
> > >
> > >> Mar 3 13:00:36 TPC-DAL-TCFS2 crmd: : info: te_fence_node:
> > >> Executing poweroff fencing operation (24) on TPC-DAL-PRLORES3
> > >> (timeout=60000)
> > >> Mar 3 13:00:36 TPC-DAL-TCFS2 crmd: : debug: waiting for the
> > >> stonith reply msg.
> > >> Mar 3 13:00:36 TPC-DAL-TCFS2 stonithd: : info: client tengine
> > >> [pid: 15805] requests a STONITH operation POWEROFF on node
> > >> TPC-DAL-PRLORES3
> > >> Mar 3 13:00:36 TPC-DAL-TCFS2 stonithd: : info: we can't manage
> > >> TPC-DAL-PRLORES3, broadcast request to other nodes
> > >> Mar 3 13:00:36 TPC-DAL-TCFS2 stonithd: : debug: inserted
> > >> optype=POWEROFF, key=-2
> > >> Mar 3 13:00:36 TPC-DAL-TCFS2 stonithd: : info: Broadcasting
> > >> the message succeeded: require others to stonith node
> > >> TPC-DAL-PRLORES3.
> > >> Mar 3 13:00:36 TPC-DAL-TCFS2 stonithd: : debug:
> > >> stonithd_node_fence: sent back a synchronous reply.
> > >> Mar 3 13:00:36 TPC-DAL-TCFS2 crmd: : debug:
> > >> stonithd_node_fence:574: stonithd's synchronous answer is ST_APIOK
> > >>
> > >>
> > >> The stonith is configured as follows:
> > >>
> > >> <clone id="fencing">
> > >>   <primitive class="stonith" id="apcstonith23" type="apcmastersnmp">
> > >>     <operations id="apcstonith23-operations">
> > >>       <op id="apcstonith23-op-monitor-15" interval="15"
> > >>           name="monitor" start-delay="15" timeout="15"/>
> > >>     </operations>
> > >>     <instance_attributes id="apcstonith23-instance_attributes">
> > >>       <nvpair id="nvpair-604e339f-a400-4b30-82c0-f046de0ed663"
> > >>               name="ipaddr" value="172.20.1.23"/>
> > >>       <nvpair id="nvpair-ed611421-97a1-4091-a5cd-8159f1230096"
> > >>               name="port" value="161"/>
> > >>       <nvpair id="nvpair-997431e2-ea78-4065-b835-f9149bbcb596"
> > >>               name="community" value="private"/>
> > >>     </instance_attributes>
> > >>   </primitive>
> > >>   <meta_attributes id="fencing-meta_attributes">
> > >>   </meta_attributes>
> > >> </clone>
> > >>
> > >>
> > >> I can confirm the use of the stonith via the command "stonith -t
> > >> apcmastersnmp <params> tpc-dal-prlores3" and it'll switch off the
> > >> server.
> > >>
> > >> Any help would be appreciated.
> > >> _______________________________________________
> > >> Linux-HA mailing list
> > >> Linux-HA at lists.linux-ha.org
> > >> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > >> See also: http://linux-ha.org/ReportingProblems