[Linux-HA] rackpdu stonith bug

Dejan Muhamedagic dejanmm at fastmail.fm
Fri Jun 4 05:22:31 MDT 2010


On Fri, Jun 04, 2010 at 07:14:12AM -0400, Vadym Chepkov wrote:
> I actually submitted a patch to linux-ha-dev as described 

Either I missed it, or it never got there. Perhaps you're not
subscribed to the list? At any rate, the patch is already in the
repository.

Cheers,

Dejan

> On Jun 4, 2010, at 7:04 AM, Dejan Muhamedagic wrote:
> 
> > Hi,
> > 
> > On Thu, Jun 03, 2010 at 06:52:09PM -0400, Vadym Chepkov wrote:
> >> Hi
> >> 
> >> There is a bug in stonith/plugins/external/rackpdu in cluster-glue-1.0.5
> >> 
> >> It doesn't check if snmpset was successful or not :
> >> 
> >> SendCommand() {
> >> 
> >>    local host=$1
> >>    local command=$2
> >> 
> >>    GetOutletNumber $host
> >>    local outlet=$?
> >> 
> >>    if [ $outlet -gt 0 ]; then
> >>        local set_result=`snmpset -v1 -c $community $pduip $oid.$outlet i $command 2>&1`
> >>        local check_result=`echo "$set_result" | grep "Timeout"`            
> >> 
> >>        if [ ! -z "$check_result" ]; then
> >>            ha_log.sh err "Write SNMP value $oid.$outlet=$command. Result: $set_result"
> >>        fi
> >> 
> >>        return 0
> >>    else
> >>        return 1
> >>    fi
> >> }
> >> 
> >> Here is what happens:
> >> 
> >> + '[' 1 -gt 0 ']'
> >> ++ snmpset -v1 -c private 10.10.10.10  .1.3.6.1.4.1.318.1.1.12.3.3.1.1.4.1 i 2
> >> + local 'set_result=Error in packet.
> >> Reason: (genError) A general failure occured'
> >> ++ echo 'Error in packet.
> >> Reason: (genError) A general failure occured'
> >> ++ grep Timeout
> >> + local check_result=
> >> + '[' '!' -z '' ']'
> >> + return 0
> >> + exit 0
> >> 
> >> so stonith agent says it was successful when it was not :(
> >> 
> >> instead of grepping for "Timeout" (why?)
> > 
> > Don't know. Yes, that's strange. I left that check in anyway. Can
> > you simulate a timeout and see what exit code snmpset returns?
> > 
> >> it should check the exit status: 0 means it was successful,
> >> 2 - failed and not recoverable
> > 
> > Yes, fixed now. Also snmpwalk for gethosts.
> > 
> >> 1 - you can possibly retry. 
> >> 
> >> The last one, unfortunately, usually happens when somebody is
> >> already logged in to the PDU (via http or telnet)
> > 
> > Well, we could retry, but that's probably going to be in vain.
> > That needs to be documented.
> > 
> > Can you please test the changes. You can pull the new version
> > from the repository for testing.
> > 
> > Many thanks for the report.
> > 
> > Dejan
> 
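The exit-status handling discussed above (0 for success, 1 for a possibly
transient failure such as a timeout, 2 for an unrecoverable failure) could be
sketched as a small wrapper. This is only an illustration under those
assumptions, not the code that went into the repository: run_with_retry and
its arguments are hypothetical names, and the meaning of the exit codes is
taken from the description in this thread.

```shell
#!/bin/bash
# Hypothetical helper: run a command, retrying only while it exits with 1.
# Usage: run_with_retry MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
run_with_retry() {
    local tries="$1"
    local delay="$2"
    shift 2

    local attempt=1
    local rc
    while :; do
        "$@"
        rc=$?
        # Stop on success (0), on any status other than 1 (treated as
        # unrecoverable), or once the attempts are used up.
        if [ "$rc" -ne 1 ] || [ "$attempt" -ge "$tries" ]; then
            return "$rc"
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

In the agent this might be called as
run_with_retry 3 1 snmpset -v1 -c "$community" "$pduip" "$oid.$outlet" i "$command",
though as noted above a retry is probably in vain while somebody else is
logged in to the PDU.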
> 
> I actually submitted a patch to the linux-ha-dev list as described on the clusterlabs site, but I guess it never got there. 
> I attach it now. I assume the original author didn't realize that
> 
> local result=`command` 
> 
> always returns 0, no matter what the outcome of the command was. A timeout does generate exit code 1.
> 
> 
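The behaviour described above can be reproduced without any SNMP device. A
minimal sketch, assuming bash or another shell that supports local; fail_cmd,
masked and preserved are hypothetical names used only for illustration:

```shell
#!/bin/bash
# Stand-in for a failing snmpset: prints an error and exits with status 2.
fail_cmd() {
    echo "Error in packet."
    return 2
}

# Declaration and assignment in one statement: $? is the status of
# `local` itself, which is 0 as long as the declaration succeeds.
masked() {
    local result="$(fail_cmd)"
    echo $?
}

# Declaration first, assignment second: $? is the status of fail_cmd.
preserved() {
    local result
    result="$(fail_cmd)"
    echo $?
}

masked      # prints 0
preserved   # prints 2
```

Splitting the declaration from the assignment is the same change the attached
patch makes for snmp_result and set_result.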

Subject: [PATCH] Check exit codes of snmp utils
Date: Fri, 04 Jun 2010 07:09:12 -0400
From: Vadym Chepkov <vchepkov at gmail.com>
To: vchepkov at gmail.com
> 
> # HG changeset patch
> # User Vadym Chepkov <vchepkov at gmail.com>
> # Date 1275609966 14400
> # Node ID 955b957b9e64c83cff9a0e793922143f573cc712
> # Parent  5385c0d6c83668cd970161b2862282570b3cf92a
> Check exit codes of snmp utils
> 
> diff -r 5385c0d6c836 -r 955b957b9e64 lib/plugins/stonith/external/rackpdu
> --- a/lib/plugins/stonith/external/rackpdu	Tue May 25 15:35:38 2010 +0200
> +++ b/lib/plugins/stonith/external/rackpdu	Thu Jun 03 20:06:06 2010 -0400
> @@ -68,7 +68,12 @@
>  	# Get outlet number from device
>      
>  	local outlet_num=1
> -	local snmp_result=`snmpwalk -v1 -c $community $pduip $names_oid 2>&1`
> +	local snmp_result
> +	snmp_result=`snmpwalk -v1 -c $community $pduip $names_oid 2>&1`
> +        if [ $? -ne 0 ]; then
> +	    ha_log.sh err "Outlet number not found for node $nodename. Result: $snmp_result"
> +	    return 0
> +	fi
>  
>  	local names=`echo "$snmp_result" | cut -f2 -d'"' | tr ' ' '_' | tr '\012' ' '`
>  
> @@ -95,11 +100,11 @@
>      local outlet=$?
>  
>      if [ $outlet -gt 0 ]; then
> -        local set_result=`snmpset -v1 -c $community $pduip $oid.$outlet i $command 2>&1`
> -        local check_result=`echo "$set_result" | grep "Timeout"`	    
> -
> -        if [ ! -z "$check_result" ]; then
> +        local set_result
> +        set_result=`snmpset -v1 -c $community $pduip $oid.$outlet i $command 2>&1`
> +        if [ $? -ne 0 ]; then
>      	    ha_log.sh err "Write SNMP value $oid.$outlet=$command. Result: $set_result"
> +            return 1
>  	fi
>  	    
>  	return 0
> @@ -116,9 +121,7 @@
>  gethosts)
>  	if [ "$hostlist" = "AUTO" ]; then
>  	    snmp_result=`snmpwalk -v1 -c $community $pduip $names_oid 2>&1`
> -	    snmp_check=`echo "$snmp_result" | grep "Timeout"`
> -
> -	    if [ ! -z "$snmp_check" ]; then
> +	    if [ $? -ne 0 ]; then
>  		ha_log.sh err "Cannot read list of nodes from device. Result: $snmp_result"
>  		exit 1
>  	    else

> 
> 
> 
> Vadym
> 
> 
