AW: AW: [Linux-ha-dev] Solaris 10/i386: Heartbeat/Stonithd hangs on shutdown.

Dejan Muhamedagic dejanmm at fastmail.fm
Fri Jun 1 11:54:44 MDT 2007


On Fri, Jun 01, 2007 at 10:29:07AM +0100, David Lee wrote:
> On Fri, 11 May 2007, David Lee wrote:
> 
> > On Fri, 11 May 2007, Otte, Joerg wrote:
> >
> > > As far as I can see, the difference between Bourne and Bash
> > > shows up in the following process trees:
> > > Bourne:
> > > 1125  /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > >   1129  /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > >   1130  /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > >   1131  /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > >   1132  /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > >   1133  /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > >   1135  sh -c /usr/sfw/lib/python2.3/heartbeat/ccm
> > >     1144  /usr/sfw/lib/python2.3/heartbeat/ccm
> > >   1136  sh -c /usr/sfw/lib/python2.3/heartbeat/cib
> > >     1147  /usr/sfw/lib/python2.3/heartbeat/cib
> > >   1137  sh -c /usr/sfw/lib/python2.3/heartbeat/lrmd -r
> > >     1143  /usr/sfw/lib/python2.3/heartbeat/lrmd -r
> > >   1138  sh -c /usr/sfw/lib/python2.3/heartbeat/stonithd
> > >     1145  /usr/sfw/lib/python2.3/heartbeat/stonithd
> > >   ..
> > >
> > > Bash:
> > > 2849  /usr/sfw/lib/python2.3/heartbeat/ha_logd -d
> > >   2858  /usr/sfw/lib/python2.3/heartbeat/ha_logd -d
> > > 2859  /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > >   2863  /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > >   2864  /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > >   2865  /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > >   2866  /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > >   2867  /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > >   2870  /usr/sfw/lib/python2.3/heartbeat/ccm
> > >   2871  /usr/sfw/lib/python2.3/heartbeat/cib
> > >   2872  /usr/sfw/lib/python2.3/heartbeat/lrmd -r
> > >   2873  /usr/sfw/lib/python2.3/heartbeat/stonithd
> > >   ..
> > >
> >
> > Many thanks, Joerg!  That's most helpful.
> >
> >
> > [...] It then proceeds:
> > -----------------------------
> >                 (void)execl("/bin/sh", "sh", "-c", centry->command
> >                 ,       (const char *)NULL);
> > -----------------------------
> >
> > It might be that different implementations of "/bin/sh" do different
> > things.  For instance, some Bourne shells (e.g. Solaris) might (I speculate!)
> > then spawn "centry->command" as a subprocess (a further fork/exec),
> > whereas Bash might (speculation continues) do a direct replacement "exec".
> >
> > It seems that heartbeat requires the 'direct replacement "exec"' style of
> > operation (stored child pid is the final "centry->command"), which Bash
> > seems to give but which Bourne (child pid is intermediate "sh") doesn't.
> >
> > Is there any reason why this 'execl("/bin/sh" ... centry->command ...)'
> > tries to go via a shell?  Why doesn't it simply go directly, and
> > unambiguously, to the 'centry->command'?
> >
> > This is beginning to feel like a Bash-ism bug.
> 
> Note: this is being monitored in bug 1576:
>    http://old.linux-foundation.org/developer_bugzilla/show_bug.cgi?id=1576
> 
> Joerg: Sorry for the delay.  (Heartbeat is a purely voluntary, spare-time,
> activity for me, outside work, family, and the rest of life...)
> 
> You might try the attached patch, which avoids the "sh" and goes directly
> to the desired 'centry->command'.  (My Solaris confirmed your issue, and
> the patch seems to cure it.)
> 
> 
> Others: I would value, too, your tests in your environments, and/or
> general comments about the proposed patch.

Your patch looks sane. I wonder why /bin/sh was used at all here.
Perhaps Alan can explain.
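
For anyone who wants to try the idea outside of heartbeat, here is a
minimal standalone sketch of what the patch does: split the command
string with wordexp() and exec the result directly, so that the pid
returned by fork() belongs to the command itself and not to an
intermediate shell.  The helper names spawn_direct() and
wait_exit_status() are made up for illustration and are not in
heartbeat.c.  The WRDE_NOCMD flag and the empty-command check are my
own additions; the patch passes 0 as the flags argument.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <wordexp.h>

/*
 * Hypothetical helper (not from heartbeat.c): fork, then exec `command`
 * directly, without an intermediate "sh -c".
 * Returns the child's pid, or -1 if fork() fails.
 */
static pid_t spawn_direct(const char *command)
{
	pid_t pid;

	assert(command != NULL);
	pid = fork();
	if (pid == 0) {
		wordexp_t we;

		/* WRDE_NOCMD: refuse command substitution such as $(...) */
		if (wordexp(command, &we, WRDE_NOCMD) != 0 || we.we_wordc == 0) {
			_exit(100);
		}
		/* execv() does no PATH search: we_wordv[0] must be a path */
		execv(we.we_wordv[0], we.we_wordv);
		_exit(100);	/* reached only if execv() failed */
	}
	return pid;
}

/* Wait for `pid`; return its exit status, or -1 if it did not exit normally. */
static int wait_exit_status(pid_t pid)
{
	int status;

	if (waitpid(pid, &status, 0) != pid) {
		return -1;
	}
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Two things to keep in mind with this approach: execv() does not search
PATH, so the command has to start with a full path (as in
"/usr/sfw/lib/python2.3/heartbeat/lrmd -r" above), and unquoted shell
operators such as pipes or redirections make wordexp() fail instead of
being interpreted.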

> Also: A quick (and possibly incomplete) grep through the source code
> revealed that "lib/stonith/expect.c" might also exhibit this problem, for
> which I would propose a similar solution.  (There seems to be a subtle
> difference, in that "heartbeat/heartbeat.c" used 'execl()' but
> "lib/stonith/expect.c" uses 'execlp()'.  Is this deliberate?)
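
On the execl()/execlp() question: as far as I know the only difference
is that execlp() searches PATH when the file name contains no slash.
With an absolute path such as "/bin/sh" the two behave identically.  I
have not checked what expect.c actually passes, so the sketch below
(try_exec() is a made-up name) only illustrates the lookup difference
and assumes that an "sh" can be found via PATH:

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo: run "sh -c 'exit 0'" in a child process, naming the
 * program as plain "sh" (no slash).
 * search_path != 0: use execlp(), which looks "sh" up in PATH.
 * search_path == 0: use execl(), which treats "sh" as a path relative to
 *                   the current directory and normally fails.
 * Returns the child's exit status (127 if the exec failed), or -1 on error.
 */
static int try_exec(int search_path)
{
	int status;
	pid_t pid = fork();

	if (pid < 0) {
		return -1;
	}
	if (pid == 0) {
		if (search_path) {
			execlp("sh", "sh", "-c", "exit 0", (char *)NULL);
		} else {
			execl("sh", "sh", "-c", "exit 0", (char *)NULL);
		}
		_exit(127);	/* reached only if the exec failed */
	}
	if (waitpid(pid, &status, 0) != pid) {
		return -1;
	}
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

So if both call sites end up passing "/bin/sh", the choice between the
two functions should make no practical difference.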
> 
> 
> 
> -- 
> 
> :  David Lee                                I.T. Service          :
> :  Senior Systems Programmer                Computer Centre       :
> :  UNIX Team Leader                         Durham University     :
> :                                           South Road            :
> :  http://www.dur.ac.uk/t.d.lee/            Durham DH1 3LE        :
> :  Phone: +44 191 334 2752                  U.K.                  :
Content-Description: exec patch
> --- heartbeat/heartbeat.c.orig	Wed May 16 14:53:39 2007
> +++ heartbeat/heartbeat.c	Thu May 31 17:53:19 2007
> @@ -214,6 +214,7 @@
>  #include <sys/resource.h>
>  #include <dirent.h>
>  #include <netdb.h>
> +#include <wordexp.h>
>  #include <ltdl.h>
>  #ifdef _POSIX_MEMLOCK
>  #	include <sys/mman.h>
> @@ -3807,6 +3808,9 @@
>  		const char *	devnull = "/dev/null";
>  		unsigned int	j;
>  		struct rlimit		oflimits;
> +		wordexp_t we;
> +		int rc;
> +
>  		CL_SIGNAL(SIGCHLD, SIG_DFL);
>  		alarm(0);
>  		CL_IGNORE_SIG(SIGALRM);
> @@ -3819,11 +3823,19 @@
>  		(void)open(devnull, O_RDONLY);	/* Stdin:  fd 0 */
>  		(void)open(devnull, O_WRONLY);	/* Stdout: fd 1 */
>  		(void)open(devnull, O_WRONLY);	/* Stderr: fd 2 */
> -		(void)execl("/bin/sh", "sh", "-c", centry->command
> -		,	(const char *)NULL);
>  
> -		/* Should not happen */
> -		cl_perror("Cannot exec %s", centry->command);
> +		/* expand 'centry->command' string into 'exec()' arg list */
> +		rc = wordexp(centry->command, &we, 0);
> +		if (rc != 0) {
> +			cl_perror("Bad command specification (error:%d): %s",
> +			  rc, centry->command);
> +		}
> +		else {
> +			(void)execv(we.we_wordv[0], we.we_wordv);
> +
> +			/* Should not happen */
> +			cl_perror("Cannot exec %s", centry->command);
> +		}
>  	}
>  	/* Suppress respawning */
>  	exit(100);



-- 
Dejan

