AW: AW: [Linux-ha-dev] Solaris 10/i386: Heartbeat/Stonithd hangs on shutdown.

David Lee t.d.lee at durham.ac.uk
Fri Jun 1 03:29:07 MDT 2007


On Fri, 11 May 2007, David Lee wrote:

> On Fri, 11 May 2007, Otte, Joerg wrote:
>
> > As far as I can see, the difference between Bourne and Bash
> > shows up in the following process trees:
> > Bourne:
> > 1125  /usr/sfw/lib/python2.3/heartbeat/heartbeat
> >   1129  /usr/sfw/lib/python2.3/heartbeat/heartbeat
> >   1130  /usr/sfw/lib/python2.3/heartbeat/heartbeat
> >   1131  /usr/sfw/lib/python2.3/heartbeat/heartbeat
> >   1132  /usr/sfw/lib/python2.3/heartbeat/heartbeat
> >   1133  /usr/sfw/lib/python2.3/heartbeat/heartbeat
> >   1135  sh -c /usr/sfw/lib/python2.3/heartbeat/ccm
> >     1144  /usr/sfw/lib/python2.3/heartbeat/ccm
> >   1136  sh -c /usr/sfw/lib/python2.3/heartbeat/cib
> >     1147  /usr/sfw/lib/python2.3/heartbeat/cib
> >   1137  sh -c /usr/sfw/lib/python2.3/heartbeat/lrmd -r
> >     1143  /usr/sfw/lib/python2.3/heartbeat/lrmd -r
> >   1138  sh -c /usr/sfw/lib/python2.3/heartbeat/stonithd
> >     1145  /usr/sfw/lib/python2.3/heartbeat/stonithd
> >   ..
> >
> > Bash:
> > 2849  /usr/sfw/lib/python2.3/heartbeat/ha_logd -d
> >   2858  /usr/sfw/lib/python2.3/heartbeat/ha_logd -d
> > 2859  /usr/sfw/lib/python2.3/heartbeat/heartbeat
> >   2863  /usr/sfw/lib/python2.3/heartbeat/heartbeat
> >   2864  /usr/sfw/lib/python2.3/heartbeat/heartbeat
> >   2865  /usr/sfw/lib/python2.3/heartbeat/heartbeat
> >   2866  /usr/sfw/lib/python2.3/heartbeat/heartbeat
> >   2867  /usr/sfw/lib/python2.3/heartbeat/heartbeat
> >   2870  /usr/sfw/lib/python2.3/heartbeat/ccm
> >   2871  /usr/sfw/lib/python2.3/heartbeat/cib
> >   2872  /usr/sfw/lib/python2.3/heartbeat/lrmd -r
> >   2873  /usr/sfw/lib/python2.3/heartbeat/stonithd
> >   ..
> >
>
> Many thanks, Joerg!  That's most helpful.
>
>
> [...] It then proceeds:
> -----------------------------
>                 (void)execl("/bin/sh", "sh", "-c", centry->command
>                 ,       (const char *)NULL);
> -----------------------------
>
> It might be that different implementations of "/bin/sh" do different
> things.  For instance, some Bournes (e.g. Solaris) might (I speculate!)
> then spawn "centry->command" as a subprocess (a further fork/exec)
> whereas Bash might (speculation continues) do a direct replacement "exec".
>
> It seems that heartbeat requires the 'direct replacement "exec"' style of
> operation (stored child pid is the final "centry->command"), which Bash
> seems to give but which Bourne (child pid is intermediate "sh") doesn't.
>
> Is there any reason why this 'execl("/bin/sh" ... centry->command ...)'
> tries to go via a shell?  Why doesn't it simply go directly, and
> unambiguously, to the 'centry->command'?
>
> This is beginning to feel like a Bash-ism bug.

Note: this is being monitored in bug 1576:
   http://old.linux-foundation.org/developer_bugzilla/show_bug.cgi?id=1576

Joerg: Sorry for the delay.  (Heartbeat is a purely voluntary, spare-time,
activity for me, outside work, family, and the rest of life...)

You might try the attached patch, which avoids the "sh" and goes directly
to the desired 'centry->command'.  (My Solaris system reproduced your issue,
and the patch seems to cure it.)
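
For anyone who wants to see the splitting step in isolation, here is a
small standalone sketch of turning a command string into an argument
list with wordexp().  The function name is invented for this example,
and the WRDE_NOCMD flag is my own assumption (the attached patch passes
0 for the flags); it makes wordexp() refuse command substitution.

```c
#include <assert.h>
#include <stddef.h>
#include <wordexp.h>

/*
 * Hypothetical helper: expand a command string into an argv-style
 * word list.  Returns the number of words, or -1 if wordexp() rejects
 * the string.  On success the caller releases *we with wordfree().
 */
int split_command(const char *command, wordexp_t *we)
{
	int	rc;

	assert(command != NULL && we != NULL);

	/* WRDE_NOCMD (assumption, not in the patch): reject $(...) etc. */
	rc = wordexp(command, we, WRDE_NOCMD);
	if (rc != 0) {
		return -1;
	}
	return (int)we->we_wordc;
}
```

For the string "/usr/sfw/lib/python2.3/heartbeat/lrmd -r" this should
give two words, with we_wordv[0] being the program path and the list
terminated by a NULL pointer, which is the shape execv() expects.  One
consequence worth noting: wordexp() rejects unquoted shell
metacharacters such as '|', ';', '<' and '>', so any configured command
that relied on "sh -c" for pipes or redirection would stop working.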


Others: I would also value your tests in your own environments, and/or
general comments about the proposed patch.


Also: A quick (and possibly incomplete) grep through the source code
revealed that "lib/stonith/expect.c" might also exhibit this problem, for
which I would propose a similar solution.  (There seems to be a subtle
difference, in that "heartbeat/heartbeat.c" used 'execl()' but
"lib/stonith/expect.c" uses 'execlp()'.  Is this deliberate?)
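
For reference, the difference between the two calls is that execlp()
searches $PATH when the file name contains no slash, whereas execl()
always treats its first argument as a pathname.  With an absolute path
such as "/bin/sh" they behave the same.  A minimal sketch (the function
name is made up, and it assumes there is no executable called "sh" in
the current directory):

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo: run "sh -c 'exit 0'" in a child, naming the
 * program only as "sh" (no directory part).  use_path must be 0 or 1.
 *   use_path == 1: execlp(), which searches $PATH for "sh".
 *   use_path == 0: execl(), which looks for "sh" relative to the cwd.
 * Returns the child's exit status, 127 if the exec failed, or -1 on a
 * fork/wait error.
 */
int run_bare_sh(int use_path)
{
	int	status;
	pid_t	pid;

	assert(use_path == 0 || use_path == 1);

	pid = fork();
	if (pid < 0) {
		return -1;
	}
	if (pid == 0) {
		if (use_path) {
			(void)execlp("sh", "sh", "-c", "exit 0", (char *)NULL);
		} else {
			(void)execl("sh", "sh", "-c", "exit 0", (char *)NULL);
		}
		_exit(127);	/* only reached if exec failed */
	}
	if (waitpid(pid, &status, 0) < 0) {
		return -1;
	}
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Under the stated assumption, run_bare_sh(1) should succeed and
run_bare_sh(0) should report an exec failure.  Whether this matters for
"lib/stonith/expect.c" depends on whether the name passed there contains
a slash, which I have not checked.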



-- 

:  David Lee                                I.T. Service          :
:  Senior Systems Programmer                Computer Centre       :
:  UNIX Team Leader                         Durham University     :
:                                           South Road            :
:  http://www.dur.ac.uk/t.d.lee/            Durham DH1 3LE        :
:  Phone: +44 191 334 2752                  U.K.                  :
-------------- next part --------------
--- heartbeat/heartbeat.c.orig	Wed May 16 14:53:39 2007
+++ heartbeat/heartbeat.c	Thu May 31 17:53:19 2007
@@ -214,6 +214,7 @@
 #include <sys/resource.h>
 #include <dirent.h>
 #include <netdb.h>
+#include <wordexp.h>
 #include <ltdl.h>
 #ifdef _POSIX_MEMLOCK
 #	include <sys/mman.h>
@@ -3807,6 +3808,9 @@
 		const char *	devnull = "/dev/null";
 		unsigned int	j;
 		struct rlimit		oflimits;
+		wordexp_t we;
+		int rc;
+
 		CL_SIGNAL(SIGCHLD, SIG_DFL);
 		alarm(0);
 		CL_IGNORE_SIG(SIGALRM);
@@ -3819,11 +3823,19 @@
 		(void)open(devnull, O_RDONLY);	/* Stdin:  fd 0 */
 		(void)open(devnull, O_WRONLY);	/* Stdout: fd 1 */
 		(void)open(devnull, O_WRONLY);	/* Stderr: fd 2 */
-		(void)execl("/bin/sh", "sh", "-c", centry->command
-		,	(const char *)NULL);
 
-		/* Should not happen */
-		cl_perror("Cannot exec %s", centry->command);
+		/* expand 'centry->command' string into 'exec()' arg list */
+		rc = wordexp(centry->command, &we, 0);
+		if (rc != 0) {
+			cl_perror("Bad command specification (error:%d): %s",
+			  rc, centry->command);
+		}
+		else {
+			(void)execv(we.we_wordv[0], we.we_wordv);
+
+			/* Should not happen */
+			cl_perror("Cannot exec %s", centry->command);
+		}
 	}
 	/* Suppress respawning */
 	exit(100);

