AW: AW: [Linux-ha-dev] Solaris 10/i386: Heartbeat/Stonithd
hangs on shutdown.
David Lee
t.d.lee at durham.ac.uk
Fri Jun 1 03:29:07 MDT 2007
On Fri, 11 May 2007, David Lee wrote:
> On Fri, 11 May 2007, Otte, Joerg wrote:
>
> > As far as I can see, the difference between Bourne and Bash
> > shows up in the following process trees:
> > Bourne:
> > 1125 /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > 1129 /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > 1130 /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > 1131 /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > 1132 /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > 1133 /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > 1135 sh -c /usr/sfw/lib/python2.3/heartbeat/ccm
> > 1144 /usr/sfw/lib/python2.3/heartbeat/ccm
> > 1136 sh -c /usr/sfw/lib/python2.3/heartbeat/cib
> > 1147 /usr/sfw/lib/python2.3/heartbeat/cib
> > 1137 sh -c /usr/sfw/lib/python2.3/heartbeat/lrmd -r
> > 1143 /usr/sfw/lib/python2.3/heartbeat/lrmd -r
> > 1138 sh -c /usr/sfw/lib/python2.3/heartbeat/stonithd
> > 1145 /usr/sfw/lib/python2.3/heartbeat/stonithd
> > ..
> >
> > Bash:
> > 2849 /usr/sfw/lib/python2.3/heartbeat/ha_logd -d
> > 2858 /usr/sfw/lib/python2.3/heartbeat/ha_logd -d
> > 2859 /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > 2863 /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > 2864 /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > 2865 /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > 2866 /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > 2867 /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > 2870 /usr/sfw/lib/python2.3/heartbeat/ccm
> > 2871 /usr/sfw/lib/python2.3/heartbeat/cib
> > 2872 /usr/sfw/lib/python2.3/heartbeat/lrmd -r
> > 2873 /usr/sfw/lib/python2.3/heartbeat/stonithd
> > ..
> >
>
> Many thanks, Joerg! That's most helpful.
>
>
> [...] It then proceeds:
> -----------------------------
> (void)execl("/bin/sh", "sh", "-c", centry->command
> , (const char *)NULL);
> -----------------------------
>
> It might be that different implementations of "/bin/sh" do different
> things. For instance, some Bournes (e.g. Solaris) might (I speculate!)
> then spawn "centry->command" as a subprocess (a further fork/exec)
> whereas Bash might (speculation continues) do a direct replacement "exec".
>
> It seems that heartbeat requires the 'direct replacement "exec"' style of
> operation (stored child pid is the final "centry->command"), which Bash
> seems to give but which Bourne (child pid is intermediate "sh") doesn't.
>
> Is there any reason why this 'execl("/bin/sh" ... centry->command ...)'
> tries to go via a shell? Why doesn't it simply go directly, and
> unambiguously, to the 'centry->command'?
>
> This is beginning to feel like a Bash-ism bug.

Note: this is being monitored in bug 1576:
http://old.linux-foundation.org/developer_bugzilla/show_bug.cgi?id=1576

Joerg: Sorry for the delay. (Heartbeat is a purely voluntary, spare-time,
activity for me, outside work, family, and the rest of life...)

You might try the attached patch, which avoids the "sh" and goes directly
to the desired 'centry->command'. (My Solaris confirmed your issue, and
the patch seems to cure it.)
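
For anyone who wants to try the idea outside the heartbeat tree, the
approach can be sketched standalone roughly as follows. This is only a
minimal sketch, not the patch itself: the helper name 'spawn_direct' is
hypothetical, and the WRDE_NOCMD flag and the empty-expansion check are
assumptions added for illustration (the patch passes 0 as the flags).

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <wordexp.h>

/*
 * Hypothetical helper (not part of heartbeat): fork, split 'command'
 * into words with wordexp(), and exec the result directly, so that the
 * pid returned to the parent belongs to the command itself rather than
 * to an intermediate "sh".
 * Returns the child's pid in the parent, or -1 if fork() fails.
 */
pid_t spawn_direct(const char *command)
{
    assert(command != NULL);

    pid_t pid = fork();
    if (pid != 0) {
        return pid;             /* parent, or fork() failure (-1) */
    }

    /* Child: expand the command string into an argv-style word list. */
    wordexp_t we;
    int rc = wordexp(command, &we, WRDE_NOCMD);  /* assumption: refuse $(...) */
    if (rc != 0) {
        /* wordexp() reports failure via its return code, not errno */
        fprintf(stderr, "Bad command specification (error %d): %s\n",
                rc, command);
        _exit(100);
    }
    if (we.we_wordc == 0) {
        fprintf(stderr, "Empty command specification\n");
        _exit(100);
    }

    /* execv() uses the first word as a path exactly as given (no PATH search) */
    execv(we.we_wordv[0], we.we_wordv);

    /* Only reached if execv() failed */
    perror("execv");
    _exit(100);
}
```

Because execv() does no PATH lookup, this sketch assumes the first word
of the command is a usable path to the executable; execvp() would be the
variant to consider if bare command names had to be supported.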
Others: I would value, too, your tests in your environments, and/or
general comments about the proposed patch.

Also: A quick (and possibly incomplete) grep through the source code
revealed that "lib/stonith/expect.c" might also exhibit this problem, for
which I would propose a similar solution. (There seems to be a subtle
difference, in that "heartbeat/heartbeat.c" used 'execl()' but
"lib/stonith/expect.c" uses 'execlp()'. Is this deliberate?)
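
For reference, the general difference between the two calls can be
sketched as below. This is an illustration only, under stated
assumptions: 'run_via_shell' is a hypothetical helper, and the arguments
shown are not taken from "lib/stonith/expect.c". In POSIX terms,
execl() uses the given path as-is, whereas execlp() searches the
directories in PATH when the file name contains no slash.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run "sh -c command" in a child process and
 * return its exit status, or -1 on failure.  'search_path' selects
 * between the two exec variants being compared.
 */
int run_via_shell(const char *command, int search_path)
{
    assert(command != NULL);

    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        if (search_path) {
            /* execlp(): "sh" has no slash, so PATH is searched for it */
            execlp("sh", "sh", "-c", command, (const char *)NULL);
        } else {
            /* execl(): the path is used exactly as written */
            execl("/bin/sh", "sh", "-c", command, (const char *)NULL);
        }
        _exit(127);             /* only reached if the exec failed */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

When the first argument is an absolute path such as "/bin/sh", the two
calls should behave the same; they differ only when a bare name is
given, in which case execlp() depends on the caller's PATH.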
--
: David Lee I.T. Service :
: Senior Systems Programmer Computer Centre :
: UNIX Team Leader Durham University :
: South Road :
: http://www.dur.ac.uk/t.d.lee/ Durham DH1 3LE :
: Phone: +44 191 334 2752 U.K. :
-------------- next part --------------
--- heartbeat/heartbeat.c.orig Wed May 16 14:53:39 2007
+++ heartbeat/heartbeat.c Thu May 31 17:53:19 2007
@@ -214,6 +214,7 @@
#include <sys/resource.h>
#include <dirent.h>
#include <netdb.h>
+#include <wordexp.h>
#include <ltdl.h>
#ifdef _POSIX_MEMLOCK
# include <sys/mman.h>
@@ -3807,6 +3808,9 @@
const char * devnull = "/dev/null";
unsigned int j;
struct rlimit oflimits;
+ wordexp_t we;
+ int rc;
+
CL_SIGNAL(SIGCHLD, SIG_DFL);
alarm(0);
CL_IGNORE_SIG(SIGALRM);
@@ -3819,11 +3823,19 @@
(void)open(devnull, O_RDONLY); /* Stdin: fd 0 */
(void)open(devnull, O_WRONLY); /* Stdout: fd 1 */
(void)open(devnull, O_WRONLY); /* Stderr: fd 2 */
- (void)execl("/bin/sh", "sh", "-c", centry->command
- , (const char *)NULL);
- /* Should not happen */
- cl_perror("Cannot exec %s", centry->command);
+ /* expand 'centry->command' string into 'exec()' arg list */
+ rc = wordexp(centry->command, &we, 0);
+ if (rc != 0) {
+ cl_perror("Bad command specification (error:%d): %s",
+ rc, centry->command);
+ }
+ else {
+ (void)execv(we.we_wordv[0], we.we_wordv);
+
+ /* Should not happen */
+ cl_perror("Cannot exec %s", centry->command);
+ }
}
/* Suppress respawning */
exit(100);