Re: Re: [Linux-ha-dev] Solaris 10/i386: Heartbeat/Stonithd
hangs on shutdown.
Dejan Muhamedagic
dejanmm at fastmail.fm
Fri Jun 1 11:54:44 MDT 2007
On Fri, Jun 01, 2007 at 10:29:07AM +0100, David Lee wrote:
> On Fri, 11 May 2007, David Lee wrote:
>
> > On Fri, 11 May 2007, Otte, Joerg wrote:
> >
> > > The difference I can see between Bourne and Bash
> > > shows up in the following process trees:
> > > Bourne:
> > > 1125 /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > > 1129 /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > > 1130 /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > > 1131 /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > > 1132 /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > > 1133 /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > > 1135 sh -c /usr/sfw/lib/python2.3/heartbeat/ccm
> > > 1144 /usr/sfw/lib/python2.3/heartbeat/ccm
> > > 1136 sh -c /usr/sfw/lib/python2.3/heartbeat/cib
> > > 1147 /usr/sfw/lib/python2.3/heartbeat/cib
> > > 1137 sh -c /usr/sfw/lib/python2.3/heartbeat/lrmd -r
> > > 1143 /usr/sfw/lib/python2.3/heartbeat/lrmd -r
> > > 1138 sh -c /usr/sfw/lib/python2.3/heartbeat/stonithd
> > > 1145 /usr/sfw/lib/python2.3/heartbeat/stonithd
> > > ..
> > >
> > > Bash:
> > > 2849 /usr/sfw/lib/python2.3/heartbeat/ha_logd -d
> > > 2858 /usr/sfw/lib/python2.3/heartbeat/ha_logd -d
> > > 2859 /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > > 2863 /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > > 2864 /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > > 2865 /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > > 2866 /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > > 2867 /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > > 2870 /usr/sfw/lib/python2.3/heartbeat/ccm
> > > 2871 /usr/sfw/lib/python2.3/heartbeat/cib
> > > 2872 /usr/sfw/lib/python2.3/heartbeat/lrmd -r
> > > 2873 /usr/sfw/lib/python2.3/heartbeat/stonithd
> > > ..
> > >
> >
> > Many thanks, Joerg! That's most helpful.
> >
> >
> > [...] It then proceeds:
> > -----------------------------
> > (void)execl("/bin/sh", "sh", "-c", centry->command
> > , (const char *)NULL);
> > -----------------------------
> >
> > It might be that different implementations of "/bin/sh" do different
> > things. For instance, some Bournes (e.g. Solaris) might (I speculate!)
> > then spawn "centry->command" as a subprocess (a further fork/exec),
> > whereas Bash might (speculation continues) do a direct replacement "exec".
> >
> > It seems that heartbeat requires the 'direct replacement "exec"' style of
> > operation (stored child pid is the final "centry->command"), which Bash
> > seems to give but which Bourne (child pid is intermediate "sh") doesn't.
> >
> > Is there any reason why this 'execl("/bin/sh" ... centry->command ...)'
> > tries to go via a shell? Why doesn't it simply go directly, and
> > unambiguously, to the 'centry->command'?
> >
> > This is beginning to feel like a Bash-ism bug.
>
> Note: this is being monitored in bug 1576:
> http://old.linux-foundation.org/developer_bugzilla/show_bug.cgi?id=1576
>
> Joerg: Sorry for the delay. (Heartbeat is a purely voluntary, spare-time,
> activity for me, outside work, family, and the rest of life...)
>
> You might try the attached patch, which avoids the "sh" and goes directly
> to the desired 'centry->command'. (My Solaris confirmed your issue, and
> the patch seems to cure it.)
>
>
> Others: I would value, too, your tests in your environments, and/or
> general comments about the proposed patch.
Your patch looks sane. I wonder why /bin/sh was used at all here.
Perhaps Alan can explain.
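If the shell were kept (for instance, so that command strings may use shell syntax), one portable way to get the 'direct replacement "exec"' behaviour regardless of which /bin/sh is installed is to prefix the command with the shell's own `exec` built-in, which POSIX requires to replace the shell without creating a new process. This is an assumption-level alternative, not what the patch does. The C sketch below runs a command the way heartbeat.c did before the patch (fork, then `execl("/bin/sh", "sh", "-c", ...)`) and has the command print its own pid via `$$`, so it can be compared with the pid that fork() returned, i.e. the pid heartbeat would store. The helper name `run_and_report_pid` is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from heartbeat): fork, run CMD through
 * "/bin/sh -c" as heartbeat.c did before the patch, and return the pid
 * the command prints on stdout. The pid returned by fork() (what
 * heartbeat would store for the child) is written to *stored_pid.
 */
static long run_and_report_pid(const char *cmd, pid_t *stored_pid)
{
    assert(cmd != NULL && stored_pid != NULL);
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: send stdout into the pipe, then hand CMD to the shell. */
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);
        execl("/bin/sh", "sh", "-c", cmd, (const char *)NULL);
        _exit(127);                     /* reached only if execl() fails */
    }
    /* Parent: read the pid the command reported, then reap the child. */
    close(fds[1]);
    char buf[64] = {0};
    ssize_t n = read(fds[0], buf, sizeof buf - 1);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    *stored_pid = pid;
    return n > 0 ? strtol(buf, NULL, 10) : -1;
}
```

With `run_and_report_pid("exec /bin/sh -c 'echo $$'", &pid)` the returned value equals `pid`, because the outer shell is replaced by the inner one in the same process. Without the `exec` prefix, whether the shell forks again is left to the shell implementation, which is the Bourne/Bash difference visible in the process trees above.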
> Also: A quick (and possibly incomplete) grep through the source code
> revealed that "lib/stonith/expect.c" might also exhibit this problem, for
> which I would propose a similar solution. (There seems to be a subtle
> difference, in that "heartbeat/heartbeat.c" used 'execl()' but
> "lib/stonith/expect.c" uses 'execlp()'. Is this deliberate?)
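On the execl()/execlp() question: the two differ only in how the program name is resolved. execlp() searches $PATH when the name contains no slash, whereas execl() uses the name as a pathname as-is (relative to the current directory if it is not absolute). When the name contains a slash, as with the absolute paths in the process trees above, both behave the same. The minimal C sketch below illustrates this with a bare name; `run_bare` is a hypothetical helper, and it assumes that "sh" is on $PATH and that the current directory contains no file named "sh".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork and try to run the slash-free program NAME
 * with the arguments -c "exit 0", using execlp() (searches $PATH) when
 * SEARCH_PATH is true, or execl() (treats NAME as a relative pathname)
 * otherwise. Returns the child's exit status, 127 if the exec failed,
 * or -1 if fork/wait failed or the child did not exit normally.
 */
static int run_bare(const char *name, bool search_path)
{
    assert(name != NULL && strchr(name, '/') == NULL);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (search_path)
            execlp(name, name, "-c", "exit 0", (const char *)NULL);
        else
            execl(name, name, "-c", "exit 0", (const char *)NULL);
        _exit(127);     /* exec failed, e.g. ENOENT for execl("sh", ...) */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Under those assumptions `run_bare("sh", true)` returns 0 and `run_bare("sh", false)` returns 127, so the execlp() in expect.c only matters if it is ever handed a name without a slash.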
>
>
>
> --
>
> : David Lee I.T. Service :
> : Senior Systems Programmer Computer Centre :
> : UNIX Team Leader Durham University :
> : South Road :
> : http://www.dur.ac.uk/t.d.lee/ Durham DH1 3LE :
> : Phone: +44 191 334 2752 U.K. :
> [Attachment: exec patch]
> --- heartbeat/heartbeat.c.orig Wed May 16 14:53:39 2007
> +++ heartbeat/heartbeat.c Thu May 31 17:53:19 2007
> @@ -214,6 +214,7 @@
> #include <sys/resource.h>
> #include <dirent.h>
> #include <netdb.h>
> +#include <wordexp.h>
> #include <ltdl.h>
> #ifdef _POSIX_MEMLOCK
> # include <sys/mman.h>
> @@ -3807,6 +3808,9 @@
> const char * devnull = "/dev/null";
> unsigned int j;
> struct rlimit oflimits;
> + wordexp_t we;
> + int rc;
> +
> CL_SIGNAL(SIGCHLD, SIG_DFL);
> alarm(0);
> CL_IGNORE_SIG(SIGALRM);
> @@ -3819,11 +3823,19 @@
> (void)open(devnull, O_RDONLY); /* Stdin: fd 0 */
> (void)open(devnull, O_WRONLY); /* Stdout: fd 1 */
> (void)open(devnull, O_WRONLY); /* Stderr: fd 2 */
> - (void)execl("/bin/sh", "sh", "-c", centry->command
> - , (const char *)NULL);
>
> - /* Should not happen */
> - cl_perror("Cannot exec %s", centry->command);
> + /* expand 'centry->command' string into 'exec()' arg list */
> + rc = wordexp(centry->command, &we, 0);
> + if (rc != 0) {
> + cl_perror("Bad command specification (error:%d): %s",
> + rc, centry->command);
> + }
> + else {
> + (void)execv(we.we_wordv[0], we.we_wordv);
> +
> + /* Should not happen */
> + cl_perror("Cannot exec %s", centry->command);
> + }
> }
> /* Suppress respawning */
> exit(100);
> _______________________________________________________
> Linux-HA-Dev: Linux-HA-Dev at lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/
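The word-splitting step of the patch can be sketched as a standalone helper. This is a minimal sketch, not the patch itself: `split_command` is a hypothetical name, and it departs from the patch in three assumed-safer ways. It passes WRDE_NOCMD so that `$(...)` or backquote command substitution in a command string is rejected rather than run; it rejects an empty result so `we_wordv[0]` is never NULL when handed to execv(); and it reports wordexp()'s return code directly, since wordexp() signals errors through its return value and a perror-style message would print a possibly stale errno. Note that wordexp() still performs tilde, parameter and pathname expansion, so a command string containing, say, `*` would be glob-expanded.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <wordexp.h>

/*
 * Hypothetical helper: split COMMAND (e.g. "/usr/lib/heartbeat/lrmd -r")
 * into an argv-style list in *WE suitable for execv(we->we_wordv[0],
 * we->we_wordv). Returns 0 on success, the wordexp() error code on
 * failure, or -1 if the string expands to no words at all.
 * On success the caller owns *WE and must wordfree() it if no exec happens.
 */
static int split_command(const char *command, wordexp_t *we)
{
    assert(command != NULL && we != NULL);
    /* WRDE_NOCMD: fail with WRDE_CMDSUB instead of running a command. */
    int rc = wordexp(command, we, WRDE_NOCMD);
    if (rc != 0) {
        /* Only WRDE_NOSPACE may leave partially allocated words behind. */
        if (rc == WRDE_NOSPACE)
            wordfree(we);
        fprintf(stderr, "Bad command specification (wordexp error %d): %s\n",
                rc, command);
        return rc;
    }
    if (we->we_wordc == 0) {
        /* Empty or all-blank string: there is nothing to exec. */
        fprintf(stderr, "Empty command specification\n");
        wordfree(we);
        return -1;
    }
    return 0;
}
```

A caller would then do `execv(we.we_wordv[0], we.we_wordv)` exactly as the patch does, so the pid heartbeat stores is the pid of the command itself rather than of an intermediate "sh".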
--
Dejan