AW: AW: [Linux-ha-dev] Solaris 10/i386: Heartbeat/Stonithd hangs on shutdown.
Otte, Joerg
joerg.otte at nsn.com
Thu May 10 23:07:21 MDT 2007
> So Part Two is pending on getting such data.
The difference I can see between Bourne and Bash
shows up in the following process trees:
Bourne:
1125 /usr/sfw/lib/python2.3/heartbeat/heartbeat
1129 /usr/sfw/lib/python2.3/heartbeat/heartbeat
1130 /usr/sfw/lib/python2.3/heartbeat/heartbeat
1131 /usr/sfw/lib/python2.3/heartbeat/heartbeat
1132 /usr/sfw/lib/python2.3/heartbeat/heartbeat
1133 /usr/sfw/lib/python2.3/heartbeat/heartbeat
1135 sh -c /usr/sfw/lib/python2.3/heartbeat/ccm
  1144 /usr/sfw/lib/python2.3/heartbeat/ccm
1136 sh -c /usr/sfw/lib/python2.3/heartbeat/cib
  1147 /usr/sfw/lib/python2.3/heartbeat/cib
1137 sh -c /usr/sfw/lib/python2.3/heartbeat/lrmd -r
  1143 /usr/sfw/lib/python2.3/heartbeat/lrmd -r
1138 sh -c /usr/sfw/lib/python2.3/heartbeat/stonithd
  1145 /usr/sfw/lib/python2.3/heartbeat/stonithd
..
Bash:
2849 /usr/sfw/lib/python2.3/heartbeat/ha_logd -d
2858 /usr/sfw/lib/python2.3/heartbeat/ha_logd -d
2859 /usr/sfw/lib/python2.3/heartbeat/heartbeat
2863 /usr/sfw/lib/python2.3/heartbeat/heartbeat
2864 /usr/sfw/lib/python2.3/heartbeat/heartbeat
2865 /usr/sfw/lib/python2.3/heartbeat/heartbeat
2866 /usr/sfw/lib/python2.3/heartbeat/heartbeat
2867 /usr/sfw/lib/python2.3/heartbeat/heartbeat
2870 /usr/sfw/lib/python2.3/heartbeat/ccm
2871 /usr/sfw/lib/python2.3/heartbeat/cib
2872 /usr/sfw/lib/python2.3/heartbeat/lrmd -r
2873 /usr/sfw/lib/python2.3/heartbeat/stonithd
..
When using the Bourne shell to start a child process I get
2 processes (in the case of stonithd: 1138 and 1145).
When using Bash I get only one child process (2873).
When I try to kill 1138 from the command line, nothing happens.
When I kill 1145 from the command line, both processes (1138, 1145)
disappear. So I think heartbeat always tries to kill 1138,
which does not work.
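
The two-process layout can be reproduced on purpose, whichever shell is installed as "sh". The following is only a minimal Python sketch, not heartbeat code: the function name start_via_waiting_shell is hypothetical, and the "& echo $!; wait" construct is an assumption used here to force the shell to stay alive as the parent. It shows that the PID the starter gets back belongs to the shell wrapper rather than to the real program, and that killing the real child makes the wrapper exit as well. It does not reproduce the part where killing the wrapper has no effect.

```python
import os
import signal
import subprocess


def start_via_waiting_shell(command):
    """Start command under a shell that stays alive as its parent.

    Returns (wrapper, child_pid): the Popen object for the shell and the
    PID of the real program, which the shell reports on its stdout.
    """
    # "cmd & echo $!; wait" keeps the shell around, so it cannot simply
    # replace itself with the command.
    script = command + " & echo $!; wait"
    wrapper = subprocess.Popen(
        ["sh", "-c", script], stdout=subprocess.PIPE, text=True
    )
    child_pid = int(wrapper.stdout.readline())
    wrapper.stdout.close()
    return wrapper, child_pid


if __name__ == "__main__":
    wrapper, child_pid = start_via_waiting_shell("sleep 60")
    print("PID known to the starter (shell wrapper):", wrapper.pid)
    print("PID of the real program:", child_pid)
    # Killing the real child lets the shell's "wait" return, so the
    # wrapper exits too.
    os.kill(child_pid, signal.SIGTERM)
    wrapper.wait(timeout=5)
    print("wrapper exit status:", wrapper.returncode)
```

A starter that records only wrapper.pid never learns child_pid, which matches the situation described above for 1138 and 1145.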
Just an idea: a solution may be to start child processes
directly (not via the shell). This would work independently
of the installed shell.
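
As an illustration of the idea, here is a minimal Python sketch of starting a child directly. It is not the heartbeat implementation (heartbeat is written in C, where the equivalent would be fork plus one of the exec calls); the helper names start_direct and stop are hypothetical. Passing an argument list to subprocess.Popen runs the program without any shell in between, so the recorded PID is the PID of the program itself and a signal sent to it reaches the right process.

```python
import signal
import subprocess


def start_direct(argv):
    """Start argv directly, with no shell; the PID is the program's own."""
    return subprocess.Popen(argv)


def stop(proc, timeout=5.0):
    """Send SIGTERM to the recorded PID and wait for the process to exit."""
    proc.send_signal(signal.SIGTERM)
    return proc.wait(timeout=timeout)


if __name__ == "__main__":
    child = start_direct(["sleep", "60"])
    status = stop(child)
    # On POSIX a negative status means "terminated by that signal number".
    print("child", child.pid, "finished with status", status)
```

Because no shell is involved, the result does not depend on whether "sh" is a Bourne shell or Bash.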
-----Original Message-----
From: linux-ha-dev-bounces at lists.linux-ha.org [mailto:linux-ha-dev-bounces at lists.linux-ha.org] On behalf of ext David Lee
Sent: Thursday, 10 May 2007 21:43
To: High-Availability Linux Development List
Subject: Re: AW: [Linux-ha-dev] Solaris 10/i386: Heartbeat/Stonithd hangs on shutdown.
On Thu, 10 May 2007, David Lee wrote:
> On Thu, 10 May 2007, Otte, Joerg wrote:
>
> > [...]
> > I also looked for similar problems and also changed
> > Strings:
> > "/sbin/reboot -nf" -> "/usr/sbin/reboot -f"
> > "/sbin/poweroff" -> "/usr/sbin/poweroff"
> >
> > FYI: Details are in attached patch file.
> >
> > I think it would be better to have autoconf handle
> > those hard coded strings.
>
> Certainly. I'll try to take a look in the next few days. Would you mind
> if I asked you to do run-time confirmations of possible solutions? (If
> you don't hear, and if nothing appears in the http://hg.linux-ha.org/dev/
> repository, then bug me...)
Following up...
Joerg's was a two-part issue: the two parts (I think) independent.
Part One...
"lib/plugins/stonith/ssh.c" and "lib/plugins/stonith/suicide.c" both had
built-in strings which were Linux-specific.
I've just pushed an update into the "dev" repository which completes the
autoconf stuff (already partly in place for a long time for some other,
script-based, code) and uses the results, including on Solaris.
Please would some Linux people, and Joerg Otte, give this a thorough
exercising and testing?
(Because "configure.in" is part of the update, "ConfigureMe bootstrap" is
strongly recommended to get a fresh, clean start.)
I noted that the autoconf method changed the file locations for Linux from
"/sbin/reboot" to "/usr/bin/reboot" (similarly for 'poweroff'). Linux
seems to have both locations valid, although with potential subtle
differences. Does this matter? If it is a serious issue, then I think
the correction would be in "configure.in" to invoke the (optional) 'path'
argument to the "AC_PATH_PROGS()" macro.
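
Why the search order matters can be shown with a small Python sketch. It is only an analogy to what a path lookup such as AC_PATH_PROGS(variable, progs-to-check-for, [value-if-not-found], [path]) does, not the macro itself; the function name find_first_program and the example directory list are assumptions made for illustration. The first directory in the list that contains the program wins, so an explicit list starting with "/sbin" would give a different answer from a list that reaches "/usr/bin" first.

```python
import shutil


def find_first_program(names, search_path):
    """Return the full path of the first name found in search_path, else None.

    search_path is a colon-separated directory list, searched left to right.
    """
    for name in names:
        found = shutil.which(name, path=search_path)
        if found is not None:
            return found
    return None


if __name__ == "__main__":
    # Hypothetical directory lists; the result depends on the local system.
    print(find_first_program(["reboot"], "/sbin:/usr/sbin:/usr/bin"))
    print(find_first_program(["reboot"], "/usr/bin:/usr/sbin:/sbin"))
```

On a system that has the program in both places, the two calls return different paths, which is the effect noted above for Linux.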
So, with a bit of luck, that Part One of Joerg's issue should now be
more-or-less fixed (and I hope without breaking anything else in the
process).
Part Two...
This was to do with Bourne vs. Bash parent-child behaviours. I've asked
Joerg to provide some more data on that if possible, so that we can try to
come to a Bourne-based solution.
So Part Two is pending on getting such data.
--
: David Lee I.T. Service :
: Senior Systems Programmer Computer Centre :
: UNIX Team Leader Durham University :
: South Road :
: http://www.dur.ac.uk/t.d.lee/ Durham DH1 3LE :
: Phone: +44 191 334 2752 U.K. :
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev at lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/