AW: AW: [Linux-ha-dev] Solaris 10/i386: Heartbeat/Stonithd hangs on shutdown.

Otte, Joerg joerg.otte at nsn.com
Thu May 10 23:07:21 MDT 2007


> So Part Two is pending on getting such data.

The difference I can see between the Bourne shell and Bash
shows up in the following process trees:
Bourne:
1125  /usr/sfw/lib/python2.3/heartbeat/heartbeat
  1129  /usr/sfw/lib/python2.3/heartbeat/heartbeat
  1130  /usr/sfw/lib/python2.3/heartbeat/heartbeat
  1131  /usr/sfw/lib/python2.3/heartbeat/heartbeat
  1132  /usr/sfw/lib/python2.3/heartbeat/heartbeat
  1133  /usr/sfw/lib/python2.3/heartbeat/heartbeat
  1135  sh -c /usr/sfw/lib/python2.3/heartbeat/ccm
    1144  /usr/sfw/lib/python2.3/heartbeat/ccm
  1136  sh -c /usr/sfw/lib/python2.3/heartbeat/cib
    1147  /usr/sfw/lib/python2.3/heartbeat/cib
  1137  sh -c /usr/sfw/lib/python2.3/heartbeat/lrmd -r
    1143  /usr/sfw/lib/python2.3/heartbeat/lrmd -r
  1138  sh -c /usr/sfw/lib/python2.3/heartbeat/stonithd
    1145  /usr/sfw/lib/python2.3/heartbeat/stonithd
  ..

Bash:  
2849  /usr/sfw/lib/python2.3/heartbeat/ha_logd -d
  2858  /usr/sfw/lib/python2.3/heartbeat/ha_logd -d
2859  /usr/sfw/lib/python2.3/heartbeat/heartbeat
  2863  /usr/sfw/lib/python2.3/heartbeat/heartbeat
  2864  /usr/sfw/lib/python2.3/heartbeat/heartbeat
  2865  /usr/sfw/lib/python2.3/heartbeat/heartbeat
  2866  /usr/sfw/lib/python2.3/heartbeat/heartbeat
  2867  /usr/sfw/lib/python2.3/heartbeat/heartbeat
  2870  /usr/sfw/lib/python2.3/heartbeat/ccm
  2871  /usr/sfw/lib/python2.3/heartbeat/cib
  2872  /usr/sfw/lib/python2.3/heartbeat/lrmd -r
  2873  /usr/sfw/lib/python2.3/heartbeat/stonithd
  ..

When a child process is started via the Bourne shell, I get
2 processes (in the case of stonithd: 1138 and 1145).
When using Bash, I get only one child process (2873).

When I try to kill 1138 from the command line, nothing happens.
When I kill 1145 from the command line, both processes (1138 and 1145)
disappear. So I think heartbeat always tries to kill 1138,
which does not work.

Just an idea: a solution may be to start child processes
directly (not via the shell). This would work independently
of the installed shell.
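
To illustrate the idea, here is a minimal sketch in C, not taken from the
heartbeat sources. It assumes plain POSIX fork()/execv(); the helper names
spawn_direct() and stop_child() are hypothetical. Because no "sh -c"
wrapper sits in between, the pid returned to the parent is the pid of the
program itself, so a later kill() reaches it directly.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start argv[0] directly, without a "sh -c" wrapper.
 * Returns the child's pid, or -1 if fork() fails. */
static pid_t spawn_direct(char *const argv[])
{
    pid_t pid = fork();
    if (pid == 0) {
        execv(argv[0], argv);   /* replaces the child image; no shell involved */
        _exit(127);             /* reached only if execv() failed */
    }
    return pid;
}

/* Hypothetical helper: send SIGTERM to the child and reap it.
 * Returns the terminating signal number, 0 if the child exited
 * normally, or -1 if kill() or waitpid() failed. */
static int stop_child(pid_t pid)
{
    int status = 0;

    if (kill(pid, SIGTERM) != 0 || waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

With an argument vector such as { "/bin/sleep", "30", NULL } standing in
for a daemon like stonithd, stop_child(spawn_direct(argv)) should report
SIGTERM, since the signal goes to the one and only child process.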


-----Original Message-----
From: linux-ha-dev-bounces at lists.linux-ha.org [mailto:linux-ha-dev-bounces at lists.linux-ha.org] On Behalf Of ext David Lee
Sent: Thursday, 10 May 2007 21:43
To: High-Availability Linux Development List
Subject: Re: AW: [Linux-ha-dev] Solaris 10/i386: Heartbeat/Stonithd hangs on shutdown.

On Thu, 10 May 2007, David Lee wrote:

> On Thu, 10 May 2007, Otte, Joerg wrote:
>
> > [...]
> > I also looked for similar problems and also changed
> > Strings:
> > "/sbin/reboot -nf" -> "/usr/sbin/reboot -f"
> > "/sbin/poweroff" -> "/usr/sbin/poweroff"
> >
> > FYI: Details are in the attached patch file.
> >
> > I think it would be better to have autoconf handle
> > those hard coded strings.
>
> Certainly.  I'll try to take a look in the next few days.  Would you mind
> if I asked you to do run-time confirmations of possible solutions?  (If
> you don't hear, and if nothing appears in the http://hg.linux-ha.org/dev/
> repository, then bug me...)

Following up...

Joerg's was a two-part issue, and the two parts are (I think) independent.


Part One...

"lib/plugins/stonith/ssh.c" and "lib/plugins/stonith/suicide.c" both had
built-in strings which were Linux-specific.

I've just pushed an update into the "dev" repository which completes the
autoconf stuff (already partly in place for a long time for some other,
script-based, code) and uses the results, including on Solaris.

Please would some Linux people, and Joerg Otte, give this a thorough
exercising and testing?

(Because "configure.in" is part of the update, "ConfigureMe bootstrap" is
strongly recommended to get a fresh, clean start.)


I noted that the autoconf method changed the file locations for Linux from
"/sbin/reboot" to "/usr/bin/reboot" (similarly for 'poweroff').  Linux
seems to have both locations valid, although with potential subtle
differences.  Does this matter?  If it is a serious issue, then I think
the correction would be in "configure.in" to invoke the (optional) 'path'
argument to the "AC_PATH_PROGS()" macro.

So, with a bit of luck, that Part One of Joerg's issue should now be
more-or-less fixed (and I hope without breaking anything else in the
process).



Part Two...

This was to do with Bourne vs. Bash parent-child behaviours.  I've asked
Joerg to provide some more data on that if possible, so that we can try to
come to a Bourne-based solution.

So Part Two is pending on getting such data.




-- 

:  David Lee                                I.T. Service          :
:  Senior Systems Programmer                Computer Centre       :
:  UNIX Team Leader                         Durham University     :
:                                           South Road            :
:  http://www.dur.ac.uk/t.d.lee/            Durham DH1 3LE        :
:  Phone: +44 191 334 2752                  U.K.                  :
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev at lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/

