[Linux-ha-dev] Hb-2.08/stable: cib crashes under solaris 10/i386
Andrew Beekhof
beekhof at gmail.com
Thu May 3 03:28:52 MDT 2007
On 5/2/07, Otte, Joerg <joerg.otte at nsn.com> wrote:
> I am trying to get heartbeat 2.08/stable running under Solaris 10/i386.
> OS: SunOS bcm20-a 5.10 Generic_125101-03 i86pc i386 i86pc
>
> Whereas a V1 configuration seems to work properly (I haven't gone into
> details yet), I currently have the following problem with a V2
> configuration:
>
> Case 1) "The cib process crashes with a core dump on the second node."
I wonder... could this be as simple as trying to print a NULL pointer
as a string?
Any chance you could try this patch:
--- a/crm/cib/notify.c Thu May 03 10:03:54 2007 +0200
+++ b/crm/cib/notify.c Thu May 03 11:26:05 2007 +0200
@@ -392,11 +392,13 @@ cib_replace_notify(crm_data_t *update, e
if(add_updates != del_updates) {
crm_info("Replaced: %d.%d.%d -> %d.%d.%d from %s",
- del_admin_epoch, del_epoch, del_updates,
- add_admin_epoch, add_epoch, add_updates, origin);
+ del_admin_epoch, del_epoch, del_updates,
+ add_admin_epoch, add_epoch, add_updates,
+ crm_str(origin));
} else if(diff != NULL) {
crm_info("Local-only Replace: %d.%d.%d from %s",
- add_admin_epoch, add_epoch, add_updates, origin);
+ add_admin_epoch, add_epoch, add_updates,
+ crm_str(origin));
}
replace_msg = ha_msg_new(8);
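To illustrate the suspected failure mode: the backtrace shows strlen() faulting inside vsnprintf() called from cl_log(), which is what happens when a NULL pointer reaches a %s conversion. Passing NULL for %s is undefined behaviour in C; glibc typically prints "(null)", whereas the Solaris 10 libc dereferences the pointer. The sketch below is a standalone illustration, not heartbeat code: SAFE_STR and format_replace_msg are hypothetical names, and the only assumption about crm_str() is that it does something similar (substitutes a placeholder string for NULL).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for heartbeat's crm_str(): maps a NULL
 * string to a printable placeholder. The real macro's placeholder
 * text may differ. Note: evaluates its argument twice. */
#define SAFE_STR(s) ((s) != NULL ? (const char *)(s) : "<null>")

/* Formats the "Replaced" message from the patch into buf.
 * origin may be NULL, e.g. when the replace has no recorded origin. */
static int format_replace_msg(char *buf, size_t len,
                              int del_admin_epoch, int del_epoch, int del_updates,
                              int add_admin_epoch, int add_epoch, int add_updates,
                              const char *origin)
{
    /* Handing a raw NULL to %s is undefined behaviour; on Solaris 10
     * vsnprintf() calls strlen() on it and faults, as in the backtrace.
     * Wrapping the argument keeps the format call well-defined. */
    return snprintf(buf, len, "Replaced: %d.%d.%d -> %d.%d.%d from %s",
                    del_admin_epoch, del_epoch, del_updates,
                    add_admin_epoch, add_epoch, add_updates,
                    SAFE_STR(origin));
}
```

Because SAFE_STR evaluates its argument twice, it should only be given side-effect-free expressions such as a plain variable, which is how origin is used in the patch.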
> Case 2) "Heartbeat/Stonithd hangs on shutdown."
>
>
> Attached logs cover the following situations:
>
> Case 1)
> - Heartbeat on node a ("bcm20-a") came up successfully with a fresh
>   cib.xml, and resources were started successfully.
> - When I then start Heartbeat on node b ("bcm20-b"), the cib process
>   crashes on node b.
>
> This is the stack dump of the cib process on node b:
>
> core '/var/ha/local/lib/heartbeat/cores/hacluster/core' of 576: /usr/sfw/lib/python2.3/heartbeat/cib
> fea54c7c strlen (8061466, 80472f8, 8045e90, 0) + c
> feaad3cb vsnprintf (8045ed0, 1400, 806143c, 80472f8) + 73
> fef8cde4 cl_log (6, 806143c, 805e9a1, 0, 0, 0) + 58
> 080590af cib_replace_notify (80a8120, 0, 80aa740, 809bac0) + 1ab
> 08057383 cib_process_replace (80a6590, 11100000, 0, 809dee0, 80bf1f0, 80473f4) + 197
> 0805a44e cib_process_command (8085520, 8047460, 8047464, 1, fea549d0, 0) + 30e
> 0805af60 cib_process_request (8085520, 0, 1, 1, 0) + 1e4
> 0805c264 cib_peer_callback (8085520, 8075f08, 80475a8, fef02825) + 1d8
> fef02839 read_msg_w_callbacks (8075f08, 0, 80475c8, fef025d1) + 209
> fef02c26 rcvmsg (8075f08, 0, 5, 0) + 1e
> 0805c02e cib_ha_dispatch (807a058, 8075f08, 8047668, fef87956) + 86
> fef87b36 G_CH_dispatch_int (807ccd0, 0, 0, 0) + 252
> fee2c77f g_main_context_dispatch (8075058, 0, 8080490, d) + 1e7
> fee2e065 g_main_context_iterate (1, 80ba4a8, 8047748, fee2e141, 805e355, 8075058) + 41d
> fee2e2c0 g_main_loop_run (80742b8, 805d5d4, 0, 1, 0, 1) + 19c
> 0805e355 init_start (80477a0, 80540bd, 8073458, 807346c, 0, 805e5ba) + 59d
> 0805e4ff main (1, 80477cc, 80477d4) + f7
> 08053f98 _start (1, 8047948, 0, 804796d, 8047984, 80479ac) + 80
>
> attached files: case1.bcm20-a.tar.gz case1.bcm20-b.tar.gz
>
> Case 2)
> When I shut down heartbeat, it tells me:
> > bcm20-a:/ # /etc/init.d/heartbeat stop
> > Stopping High-Availability services:
> > Done.
>
> But the following processes remain running:
> > bcm20-a:/ # ptree -a 1125
> > 1 /sbin/init
> > 1125 /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > 1129 /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > 1130 /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > 1131 /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > 1132 /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > 1133 /usr/sfw/lib/python2.3/heartbeat/heartbeat
> > 1135 sh -c /usr/sfw/lib/python2.3/heartbeat/ccm
> > 1144 /usr/sfw/lib/python2.3/heartbeat/ccm
> > 1136 sh -c /usr/sfw/lib/python2.3/heartbeat/cib
> > 1147 /usr/sfw/lib/python2.3/heartbeat/cib
> > 1137 sh -c /usr/sfw/lib/python2.3/heartbeat/lrmd -r
> > 1143 /usr/sfw/lib/python2.3/heartbeat/lrmd -r
> > 1138 sh -c /usr/sfw/lib/python2.3/heartbeat/stonithd
> > 1145 /usr/sfw/lib/python2.3/heartbeat/stonithd
>
> stonithd has the following file descriptors still open:
> > bcm20-a:/ # pfiles 1145
> > 1145:   /usr/sfw/lib/python2.3/heartbeat/stonithd
> >   Current rlimit: 256 file descriptors
> >    0: S_IFCHR mode:0666 dev:270,0 ino:6815752 uid:0 gid:3 rdev:13,2
> >       O_RDONLY|O_LARGEFILE
> >       /devices/pseudo/mm@0:null
> >    1: S_IFCHR mode:0666 dev:270,0 ino:6815752 uid:0 gid:3 rdev:13,2
> >       O_WRONLY|O_LARGEFILE
> >       /devices/pseudo/mm@0:null
> >    2: S_IFCHR mode:0666 dev:270,0 ino:6815752 uid:0 gid:3 rdev:13,2
> >       O_WRONLY|O_LARGEFILE
> >       /devices/pseudo/mm@0:null
> >    3: S_IFDOOR mode:0444 dev:279,0 ino:53 uid:0 gid:0 size:0
> >       O_RDONLY|O_LARGEFILE FD_CLOEXEC door to nscd[106]
> >       /var/run/name_service_door
> >    4: S_IFCHR mode:0000 dev:270,0 ino:39275 uid:0 gid:0 rdev:21,88
> >       O_WRONLY FD_CLOEXEC
> >       /devices/pseudo/log@0:conslog
> >    5: S_IFSOCK mode:0666 dev:276,0 ino:17792 uid:0 gid:0 size:0
> >       O_RDWR|O_NONBLOCK
> >         SOCK_STREAM
> >         SO_SNDBUF(16384),SO_RCVBUF(5120)
> >         sockname: AF_UNIX
> >         peername: AF_UNIX /var/ha/local/run/heartbeat/register
> >    6: S_IFSOCK mode:0666 dev:276,0 ino:20034 uid:0 gid:0 size:0
> >       O_RDWR|O_NONBLOCK
> >         SOCK_STREAM
> >         SO_SNDBUF(16384),SO_RCVBUF(5120)
> >         sockname: AF_UNIX /var/ha/local/run/heartbeat/stonithd
> >    7: S_IFSOCK mode:0666 dev:276,0 ino:19832 uid:0 gid:0 size:0
> >       O_RDWR|O_NONBLOCK
> >         SOCK_STREAM
> >         SO_SNDBUF(16384),SO_RCVBUF(5120)
> >         sockname: AF_UNIX /var/ha/local/run/heartbeat/stonithd_callback
>
>
> attached files: case2.bcm20-a.tar.gz
>
> Shutdown proceeds normally if I kill stonithd (PID 1145).
>
>
>
> Any help would be appreciated.
>
> Joerg
>
> _______________________________________________________
> Linux-HA-Dev: Linux-HA-Dev at lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/
>
>
>