[Linux-HA] WARN: Exiting HBREAD process returned rc 10.
Dave Dykstra
dwdha at drdykstra.us
Fri Aug 5 11:46:07 MDT 2005
On Wed, Aug 03, 2005 at 10:24:37AM -0600, Alan Robertson wrote:
> Dave Dykstra wrote:
> >My computer room experienced a power outage last weekend. The HA-NFS
> >servers survived it because they're on extra UPSs, but everything else
> >was out for a little over 10 minutes. The two HA servers have a direct
> >connection on eth1, and on eth0 they've got a connection to a gigabit
> >switch that goes to the rest of the network, and the switch lost power
> >along with everything else. I am running version 1.99.5.cvs.20050708
> >from ultramonkey.
> >
> >The active HA server stayed active for about 40 seconds but then said
> > WARN: Exiting HBREAD process 4227 returned rc 10.
> >and started a heartbeat shutdown. Does anybody have any idea of what
> >could cause that? It looks to me in the source that all exits use a
> >code of LSB_EXIT_something and I don't see any of them defined to be 10.
> >The same message appeared on the standby server two seconds later but
> >its shutdown kept being delayed. A very similar power outage occurred
> >on July 10 and these WARNs did not occur and a heartbeat shutdown did
> >not happen, although at that time I was running a CVS version of 1.2.3
> >(the old log messages say heartbeat's version was 1.2.4).
> >
> >This error might not have been so bad except that when the standby server
> >attempted to take over it hung during the start of nfs-kernel-server
> >until the power came back on. I'm not sure why; I'm planning on doing
> >some experiments at the end of this week to try to figure that out.
> >Worse, after that, even though the standby server had finished taking
> >over, when the clients booted up they all got their mounts refused with
> >messages like
> > rpc.mountd: refused mount request from 172.18.30.2 for /mnt/home (/):
> > no export entry
> >which was the real killer and also a mystery although I assume it was
> >related to the long-delayed start of nfs-kernel-server.
> >
> >Below are all the log messages leading up to the WARN message on both
> >sides.
...
>
>
> Dave:
>
> This may be fixed by the patches I posted for a very similar problem
> encountered by Ulrich Thomas. If you are running an app on the dying
> machine which is pinging something like mad in the same time interval,
> then these patches may be for you.
I don't believe any app is pinging like mad, but the patch still works.
See my response in the "ERROR: 100 NULL ..." thread.
- Dave
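
For anyone who wants to check their own logs for the same symptom, here is
a minimal sketch that pulls the process name, pid, and return code out of
WARN lines shaped like the one quoted above. The pattern is taken only from
that one message; the "heartbeat:" prefix on the sample lines and the
function name find_child_exits are assumptions for illustration, not
anything from the heartbeat source.

```python
import re

# Pattern modelled on the single WARN line quoted in this thread.
# Anything before "WARN:" (timestamp, host, daemon name) is ignored.
EXIT_RE = re.compile(r"WARN: Exiting (\w+) process (\d+) returned rc (\d+)\.")


def find_child_exits(lines):
    """Return a (process_name, pid, rc) tuple for each matching log line."""
    exits = []
    for line in lines:
        match = EXIT_RE.search(line)
        if match:
            name, pid, rc = match.groups()
            exits.append((name, int(pid), int(rc)))
    return exits


# Hypothetical sample lines; only the WARN text itself comes from the log.
sample = [
    "heartbeat: WARN: Exiting HBREAD process 4227 returned rc 10.",
    "heartbeat: info: some unrelated line",
]
print(find_child_exits(sample))
```

Comparing the extracted rc values against the LSB_EXIT_* definitions in the
source is one way to confirm whether a code such as 10 is coming from
somewhere other than a normal exit path.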