[Linux-HA] WARN: Exiting HBREAD process returned rc 10.

Dave Dykstra dwdha at drdykstra.us
Fri Aug 5 11:46:07 MDT 2005


On Wed, Aug 03, 2005 at 10:24:37AM -0600, Alan Robertson wrote:
> Dave Dykstra wrote:
> >My computer room experienced a power outage last weekend.  The HA-NFS
> >servers survived it because they're on extra UPSs, but everything else
> >was out for a little over 10 minutes.  The two HA servers have a direct
> >connection on eth1, and on eth0 they've got a connection to a gigabit
> >switch that goes to the rest of the network, and the switch lost power
> >along with everything else.  I am running version 1.99.5.cvs.20050708
> >from ultramonkey.
> >
> >The active HA server stayed active for about 40 seconds but then said 
> >    WARN: Exiting HBREAD process 4227 returned rc 10.
> >and started a heartbeat shutdown.  Does anybody have any idea of what
> >could cause that?  It looks to me in the source that all exits use a
> >code of LSB_EXIT_something and I don't see any of them defined to be 10.
> >The same message appeared on the standby server two seconds later but
> >its shutdown kept being delayed.   A very similar power outage occurred
> >on July 10 and these WARNs did not occur and a heartbeat shutdown did
> >not happen, although at that time I was running a CVS version of 1.2.3
> >(the old log messages say heartbeat's version was 1.2.4).
> >
> >This error might not have been so bad except that when the standby server
> >attempted to take over it hung during the start of nfs-kernel-server
> >until the power came back on.  I'm not sure why; I'm planning on doing
> >some experiments at the end of this week to try to figure that out.
> >Worse, after that, even though the standby server had finished taking
> >over, when the clients booted up they all got their mounts refused with 
> >messages like
> >    rpc.mountd: refused mount request from 172.18.30.2 for /mnt/home (/): 
> >    no export entry
> >which was the real killer and also a mystery although I assume it was
> >related to the long-delayed start of nfs-kernel-server.
> >
> >Below are all the log messages leading up to the WARN message on both 
> >sides.
...
> 
> 
> Dave:
> 
> This may be fixed by the patches I posted for a very similar problem 
> encountered by Ulrich Thomas.  If you are running an app on the dying 
> machine which is pinging something like mad in the same time interval, 
> then these patches may be for you.

I don't believe any app here is pinging like mad, but the patch still
works.  See my response in the "ERROR: 100 NULL ..." thread.

- Dave

