[Linux-HA] Re: [Linux-ha-dev] watchdog bug
alanr at unix.sh
Mon Mar 15 12:48:55 MST 2004
Alan Robertson wrote:
> Holger Kiehl wrote:
>> On Mon, 15 Mar 2004, Alan Robertson wrote:
>>> At first glance, it looks like 1.2.0 et al have watchdog directive
>>> bugs in
>>> Shutting down heartbeat gracefully *does* result in the machine
>>> rebooting when
>>> watchdog is enabled.
>>> This is with a softdog driver which (on reading the source) looks
>>> right to me,
>>> and IMHO, shouldn't be shutting us down, if we're doing the
>>> RightThing in
>>> I suspect we're either not handling the device close quite right, or
>>> we're not
>>> closing file descriptors in child processes.
>> With linux watchdog you must send a 'V' when you want to close a watchdog
>> gracefully. At least that was when I looked at it a year or two ago.
> We do that.
> But, we have child processes that may have it open (but shouldn't). I'm
> testing some improvements/changes right now. I'll let folks know how it
It looks like this is the "correct" behavior of the watchdog driver for
certain configuration options.
First, there is a kernel configuration option called CONFIG_WATCHDOG_NOWAYOUT
which sets the default value of nowayout for the softdog kernel module.
Second, there is a module option called "nowayout" which can also be set
for the softdog kernel module at the time the module is loaded.
If through defaults, or the module option, nowayout gets set to 1, then
THERE IS NO WAY POSSIBLE to stop the softdog module from rebooting the machine.
If it is not set to 1, then writing the 'V' character mentioned before keeps
the machine from rebooting when the software wants to stop the timer.
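
For what it's worth, the 'V' handshake can be sketched roughly as follows.
This is a minimal sketch, not heartbeat's actual code: close_watchdog is a
hypothetical helper name, and it assumes the caller already holds an open
descriptor for the watchdog device. With nowayout set to 1, the write does
not stop the timer.

```python
import os

# Character the Linux watchdog drivers treat as "this close is intentional".
MAGIC_CLOSE = b"V"


def close_watchdog(fd: int) -> None:
    """Write the magic character to an open watchdog descriptor, then close it.

    Hypothetical helper: with nowayout=0 this is the graceful shutdown path;
    with nowayout=1 the timer keeps running regardless.
    """
    try:
        os.write(fd, MAGIC_CLOSE)
    finally:
        # Close even if the write failed, so the descriptor is not leaked.
        os.close(fd)
```

Taking a descriptor rather than a path is deliberate here: opening the
device is what arms the timer, so the helper should not open it itself.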
On the other hand, one can set the timeout to 2 billion ticks - which would
be a pretty long time. So, plenty of time would be given to restart heartbeat.
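
Raising the timeout from software could look roughly like this. It is a
sketch under assumptions: it uses the WDIOC_SETTIMEOUT ioctl from the Linux
watchdog API, which takes seconds rather than ticks; the ioctl number is
computed with the generic _IOWR layout used on x86 (some architectures
encode it differently); and set_watchdog_timeout is a hypothetical helper.
Whether a given driver version accepts the ioctl, and how large a value it
allows, needs checking against the driver source.

```python
import fcntl
import struct

# Generic ioctl direction bits (assumption: x86-style layout).
_IOC_WRITE = 1
_IOC_READ = 2


def _ioc(direction: int, type_char: str, nr: int, size: int) -> int:
    """Build an ioctl request number using the generic Linux encoding."""
    return (direction << 30) | (size << 16) | (ord(type_char) << 8) | nr


INT_SIZE = struct.calcsize("i")
# _IOWR('W', 6, int) in the watchdog API.
WDIOC_SETTIMEOUT = _ioc(_IOC_READ | _IOC_WRITE, "W", 6, INT_SIZE)


def set_watchdog_timeout(fd: int, seconds: int) -> int:
    """Ask the driver for a new timeout; return the value the driver reports back."""
    request = struct.pack("i", seconds)
    reply = fcntl.ioctl(fd, WDIOC_SETTIMEOUT, request)
    return struct.unpack("i", reply)[0]
```

The return value matters because a driver may round or clamp the request,
so the caller should use the reported timeout rather than the one it asked
for.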
It seems that SuSE 9.0 defaults this parameter to 1. I believe this is a bug.
The NOWAYOUT option was added because, in the original code, killing the
process that had the watchdog device open stopped the timer. This is
obviously a bad idea, since core dumps, etc. probably shouldn't stop the
timer. But now that the 'V' char has been added, this behavior no longer
seems needed, and certainly shouldn't be the default.
By the way, I'm struggling with how to turn off this option automatically
when the module is loaded. I seem to be biffing the modules.conf entry...
I added this entry:
options softdog nowayout=0
But it doesn't seem to help.
What am I doing wrong here?
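
One way to rule out a simple syntax slip is to parse the file and see which
options it actually gives softdog. A sketch only: softdog_options is a
hypothetical helper, it understands just the plain "options <module>
key=value" form, and it says nothing about what the running driver uses.
Since the option is applied when the module is loaded, a softdog that is
already loaded, or built into the kernel, would not pick up a change to the
file.

```python
def softdog_options(conf_text: str) -> dict:
    """Return the key=value options that modules.conf-style text sets for softdog."""
    opts = {}
    for raw in conf_text.splitlines():
        # Drop trailing comments, then split on whitespace.
        fields = raw.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "options" and fields[1] == "softdog":
            for item in fields[2:]:
                key, _, value = item.partition("=")
                opts[key] = value
    return opts
```

For the entry quoted above, softdog_options("options softdog nowayout=0")
gives {"nowayout": "0"}, which at least confirms the line itself parses as
intended.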
Alan Robertson <alanr at unix.sh>
"Openness is the foundation and preservative of friendship... Let me claim
from you at all times your undisguised opinions." - William Wilberforce