[ENBD] Strange behavior

Peter T. Breuer ptb@it.uc3m.es
Thu, 5 Oct 2000 17:23:52 +0200 (MET DST)


"A month of sundays ago Adrian Turcu wrote:"
> "Peter T. Breuer" wrote:
> > This is what is supposed to happen. Congratulations.
> > (What do you think is wrong?).

(Please send me the output from /proc/nbdinfo)

> Well, if I will put this device into a RAID-1 configuration on machine B,
> RAID will hang up because this device is never reported down and I have

It's supposed to stay up under almost all circumstances.  The assumption
is that you have multiple network links to the remote, so that if one
fails the device will fail over to using the next one.

> no possibilities to report to the RAID system what's happening with /dev/ndXXX.

Turn the device off by sending it a "die" signal via a USR1 to the
daemons (that should be enough to frighten RAID, from what I have heard)
or echo -n 0 to /proc/nbdinfo.  Either will cause all outstanding requests
to be errored out and the device to report itself as disabled for 5s.

If you want to turn it off permanently, kill the daemons first, and/or
try USR2 (i.e. echo -n 1 to /proc/nbdinfo).
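For anyone who wants to do the /proc write from a program rather than a shell, here is a minimal sketch of the `echo -n` step in C. The helper name nbd_ctl_write is made up for illustration and is not part of ENBD; the only things taken from the text above are the path /proc/nbdinfo and the values 0 and 1. The signal route would instead be a plain kill(2) with SIGUSR1 or SIGUSR2 on each daemon pid.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of ENBD): write `value` to `path` with no
 * trailing newline, like `echo -n value > path`.
 * Returns 0 on success, -1 on any error. */
int nbd_ctl_write(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    size_t len;
    int ok;

    if (f == NULL)
        return -1;

    len = strlen(value);
    ok = (fwrite(value, 1, len, f) == len);

    /* fclose flushes the buffer, so a failed write can surface here too */
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

Under these assumptions, nbd_ctl_write("/proc/nbdinfo", "0") would correspond to the USR1 case and nbd_ctl_write("/proc/nbdinfo", "1") to the USR2 case.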

How do you know that the net is down, by the way (this is a
theoretical/rhetorical question)?  TCP won't know it.  The network
device won't know it.  Look with /sbin/ifconfig!  You can't distinguish
between a slow net and a down net that easily!  It sometimes takes me
ten minutes to get a single packet through my ISP from home to my office
and the net's working, according to them!  Yeah, sure, YOU know you
disconnected the cable, but nothing else knows it, not even the kernel.
Packets you send will at least hang around while you have a route up.
If you want to let the kernel know (never mind userspace!) that the
net is out, you will have to down the ethernet device.  Just taking out
the route may be enough to let the IP layer see an error and report it.

The absence of a heartbeat pulse from the remote server _will_ cause the
client daemons to disengage from the kernel after a few seconds, and
then begin their reconnect attempts.  They will not error out pending
requests, however, but hang on to them until reconnect is established.
Show me the output from /proc/nbdinfo so that I can confirm that the
daemons have disengaged from the kernel. They may be stuck in a network
send or receive and be waiting for news from the network layer. I can't
control that.
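The heartbeat rule described above can be modelled in a few lines of userspace C. This is only a sketch of the policy, not the real client daemon code: the struct, the function names and the timeout parameter are all hypothetical, and the actual timeout ("a few seconds") is not stated here, so it is passed in by the caller.

```c
#include <time.h>

/* Hypothetical per-client state; not the real ENBD structures. */
struct hb_client {
    time_t last_heartbeat;  /* when the last pulse from the server arrived */
    int    engaged;         /* 1 while the daemon is attached to the kernel */
};

/* Record a heartbeat pulse from the remote server. */
void hb_pulse(struct hb_client *c, time_t now)
{
    c->last_heartbeat = now;
}

/* Disengage the client if no pulse has arrived for more than timeout_s
 * seconds.  Pending requests are deliberately not touched: they are kept
 * until a reconnect succeeds.  Returns 1 if this call disengaged the
 * client, 0 otherwise. */
int hb_check(struct hb_client *c, time_t now, int timeout_s)
{
    if (c->engaged && difftime(now, c->last_heartbeat) > timeout_s) {
        c->engaged = 0;     /* the daemon would now start reconnecting */
        return 1;
    }
    return 0;
}
```

Note that a daemon blocked inside a network send or receive never gets to run a check like this, which is the case that cannot be controlled from here.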

I can code the device to error out requests and report the device as
disabled within a few seconds of all clients having gone into reconnect
mode.  But it seems fairly risky to me, as well as a question of policy.
Temporary network outages are not uncommon.  It's also pretty difficult
because the mechanism has to be a "fail-safe", i.e.  each client has to
report regularly or be ejected from the kernel, and from experience I
know that that sort of thing is immensely hard to code because of the
unexpected interactions with other mechanisms.  But if you are sure that
is what you need, I will code for it.  At the moment, I believe you can
get the effect you want by changing the driver code in nbd_clr_sock
(the disengage notice that the daemons send when they shut down)
that reads

        if (lo->aslot <= 0) {
           lo->flags &= ~NBD_ENABLED;
           NBD_DEBUG (3, "disabled device nd%c\n", lo->nbd + 'a');
           invalidate_buffers (MKDEV(NBD_MAJOR, (lo->nbd)<<NBD_SHIFT));
           NBD_DEBUG (3, "invalidated buffers on device %c\n", lo->nbd + 'a');
        } else {
 
(actually, that invalidate_buffers() might be wrong!) so that the
stanza also calls either

        nbd_soft_reset(lo);  // USR1
or
        nbd_hard_reset();    // USR2

Gadzooks ...  the stanza is executed with the semaphore held!  I suppose
that is to avoid adding requests while we're monkeying with the active
device count.  But it may deadlock against a reset.  So you also, errr
..  the simplest thing is to add a variable

    int want_device_reset = 0;

at the top of the code, set it to 1 in the stanza above, and then just
after the next

    NBD_SEMAPHORE_UP (&lo->queue_lock);         /* PTB */

add 

   if (want_device_reset)
        nbd_soft_reset(lo);  // USR1
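To make the ordering concrete, here is a small userspace model of the "note it under the lock, act on it after the lock is dropped" pattern. Everything in it is a stand-in: struct fake_dev, fake_soft_reset and fake_clr_sock are invented for illustration, and the lock is modelled by a plain flag rather than a real semaphore. The fake reset refuses to run while the lock is held, which stands for the possible deadlock.

```c
/* Hypothetical stand-in for the driver's per-device state. */
struct fake_dev {
    int lock_held;         /* models the queue_lock semaphore */
    int aslot;             /* number of active client slots */
    int enabled;           /* models the NBD_ENABLED flag */
    int resets_done;       /* resets that ran safely */
    int reset_under_lock;  /* resets attempted with the lock held */
};

/* Models a reset that needs the lock itself: running it while the lock is
 * already held is the case that could deadlock, so just count it. */
void fake_soft_reset(struct fake_dev *d)
{
    if (d->lock_held) {
        d->reset_under_lock++;
        return;
    }
    d->resets_done++;
}

/* Models the disengage path: decide under the lock, reset after it. */
void fake_clr_sock(struct fake_dev *d)
{
    int want_device_reset = 0;

    d->lock_held = 1;              /* take the lock */
    if (d->aslot > 0)
        d->aslot--;                /* this client is leaving */
    if (d->aslot <= 0) {
        d->enabled = 0;
        want_device_reset = 1;     /* only note the request here */
    }
    d->lock_held = 0;              /* corresponds to NBD_SEMAPHORE_UP */

    if (want_device_reset)
        fake_soft_reset(d);        /* safe: the lock is no longer held */
}
```

In this model the flag is a local, so it starts at 0 on every call; if it is declared at file scope in the driver instead, it would presumably need clearing again after the reset.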

In my view, a still safer variant of this idea is to schedule the
change for the next time slice, when presumably this bit of code is
well out of harm's way! I.e.

   static struct tq_struct run_nbd_reset;   /* filled in at run time: lo is not a constant */
   run_nbd_reset.routine = (void (*)(void *)) nbd_soft_reset;
   run_nbd_reset.data = lo;
   queue_task(&run_nbd_reset, &tq_scheduler);
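The idea of deferring the reset can be tried out in userspace with a toy task queue. This is a rough model only, assuming nothing about the kernel's internals beyond "queue a routine plus its data now, run it later"; struct task, task_queue_add, task_queue_run and count_reset are hypothetical names, not the kernel's tq_struct API.

```c
#include <stddef.h>

/* Toy deferred task: a routine and the argument to call it with. */
struct task {
    struct task *next;
    int pending;                 /* 1 while sitting on a queue */
    void (*routine)(void *);
    void *data;
};

struct task_queue {
    struct task *head, *tail;
};

/* Queue a task for later.  Returns 1 if queued, 0 if it was already
 * pending (so the same task is never linked in twice). */
int task_queue_add(struct task_queue *q, struct task *t)
{
    if (t->pending)
        return 0;
    t->pending = 1;
    t->next = NULL;
    if (q->tail)
        q->tail->next = t;
    else
        q->head = t;
    q->tail = t;
    return 1;
}

/* Run and drain everything queued so far.  Returns the number run. */
int task_queue_run(struct task_queue *q)
{
    struct task *t = q->head;
    int n = 0;

    q->head = q->tail = NULL;
    while (t) {
        struct task *next = t->next;  /* read before the routine runs */
        t->pending = 0;
        t->routine(t->data);
        n++;
        t = next;
    }
    return n;
}

/* Example routine standing in for the reset: bumps a counter. */
void count_reset(void *data)
{
    ++*(int *)data;
}
```

The point of the model is just that nothing happens at the moment of queueing; the routine only runs once the caller has long since left the code holding the lock.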

Let me know how you get on. Survival notices particularly interesting.


Peter