[ENBD] kernel message "non existant block device"

Peter T. Breuer ptb@it.uc3m.es
Thu, 26 Oct 2000 22:18:35 +0200 (MET DST)


"A month of sundays ago Peter T. Breuer wrote:"
> Consider: IF you run a single daemon then all accesses to the device
> are serialized through it.  The sequence goes: send ack send ack send
> ack.  This happens under all circumstances.  Changing the size of the
> file does not change what the driver does.  It just gets a request from
> the kernel for a SINGLE BLOCK, places it on the internal queue, sends it
> to the client daemon, waits for an ack, tells the kernel that buffer is
> now free. Repeat. That's all.
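
As a rough illustration of that sequence, here is a toy Python model. It
is not the driver code, and the names serve_single_daemon and events are
invented for this sketch. With one daemon, the events come out strictly
in order, one block at a time, whatever the size of the file:

```python
from collections import deque

def serve_single_daemon(blocks):
    """Toy model: one daemon drains a queue of single-block requests in order."""
    queue = deque(blocks)                # internal queue fed by kernel requests
    events = []
    while queue:
        block = queue.popleft()
        events.append(("send", block))   # hand the block to the client daemon
        events.append(("ack", block))    # wait for the acknowledgement
        events.append(("free", block))   # tell the kernel the buffer is free
    return events
```

Nothing overlaps in this model: the next send cannot happen until the
previous block has been acked and freed.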

OK, well, following my explanation here I ran some heavy tests
against a 4GB mount of a raw partition via localhost.
I used nbd-2.4.15 (driver same as nbd-2.2.29) compiled for kernel
2.2.15 SMP. The machine is an 875MHz Celeron with Adaptec ULD 788x
SCSI at 43MHz, Asus Slot 1 board.

I was unable to provoke any problems by either raw read or write to
the nbd device. I should have been able to provoke a lockup by writing
while using the net for real if my theory is correct, but I couldn't
manage it. I used loopback to get the network card drivers out of the
picture .. they can lock up all on their own .. and loopback is in
theory more likely to provoke a memory deadlock because it uses memory
resources (yeah, we want to write to loopback to free up a few dirty
buffers, but guess what, loopback creates a dirty buffer which it
wants to write to the real disk BEFORE it will let us free ours).
Two loopback mounts ought to be deadly to human lifeforms.

So I mounted the filesystem on the NBD device and wrote 512MB to it.
That locked up something! It turned out to be the kernel networking
stack. The _server_ daemon was in a state where it wouldn't accept kill
-9 as notice to die. It's a purely user space application, so clearly
the kernel had deadlocked somewhere in networking or VFS, since all the
daemon does is talk to the net and talk to its disk resource, by turns.
Everything else was vaguely OK.  In particular module and client daemon
were fine.  They rolled back waiting requests and tried to reconnect to
the dead server, respectively.

OK ..  so we know that one deadlock involves VFS or networking and
doesn't go directly through the driver.  Let's remount the thing
synchronously (mount -o remount,sync ..) to see if it's VFS.  Apparently
bingo, problem goes away:

cimbalo:/mnt/tmp% 5120+0 records in
5120+0 records out
[1]    Done                          dd if=/dev/zero of=zeros bs=102400 count=5120
0.040u 10.170s 2:35.31 6.5%     0+0k 0+0io 119pf+0w

so offhand it looks like a kernel deadlock involving the file system
layer here. Moral: don't let the kernel stack up lots of pending writes
to a "virtual" device. It'll pay for it later when it tries to actually
write, and discovers that it costs something extra to do so!

Similar sorts of deadlock against networking must also be possible.

It's hard to know what to do about it .. I think I'd better figure out
kernel "plugging". I can throttle the device artificially in any
way I please, but the simplest approaches will slow it down
unnecessarily (e.g.  don't ever accept a request from the kernel until
we've sent out the one we were treating, which is overkill).  Really,
throttling should take place only when the kernel is getting low on
spare buffers.  I don't know how to get that feedback, or if one can.

In the meantime, just run "while sleep 1; sync; done" in the
background! Should work fine if the CPU's not too fast. Increase the 1
to whatever your kernel can bear. This will pulse the throughput
horribly. One really wants to trickle feed buffers from the FS layer
down to the device continuously. Is there a FS tuning param for that?
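
For anyone who wants the interval as a parameter, the same stopgap can
be sketched as a small Python function. This is only a sketch:
pulse_sync and its arguments are names made up here, and os.sync()
needs Python 3.3 or later on a Unix system.

```python
import os
import time

def pulse_sync(interval=1.0, rounds=None, sync=os.sync, sleep=time.sleep):
    """Flush dirty buffers every `interval` seconds.

    rounds=None loops forever, like the shell one-liner; a number stops
    after that many flushes. Returns the count of flushes performed.
    """
    done = 0
    while rounds is None or done < rounds:
        sleep(interval)   # same order as "while sleep 1; sync; done"
        sync()
        done += 1
    return done
```

Calling pulse_sync(1.0) runs until interrupted. It pulses the writes
just as the shell loop does, so it is no better as a real fix.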

I'll just try putting a limiter on the device .. say never accept more
than twice as many requests as you have client daemons waiting.
I'll let you know if that also avoids deadlock possibilities.
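
The rule I have in mind can be sketched in Python as below. This is a
model of the idea only, not driver code: the class RequestLimiter, its
fields, and the way the caller sets waiting_daemons are all assumptions
made for the illustration.

```python
class RequestLimiter:
    """Admit at most factor * (waiting client daemons) outstanding requests."""

    def __init__(self, factor=2):
        self.factor = factor
        self.waiting_daemons = 0   # set by the caller: daemons waiting for work
        self.accepted = 0          # requests accepted but not yet completed

    def limit(self):
        return self.factor * self.waiting_daemons

    def try_accept(self):
        """Return True and count the request if under the limit, else False."""
        if self.accepted >= self.limit():
            return False           # leave the request with the kernel for now
        self.accepted += 1
        return True

    def complete(self):
        """Call when a request has been acked and its buffer freed."""
        if self.accepted > 0:
            self.accepted -= 1
```

With two daemons waiting, the first four requests are accepted and the
fifth is refused until one of the four completes.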

Peter