[ENBD] kernel message "non existant block device"

Peter T. Breuer ptb@it.uc3m.es
Mon, 30 Oct 2000 20:14:25 +0100 (MET)


"A month of sundays ago Wang Gang wrote:"
> I have tried nbd 2.2.29 and 2.4.15 under kernel 2.2.16, but all failed.  When I
> make ext2 fs on a local nbd device, mke2fs run to "Write inode table", and
> then nbd 2.2.29 reported:
> NBD#476[0]: nbd_enqueue(0) would sched sync nda(0) at 31 req.

I forgot to tell you that this is a startup message. It doesn't come
from your problem. It's just saying that to remind me to think again
about fsyncing regularly from within the driver.

WRT your lockup, have you tried setting the trigger levels in the
plug/unplug code to zero? That is, in do_req, plug at the end if we
have any requests at all to treat (heck, you might as well just plug,
since we just got a request to treat). Then in the nbd_unplug_device
only run the kernel unplug_device when the number of waiting requests
or some other measure of busyness falls to zero.
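
To make the zero-trigger-level idea concrete, here is a minimal userspace
sketch of the bookkeeping. It is only a model under stated assumptions: the
names model_queue, model_do_req, model_request_done, PLUG_LEVEL and
UNPLUG_LEVEL are hypothetical and are not taken from the driver source, and
the "unplug" is just a flag flip standing in for the kernel's
unplug_device call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; these names are NOT the driver's own. */
struct model_queue {
    int  pending;   /* requests accepted but not yet completed */
    bool plugged;   /* true while the queue is being held plugged */
};

/* Both trigger levels set to zero, as suggested in the text. */
enum { PLUG_LEVEL = 0, UNPLUG_LEVEL = 0 };

/* Models the end of do_req: a request has just been accepted. */
static void model_do_req(struct model_queue *q)
{
    q->pending++;
    /* With PLUG_LEVEL == 0 this is always true after accepting a request. */
    if (q->pending > PLUG_LEVEL)
        q->plugged = true;
}

/* Models a request completing, followed by the unplug decision. */
static void model_request_done(struct model_queue *q)
{
    assert(q->pending > 0);
    q->pending--;
    /* Only unplug once nothing at all is left waiting. */
    if (q->plugged && q->pending <= UNPLUG_LEVEL)
        q->plugged = false;   /* stand-in for the kernel unplug call */
}
```

In this model the queue stays plugged from the first accepted request until
the last one completes, which is the behaviour the zero levels are meant to
force. The real driver may well measure busyness differently.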

I've been trying that on my 875MHz machine (I am pretty sure this is a
"speed" problem) with -o remount,sync and "while true; do sleep 1; sync; done"
going in the background, and am consistently writing 500MB files to
localhost, with 128MB RAM and 128MB swap.

What _does_ cause a lockup for me is launching two such writes in
parallel in this scenario. I would guess that the vfs semantics are
such that one of the writes builds up in the buffers while the other
is safely writing to disk.  Eventually it fills all buffers and tcp
locks as usual.

IMHO that's a vfs bug. The details might be slightly different from my
hypothesis, however, because I see the lockup after the first file
write finishes, and not before.

By the way, the semantics of launching two writes to the same file in
parallel on a -o sync mount are interesting. If you do it with -o async,
you see the two writes go to buffers, and only one of the resulting
"files" is sent to disk. If you do it with -o sync, both writes go to disk,
one after the other.

It's the second write which locks tcp on localhost mounts. It looks
as though it discharges to buffers all at once, just as though it were
async, and then the system tries to write them to disk. Too late.
Tcp died first.

I'll try actually slowing do_req down to the speed of the net. It
still requires -o sync, because the VFS layer is otherwise decoupled
from what we see.

Moral: you HAVE to use -o sync on a mirror.

Peter