[ENBD] kernel message "non existant block device"

Peter T. Breuer ptb@it.uc3m.es
Thu, 26 Oct 2000 21:00:30 +0200 (MET DST)


"A month of sundays ago Peter T. Breuer wrote:"
> "A month of sundays ago pbmonday@imation.com wrote:"
> > With kernel 2.2.17 I've had good luck "getting started" with NBDs, but I've

One thing you could try is locating your problem precisely. That means
exercising the different kernel, module, and daemon subsystems
separately.

For example, I am just doing a regression test.  I have mounted a 4GB
partition on a celeron SMP 256M 875MHz machine (with only one cpu in the
slot) under nbd to localhost. I am only using one daemon in order to
serialize the driver sequence.

I just compiled the kernel and module stuff fresh for 2.2.15-SMP.  I am
first stressing the kernel by doing a streaming read from /dev/nda
to /dev/null while running e2fsck on the partition in background.  Here's
a snapshot:

cimbalo:/usr/oboe/ptb/lang/c/nbd/nbd-2.4.15% cat /proc/nbdinfo
Device a:       Open
[a] State:      verify, rw, enabled, last error 0
[a] Queued:     +31 curr reqs/+31 real reqs/+64 max reqs
[a] Buffersize: 86016   (sectors=168)
[a] Blocksize:  1024    (log=10)
[a] Size:       4200966KB
[a] Blocks:     4200966
[a] Sockets:    1       (*)
[a] Requested:  17900   (17869) 17900R/0W
[a] Despatched: 17868   (17868) 17868R/0W
[a] Errored:    0       (0)
[a] Pending:    1       (1)     1R/0W+31R/0W
[a] Kthreads:   0       (0 waiting/0 running/1 max)
[a] Cthreads:   0       (-)
[a] Cpids:      0       (3127)
Device b-p:     Closed
cimbalo:/usr/oboe/ptb/lang/c/nbd/nbd-2.4.15% /sbin/e2fsck -nf /dev/nda
e2fsck 1.18, 11-Nov-1999 for EXT2 FS 0.5b, 95/08/09
Pass 1: Checking inodes, blocks, and sizes

cimbalo:/usr/oboe/ptb% free
             total       used       free     shared    buffers     cached
Mem:        257264     255116       2148      10588     220500       5308
-/+ buffers/cache:      29308     227956
Swap:       265032       9140     255892

You can see that all kernel buffers are full, but the nbd connection is
to localhost, so this won't provoke real network usage at the same time.
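
As a quick check on that reading of the free output, the "-/+ buffers/cache"
row can be recomputed from the "Mem:" row. This is only a sketch of the
arithmetic; the function name bufcache is made up for illustration and is not
part of free or of nbd.

```shell
# Recompute the "-/+ buffers/cache" row from the "Mem:" row of free.
# Arguments, in KB: total used free shared buffers cached
# (bufcache is a hypothetical helper name, not an existing tool).
bufcache() {
    used=$2; free=$3; buffers=$5; cached=$6
    # First number: memory used once buffers and cache are subtracted.
    # Second number: memory free once buffers and cache are counted as free.
    echo "$((used - buffers - cached)) $((free + buffers + cached))"
}

# Values copied from the "Mem:" row in the snapshot above.
bufcache 257264 255116 2148 10588 220500 5308
```

With the numbers from the snapshot, about 220MB of the 256MB is sitting in
buffers, which is what "all kernel buffers are full" refers to.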

There is no theoretical possibility of a lockup under these conditions. 
If one happens, it's a real bug, somewhere. Hardware or kernel. Not
mine.

When this test has finished I'll run write tests under short bursts,
then try and see if I can elicit what I suspect are inevitable kernel
lockups when all buffers are full and the network is in saturated use.
If I can't locate a lockup when the file system layers are not in use,
I'll involve them too.

But first locate a _stable_ situation, and then vary it, one parameter
at a time.
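
The read test described above could be wrapped up roughly as follows, so that
the same stable baseline can be rerun while one parameter is changed. This is
a sketch under assumptions: the function name stress_read and the 64k block
size are invented here, and the device and checker are passed in rather than
hard-coded.

```shell
# Sketch: stream-read a device to /dev/null while a read-only check runs
# in the background on the same device.
# Usage: stress_read DEVICE CHECKER [CHECKER-ARGS...]
# (stress_read and the bs=64k value are illustrative assumptions.)
stress_read() {
    dev=$1; shift
    # Background job: the checker, with the device appended as last argument.
    "$@" "$dev" &
    check_pid=$!
    # Foreground job: the streaming read.
    dd if="$dev" of=/dev/null bs=64k 2>/dev/null
    read_status=$?
    # Collect the checker's exit status once the read has finished.
    wait "$check_pid"
    check_status=$?
    echo "read=$read_status check=$check_status"
}
```

For the setup in this mail the call would be something like
"stress_read /dev/nda /sbin/e2fsck -nf". A nonzero check status from e2fsck
only means it found something to report; a run that never prints its status
line is the case worth chasing.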

Ah, the e2fsck finished ...

e2fsck 1.18, 11-Nov-1999 for EXT2 FS 0.5b, 95/08/09
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences:  -6945 -6946 -6947 -6948 -6949 -270284
-270285 -270286 -270287 -904551 -904552 -904553 -904554 -904555 -904556
-904557
Fix? no

Peter