[ENBD] ENBD on 2.4.x

jona@orac.ensor.org
Thu, 15 Mar 2001 10:14:21 -0700 (MST)


Hi gang!  More info on 2.4.1+ problems. :)

> Mounting with '-o sync' works LOTS better.  I actually get through
> the 'bonnie' test and get stats!  I think this has solved my problem.
>  
> > Probably vfs is full. File bigger than memory, etc. Is the fs mounted
> > sync? If not, mount it sync. If that doesn't help, load the module with
> > sync_intvl=1. That will flush VFS frequently, before it can ever fill
> > up (well, small ram, fast cpu will still let it block, but we hope
> > not).

I was hasty in saying that my problem was solved.  When I started using
a larger file to export, I started having problems again (even with a fresh
copy of nbd-2.4.21).

Even with 'sync_intvl=1', I sometimes deadlock just doing an 'mke2fs'
(before the filesystem even gets mounted).  I've also logged the output
from 'nbd-client' and 'nbd-server', and the server message

nbd-server: Read returned 0 after select predicted read!

seems to be related to a message coming from the client:

nbd-client: manager sighandler received signal 17
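
For reference, on Linux/x86 signal 17 is SIGCHLD, which a parent process
receives when one of its children exits.  Below is a minimal Python sketch
of a parent being notified that way.  It is not nbd code; the handler and
variable names are made up for illustration.

```python
import os
import signal
import time

received = []  # signal numbers seen by the handler (illustrative name)

def on_signal(signum, frame):
    # Record which signal arrived, much as a
    # "sighandler received signal N" log line would.
    received.append(signum)

signal.signal(signal.SIGCHLD, on_signal)

pid = os.fork()
if pid == 0:
    os._exit(0)        # child: exit immediately, like a worker dying

os.waitpid(pid, 0)     # parent: reap the child

# Give the interpreter a moment to run the Python-level handler.
deadline = time.time() + 2.0
while not received and time.time() < deadline:
    time.sleep(0.01)

print("parent received signal", int(received[0]),
      signal.Signals(received[0]).name)
```

If the manager's log line reports the raw signal number, a 17 there would
be consistent with one of its child processes having exited.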

It appears that the client daemon is dying unexpectedly, and the server
only notices because its reads on the connection come up short (here,
returning 0 bytes).
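
The "Read returned 0 after select predicted read" pattern is what a
stream socket normally shows when the peer has closed its end: select()
marks the descriptor readable, and the read then returns zero bytes
(end-of-file).  A minimal Python sketch of that behaviour follows.  A
socketpair stands in for the client/server link, and none of the names
come from the nbd sources.

```python
import select
import socket

# A connected pair of sockets stands in for the nbd-client/nbd-server link.
client_end, server_end = socket.socketpair()

# The "client" goes away without sending anything further.
client_end.close()

# select() reports the server's socket as readable...
readable, _, _ = select.select([server_end], [], [], 1.0)
was_readable = server_end in readable

# ...but the read yields zero bytes: end-of-file, not data.
data = server_end.recv(4096)

print("readable:", was_readable, "bytes read:", len(data))
server_end.close()
```

So a zero-length read after a positive select() is consistent with the
peer having closed the connection, which fits the theory of the client
dying.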

> > > Probably vfs is full. File bigger than memory, etc. Is the fs mounted
> > > sync? If not, mount it sync. If that doesn't help, load the module with
> > > sync_intvl=1. That will flush VFS frequently, before it can ever fill
> > > up (well, small ram, fast cpu will still let it block, but we hope
> > > not).
> > 
> > I'd like to better understand why the 'VFS' fills up and doesn't let
> > anybody know about it. (messages in printk, etc).  I'd also like to
> 
> I simply don't know how to measure when VFS is getting full. If you find
> out, tell me, and I'll gladly do something about it! But VFS fills up
> because that's what linux does. It's "buffering". I think I am getting
> some clues on how to turn it off, though.

I'm not sure I buy the idea that VFS is 'filling up' since I'm experiencing
problems even before the filesystem gets mounted.

Again, the '/proc/nbdinfo' file indicates that 'lo->queue' is non-empty
when the 'deadlock' occurs.  I suspect that what is happening is that
the client unexpectedly dies and thus leaves 'lo->queue' full of
unserviced requests.

Device a:       Open
[a] State:      verify, rw, enabled, last error 0
[a] Queued:     +0R/0W curr reqs =0R/127W real reqs +1R/127W max reqs
[a] Buffersize: 86016   (sectors=168)
[a] Blocksize:  1024    (log=10)
[a] Size:       2048728KB
[a] Blocks:     2048728
[a] Sockets:    1       (*)
[a] Requested:  32160   (8017)  1R/32159W
[a] Despatched: 8017    (8017)  1R/8016W
[a] Errored:    0       (0)     0+0
[a] Pending:    0       (0)     0R/0W+0R/24143W
                                         ^^^^
                                         Number of blocks in 'lo->queue'
[a] Kthreads:   0       (0 waiting/0 running/1 max)
[a] Cthreads:   0       (-)
[a] Cpids:      0       (5975)
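
As a cross-check on the numbers above, the requests that were requested
but never despatched should equal what is sitting in 'lo->queue'.  A small
Python sketch of that arithmetic, using the three lines copied from the
dump.  The parsing assumes only the "<n>R/<m>W" layout visible here, not
any documented format.

```python
import re

# Lines copied from the /proc/nbdinfo dump above.
dump = """\
[a] Requested:  32160   (8017)  1R/32159W
[a] Despatched: 8017    (8017)  1R/8016W
[a] Pending:    0       (0)     0R/0W+0R/24143W
"""

def rw_counts(line):
    # Return the last "<n>R/<m>W" pair on the line as (reads, writes).
    pairs = re.findall(r"(\d+)R/(\d+)W", line)
    reads, writes = pairs[-1]
    return int(reads), int(writes)

fields = {}
for line in dump.splitlines():
    name = line.split()[1].rstrip(":")
    fields[name] = rw_counts(line)

req_r, req_w = fields["Requested"]
dsp_r, dsp_w = fields["Despatched"]
pend_r, pend_w = fields["Pending"]

# Requests handed to the driver but never sent on to the client.
stuck_reads = req_r - dsp_r
stuck_writes = req_w - dsp_w

print("undespatched:", stuck_reads, "reads,", stuck_writes, "writes")
print("matches Pending:", (stuck_reads, stuck_writes) == (pend_r, pend_w))
```

That is 32159 - 8016 = 24143 writes, the same figure marked in the
Pending line.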

-Sincerely,

Jon Arney