[ENBD] Re: Enhanced NBD daemon on 2.4.3 kernel
Peter T. Breuer
ptb@it.uc3m.es
Wed, 18 Apr 2001 18:59:49 +0200 (MET DST)
"Jason A. Pattie wrote:"
> I'm having an interesting issue with the latest version of nbd
> (2.4.24). The previous version (2.4.23) worked to the extent that I
> could get it to work, although it would not do what I needed it to do.
?? What kernel?
> I used dd to create an image (using /dev/zero) of 32MB in size. I did
> an mke2fs on the image. I then launched nbd-server 7891 /tmp/test.img
> to serve the image. However, on the client, I could connect using
> nbd-client <ip>:7891 -n 1 /dev/nd/a, and that seemed to work fine, but
Why only "-n 1"? This is rather putting all your eggs in one basket.
> upon attempting to "mount /dev/nd/a/0 /tmp/mnt", the mount command just
> froze. There was some output on the server that looked pretty good, but
You probably have a compilation problem or something similar. The server/client has been
holding up multi-gigabyte sized partitions here. Another possibility:
it may be an oversize request constructed by the kernel, but that's doubtful.
mke2fs, mount and so on are all part of the standard tests run on
each release, so the problem is likely local. What size is the
partition? (Oh, it's a file. What size?) What kernel?
> nothing happened on the client. It was as if the connection just
> stopped. I haven't checked tcpdump or network traffic yet.
If this is to localhost, then it is a known (and inescapable) problem.
If it is over the net, you may have been bitten by request merging
issues in new kernels. Check what request sizes /proc/nbdinfo reports
as having been seen:
[a] Requested: 67 (16) (21) (16) (14) 67R/0W max 4
^^^^^^
Use merge_requests=0 as a module parameter. Also sync_intvl=1.
> The issue I'm having with 2.4.24, however, is entirely different.
2.4.24 hasn't been issued yet! But I'm glad to hear that somebody is
testing the snapshots :-). I think 2.4.23 is "official".
> Apparently, the module loads correctly, but when I issue an nbd-client
> ..., I get the following messages:
>
> # nbd-client <IP>:7891 -n 1 /dev/nd/a
> nbd-client: client (-1) manager opened NBD device /dev/nd/a (2b00)
> Segmentation fault
I was getting these when I first moved up to the 2.4.3 kernel from
2.4.0 a few days ago. It's likely to have been a memory alignment
problem or something related. After carefully looking around the code
and moving a few initializations up above their first use (:-), the
problem went away. It didn't show up under earlier kernels. As far as I
recall, it was a single problem in init_module, which used the devnam field
of a struct whose address was still zero to say "hi". When I then turned on
debugging, there were a few more instances in the debugging code.
You can diff the current 2.4.24 with yours to see what's changed.
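
For illustration only, the class of bug described above looks roughly like
the sketch below: a greeting is built from a device name, and it only works
once the pointer to the structure has been set. All names here (toy_dev,
toy_greet, toy_init, the "nda" string) are hypothetical and are not taken
from the driver source; the NULL guard exists only so the demo cannot crash.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a per-device structure; not the real driver type. */
struct toy_dev {
    char devnam[8];
};

/* Starts out NULL, like a struct address that has not been set yet. */
static struct toy_dev *toy_current = NULL;
static struct toy_dev toy_table[1];

/* Build a greeting from the device name.
 * Returns -1 rather than dereferencing a NULL pointer. */
int toy_greet(const struct toy_dev *dev, char *buf, size_t len)
{
    assert(len > 0);
    if (dev == NULL)
        return -1;
    snprintf(buf, len, "hi from %s", dev->devnam);
    return 0;
}

/* Fixed ordering: initialise the structure and pointer, then use them. */
int toy_init(char *buf, size_t len)
{
    strncpy(toy_table[0].devnam, "nda", sizeof toy_table[0].devnam - 1);
    toy_table[0].devnam[sizeof toy_table[0].devnam - 1] = '\0';
    toy_current = &toy_table[0];   /* moved above the first use */
    return toy_greet(toy_current, buf, len);
}
```

Calling toy_greet() before toy_init() corresponds to the broken ordering; in
kernel code there is no guard, so the same mistake faults instead of
returning an error.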
> And I get dumped back to the prompt.
> If I rmmod the nbd module, it says that it invalidates and destroys the
> buffers for nda0 and nda1.
Good news.
> Any thoughts?
There is an error in 2.4.2 and 2.4.3 kernels that means that the module
won't currently work unless plug=1 is set. I discovered this also when
I moved up to the new kernel yesterday. They've lost two lines at the end
of __make_request() in ll_rw_blk.c that told requests to go straight to
the request_fn of an unplugged device, instead of waiting around on the
queue to be merged. I haven't checked the 2.4.1 kernel yet.
 out:
-        if (!q->plugged)
-                (q->request_fn)(q);
         if (freereq)
I've asked the kernel list what to do about it.
For the moment, use "plug=1", which plugs the device. I've changed the
default value to be 1 in 2.4 kernels.
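
To make the effect of the two missing lines concrete, here is a toy
userspace model, not kernel code. The struct and function names are
hypothetical, and it assumes that a plugged queue is drained by a later
unplug while an unplugged queue is only drained by the direct call shown
in the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy queue, only loosely modelled on the block layer. */
struct toy_queue {
    int plugged;   /* 1 while the queue is plugged */
    int pending;   /* requests waiting on the queue */
    int served;    /* requests handed to the driver */
};

/* Stand-in for the driver's request_fn: drain everything pending. */
static void toy_request_fn(struct toy_queue *q)
{
    q->served += q->pending;
    q->pending = 0;
}

/* Stand-in for a later unplug: it only acts on a plugged queue. */
void toy_run_unplug(struct toy_queue *q)
{
    assert(q != NULL);
    if (q->plugged) {
        q->plugged = 0;
        toy_request_fn(q);
    }
}

/* Queue one request. direct_kick != 0 models the two lines in the diff:
 * an unplugged queue is handed straight to the request function. */
void toy_make_request(struct toy_queue *q, int direct_kick)
{
    assert(q != NULL);
    q->pending++;
    if (direct_kick && !q->plugged)
        toy_request_fn(q);
}
```

In this model, an unplugged queue without the direct call never gets
served, which matches the frozen mount; starting plugged (as with plug=1)
or restoring the direct call both get the request through.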
Peter