[ENBD] ENBD on 2.4.x
Peter T. Breuer
ptb@it.uc3m.es
Fri, 16 Mar 2001 09:55:37 +0100 (MET)
Wups ... let's try replying again, this time with DISPLAY set right ...
"jona@orac.ensor.org wrote:"
> Could somebody sanity check me here? I'm still looking at issues
> with 2.4.x and ENBD and wanted to see how the Kernel module
> makes sure it doesn't merge requests to the point that it would
> overrun the client's buffer during a 'get_req' call (I suspect that
> I'm getting a SEGV in the ioctl(fd, MY_NBD_GET_REQ, 1000) call).
The module itself doesn't merge requests. It's the kernel that merges
requests before they get to the module. The module can make it
impossible for the kernel to do that, however, and by default the
module does "turn merge requests off" in the kernel.
That's because locality issues aren't relevant when running over the
net. There is a benefit from using larger requests in terms of
overhead, on both sides of the net. There are also locality issues at
the other end, in the server's kernel, but that kernel should look
after its own.
> I see that the Kernel module tries to keep that from happening
> by looking at 'buf_sectors', but something still seems to be
> going wrong.
Well, buf_sectors is the max request size, and it's set up somewhere at
init time. It's the min of the sizes of the (userspace) buffers that
have been handed down to us by the client daemons.
> At line 1468 in nbd.c (kernel module), I added the following
> test and warning if the kernel module tries to write a larger buffer
> than the user has allocated. I'm not sure how 'request.len' is
> getting so big, but I got the warning when I tried to run my 'mke2fs' on
> the block device.
>
> if (request.len > NBD_MAX_SECTORS*512+1000) {
> NBD_ALERT("Warning! Want to write something larger than my buff\
> er %i\n", request.len);
> }
You get 128K as the request.len. And it's evidently a write request, as
we are writing to the user buffer. So we are reading the request.len
from the incoming kernel request. Indeed we are:
    request.len = req->nr_sectors;
    request.len <<= 9;
So basically the incoming kernel request had a lot of sectors in it.
That's the basic impossibility. A kernel write request for 128K is
impressive, but it shouldn't have even been born.
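To check the arithmetic behind that 128K figure, here is a small
userspace sketch. It is not code from nbd.c: sectors_to_bytes,
bytes_to_sectors and show_request are made-up helper names, and the
512-byte sector size is simply read off the "<<= 9" shift above.

```c
#include <stdio.h>

/* Hypothetical helper, not part of nbd.c. Mirrors
 *     request.len = req->nr_sectors;
 *     request.len <<= 9;
 * i.e. a count of 512-byte sectors converted to bytes. */
unsigned long sectors_to_bytes(unsigned long nr_sectors)
{
    return nr_sectors << 9;
}

/* Inverse: how many whole 512-byte sectors a byte count covers. */
unsigned long bytes_to_sectors(unsigned long len)
{
    return len >> 9;
}

/* Print the sector count behind a logged request.len. */
void show_request(unsigned long len)
{
    printf("%lu bytes = %lu sectors\n", len, bytes_to_sectors(len));
}
```

Calling show_request(131072) prints "131072 bytes = 256 sectors", so
the request in the log was 256 sectors long.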
Now, if requests that big are arriving, I would like to know why.
In theory, we have stopped the kernel from coalescing requests
by inserting our own front_merge and back_merge functions which
say "no" always. This was done at module init time. Can you check
that those substitutions are still in your code?
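The idea of merge functions that always say "no" can be sketched in
userspace as below. This is only a model under stated assumptions: the
model_* structs and functions are simplified stand-ins invented for
illustration, not the real 2.4 kernel types or signatures, and the
front_merge / back_merge field names are taken from the wording above
rather than from the kernel headers.

```c
/* Userspace model only. These structs are simplified stand-ins,
 * not the real kernel definitions. */
struct model_request { unsigned long nr_sectors; };
struct model_bh      { unsigned long b_size; };  /* bytes in the new buffer */

typedef int (*model_merge_fn)(struct model_request *req, struct model_bh *bh);

struct model_queue {
    model_merge_fn front_merge;
    model_merge_fn back_merge;
};

/* Default in this model: accept the merge and grow the request. */
int model_default_merge(struct model_request *req, struct model_bh *bh)
{
    req->nr_sectors += bh->b_size >> 9;
    return 1;
}

/* The replacement that says "no" always: refuse, leave req untouched. */
int nbd_say_no_merge(struct model_request *req, struct model_bh *bh)
{
    (void)req;
    (void)bh;
    return 0;
}

/* A fresh queue merges by default. */
void model_queue_init(struct model_queue *q)
{
    q->front_merge = model_default_merge;
    q->back_merge  = model_default_merge;
}

/* What the module would do once, at init time. */
void model_disable_merging(struct model_queue *q)
{
    q->front_merge = nbd_say_no_merge;
    q->back_merge  = nbd_say_no_merge;
}

/* Try to append bh to req; returns 1 if the merge was accepted. */
int model_try_back_merge(struct model_queue *q, struct model_request *req,
                         struct model_bh *bh)
{
    return q->back_merge(req, bh);
}
```

In this model a request only grows while the default functions are
installed. If oversize requests still arrive after the substitution,
then either the substitution is missing or requests are growing by
some other route.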
Once the kernel goes ahead and makes a request that big, there is very
little we can do about it. Is there any limit? My impression was that
128 sectors was the limit, but apparently 256 is possible. Let's see ..
the kernel can't coalesce more than the total number of requests in
the device pool, and I think that's 256. At 1KB per block, that's
256KB. I believe that writes can only take up 2/3 of the pool, but
never mind. One simply has to be prepared to either handle or fail such
requests, if we haven't succeeded in stopping them from being born!
I would increase the buffer size to 256K plus a bit. And add a
request size check in the basic do_nbd_request loop if there is not
already a check there.
Set NBD_MAX_SECTORS to 512 (512 sectors of 512 bytes each is 256K) and
recompile everything.
Add to the sanity tests in do_nbd_request:
    if (req->nr_sectors > buf_sectors) {
        NBD_FAIL ("oversize request");
    }
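The check above can be tried out in userspace with a sketch like the
following. It makes some assumptions: struct model_request and
nbd_check_request_size are hypothetical names, the -1 return stands in
for whatever NBD_FAIL does in the module, and NBD_MAX_SECTORS is given
the value 512 suggested above.

```c
/* Value suggested above; the module's real setting may differ. */
#define NBD_MAX_SECTORS 512UL

/* Simplified stand-in for the kernel request; only the field the
 * check needs. */
struct model_request { unsigned long nr_sectors; };

/* Returns 0 if the request fits the smallest client buffer
 * (buf_sectors), -1 if it is oversize. In the module, the -1 branch
 * is where NBD_FAIL ("oversize request") would go. */
int nbd_check_request_size(const struct model_request *req,
                           unsigned long buf_sectors)
{
    if (req->nr_sectors > buf_sectors)
        return -1;
    return 0;
}
```

With buf_sectors at 512, the 256-sector request from the log passes,
and anything above 512 sectors is refused before it reaches the copy
to the user buffer.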
> else {
> size = copy_to_user_from_req(req,
> (char*)user+sizeof (struct nbd_request), request.len);
>
> // PTB we have been tracking where to write to in the buffer
> buflen += size;
> }
>
> Log dump:
>
> Mar 15 18:40:04 jarney kernel: NBD #3481[0]: init_module registered device at major 43
> Mar 15 18:40:18 jarney kernel: NBD #1892[0]: nbd_set_sock device nda not signed yet!
> Mar 15 18:40:27 jarney kernel: NBD #1469[0]: nbd_get_req Warning! Want to write something larger than my buffer 131072
>
> It would seem that we would want to do one of the following:
> a) Make sure the request NEVER gets bigger than NBD_MAX_SECTORS
> b) Break up large requests to the client daemon and inform it
> that it only got a partial.
>
> The first is probably the simplest, but I'm not sure how we can
> limit the size of requests.
It's also possible to skip the request, then go and ask the user
for a bigger buffer. Then treat the request.
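A rough userspace sketch of that "skip, ask for a bigger buffer, then
treat the request" idea follows. Nothing here is the existing ENBD
interface: model_get_req, model_fetch and the convention of returning
the needed size are all assumptions made up for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for a pending write request. */
struct model_request {
    size_t len;        /* payload length in bytes */
    const char *data;  /* payload */
};

/* Module side (modelled): copy the request into the client's buffer.
 * Returns 0 on success. If the buffer is too small, nothing is
 * copied and the number of bytes needed is returned instead. */
size_t model_get_req(const struct model_request *req, char *buf,
                     size_t buflen)
{
    if (req->len > buflen)
        return req->len;
    memcpy(buf, req->data, req->len);
    return 0;
}

/* Client side (modelled): on a "too small" answer, grow the buffer
 * and retry once. Returns 0 on success, -1 on failure. */
int model_fetch(const struct model_request *req, char **buf,
                size_t *buflen)
{
    size_t need = model_get_req(req, *buf, *buflen);
    char *bigger;

    if (need == 0)
        return 0;
    bigger = realloc(*buf, need);
    if (bigger == NULL)
        return -1;
    *buf = bigger;
    *buflen = need;
    return model_get_req(req, *buf, *buflen) == 0 ? 0 : -1;
}
```

The point of the design is that the oversize request stays queued
rather than being failed, so the cost is one extra round trip to the
client daemon instead of an I/O error.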
Requests are fundamentally "blksized". The device has a block size that
it registers with the kernel at init time. It is the kernel's duty
to honour that information and not issue us with random sized requests.
It can coalesce requests if we allow it to, but we should have
succeeded in telling it not to.
Another thing that would stop the kernel making bigger requests is to
turn off plugging. Sigh ...
Uncomment the following code near the module init:
    //static void
    //nbd_plug_device_noop(request_queue_t *q, kdev_t dev) { }
and this too:
    // PTB this next disables the standard plugging WATCHME FIXME
    //blk_queue_pluggable(BLK_DEFAULT_QUEUE(MAJOR_NR), nbd_plug_device_noop);
and let me know what happens.
Peter