[ENBD] New to ENBD, having troubles

Christopher Eveland enbd@lists.community.tummy.com
Tue, 22 Jan 2002 20:10:13 -0500


> You SURE you also tried 2.4.26? If you see the symptoms in BOTH 2.4.26
> and 2.4.27 then that rules out something I've done wrong in 2.4.27.

I downloaded nbd-2.4-current.tgz to start, which seems to be the same as the
nbd-2.4.26 that I just downloaded.  When I got it, I made a small patch to
get it to compile with the kernel (a version check around a macro
definition, which I saw in the archives; I gather that's a difference
between 26 and 26a).  I haven't tried 26a yet, but I can try it in the
morning.

> > Anyway, after doing the make, I do the make test, and go through the
> > checklist.  The module is loaded, I can see the server and client procs on
>
> Can you run mke2fs at that point?

Nope, hangs.

> > I can even do some small things with the device (I'm using /dev/ndf if this
>
> ndf? Hmmm ... I have never tried that. It is quite possible that there
> is a bug with higher-numbered devices, simply _because_ I have never
> tried that. There was once such a bug, I recall.

OK, I just redid it on /dev/nda and I get identical results.

> > case, since I had set up the others to auto mount on boot, obviously
> > getting ahead of myself... anyway, a-e are all turned off), like use dd
> > to copy 512 bytes onto a device (such as /dev/hda3) and then compare to
> > the original 512 bytes: they match.  But if I try to do something "big",
> > like mkfs, it seems to hang up.
>
> mke2fs is unique in that it tries to write _outside_ a partition, in
> order to do a binary search for its limits.
>
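
For readers following the thread, the binary search for a device's limit
mentioned above can be sketched in C roughly as follows. This is an
illustration only, not code from mke2fs or ENBD: `probe_fn`,
`find_sector_count` and `fake_probe` are hypothetical names, the probe only
simulates an access, and the sketch assumes every sector below the limit
succeeds and every sector at or past it fails.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical probe callback: returns nonzero if `sector` is accessible. */
typedef int (*probe_fn)(uint64_t sector, void *ctx);

/*
 * Return the number of accessible sectors, assuming accessibility is
 * monotone: sectors [0, limit) succeed and everything from limit on fails.
 */
uint64_t find_sector_count(probe_fn probe, void *ctx)
{
    uint64_t lo, hi;

    assert(probe != NULL);
    if (!probe(0, ctx))
        return 0;                 /* not even the first sector is reachable */

    /* Phase 1: double `hi` until a probe fails. These probes deliberately
     * reach past the end of the device. */
    lo = 0;                       /* highest sector known to be good */
    hi = 1;                       /* candidate for the first bad sector */
    while (probe(hi, ctx)) {
        lo = hi;
        if (hi > UINT64_MAX / 2)
            return UINT64_MAX;    /* give up: no limit found in range */
        hi *= 2;
    }

    /* Phase 2: `lo` is good and `hi` is bad; narrow the gap to one sector. */
    while (hi - lo > 1) {
        uint64_t mid = lo + (hi - lo) / 2;
        if (probe(mid, ctx))
            lo = mid;
        else
            hi = mid;
    }
    return hi;                    /* first bad sector == sector count */
}

/* Simulated device for illustration: `ctx` points at its sector count. */
int fake_probe(uint64_t sector, void *ctx)
{
    return sector < *(const uint64_t *)ctx;
}
```

With 512-byte sectors an 8MB device is 16384 sectors, so calling
`find_sector_count(fake_probe, &n)` with `n = 16384` should give back 16384.
The doubling phase necessarily probes past the end of the device, which is
the kind of overrange access under discussion here.
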
> > For instance, after doing "make test", I try "mke2fs /dev/ndf" as per
> > the online instructions.  As soon as I try mke2fs, I get the following
> > on the console:
>
> You are on the server side? And the server dies? I don't understand how
> you can be on the server side ...

I think the issue is that "make test" starts the server over ssh, so I'm
getting both the client and server messages dumped to the same console.

> > interact somewhat with the machine, I can't seem to shut it down nicely,
> > to get the load to go back down, I have to pretty much reset the machine.
>
> echo 0 > /proc/nbdinfo. Isn't this prominent enough in the man page?

Apparently not, but sure enough it works.  Thanks, and sorry for not reading
carefully enough.

> > So I'm having trouble interpreting this.  If anyone has some suggestions,
> > or can point me to something to look at, I'd appreciate it.  Thanks,
>
> It looks like a straightforward mistake of mine in the 2.4.27 server,
> dying instead of returning an error on out-of-range requests. May I ask
> WHAT you are serving? And why is the 2.4.26 server behaving the same way?
> Are you sure it is?
>
> Can you make sure that the device is some small size, like 8MB? If it's
> as straightforward as I believe, then it's a simple case of a one-line
> change in a user-level routine. What I'm surprised about is the lack of

I'm using the core files that you set up in the make test; they seem to come
to 8MB.

> diagnostics .... say! Are you using 2.4.26a or 2.4.26? Because 2.4.26a
> shares the driver with 2.4.27. So if both behave the same way, then
> the indication is that it is the driver. And I DO recall removing
> a range check from the driver ...

Seems to have been plain old 26.

> ummm, it would be in the code that takes stuff off the kernel queue ...
> do_nbd_request ... yes. It's still there!
>
>
>         PARANOIA_BEGIN;
>         if (lo->magic != NBD_DEV_MAGIC) {
>             NBD_DEBUG (1, "nd%s is not magical!\n", lo->devnam);
>             NBD_FAIL ("nbd[] is not magical!\n");
>         }
>         if (req->nr_sectors > lo->max_sectors) {
>             NBD_FAIL ("oversize request\n");
>         }
>         PARANOIA_END;
>         if (req->sector + req->nr_sectors > lo->sectors) {
>             NBD_FAIL ("overrange request\n");
>         }
>
> Now I am stumped. Would you mind turning on the paranoia just above
> that range check? Then nothing too big can even get out of the client.
> But it should be just fine as it is, because the kernel guarantees
> not to pass anything bigger than the max_sectors we registered, and in
> any case mke2fs only passes overrange requests, not oversize requests.
> Maybe it's an ext2fs underrun?

I moved the PARANOIA_END line down below the last if, if that's what you
mean.  No difference that I could tell: after "rmmod nbd ; make ; make test",
it still hangs.
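
To make the distinction between "oversize" and "overrange" requests concrete,
here is a minimal user-space sketch of the two checks. It is not the driver's
code: `struct dev_limits`, `enum req_status` and `check_request` are
hypothetical stand-ins for the fields and macros quoted above, and the
overrange test is written as a subtraction, an assumption of this sketch, so
that `sector + nr_sectors` cannot wrap around.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes standing in for the NBD_FAIL branches. */
enum req_status { REQ_OK, REQ_OVERSIZE, REQ_OVERRANGE };

/* Hypothetical stand-in for the per-device limits (lo->sectors etc.). */
struct dev_limits {
    uint64_t sectors;      /* total sectors on the device */
    uint64_t max_sectors;  /* largest single request allowed */
};

enum req_status check_request(const struct dev_limits *dev,
                              uint64_t sector, uint64_t nr_sectors)
{
    assert(dev != NULL);

    /* Oversize: the request itself is longer than the registered maximum. */
    if (nr_sectors > dev->max_sectors)
        return REQ_OVERSIZE;

    /* Overrange: the request runs past the end of the device. Comparing
     * against (sectors - sector) avoids overflow in sector + nr_sectors. */
    if (sector > dev->sectors || nr_sectors > dev->sectors - sector)
        return REQ_OVERRANGE;

    return REQ_OK;
}
```

For an 8MB device of 16384 sectors with a 256-sector request limit, an
8-sector request at sector 16376 passes, the same request at sector 16380 is
overrange, and a 512-sector request at sector 0 is oversize.
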

>
> Anyway, I need a bit more data AND I'll have a look at it tomorrow.
>
> You might try 2.4.26 instead of 2.4.26a (or vice versa). Any difference
> would give me a lead.

OK, I'll try out 26a in the morning too.  Thanks a lot for your help.

-Chris