[ENBD] nbd-2.4.1 problems on kernel 2.2.18

Peter T. Breuer ptb@it.uc3m.es
Sat, 10 Mar 2001 01:10:51 +0100 (MET)


"A month of sundays ago Kai Chen wrote:"
> >>file: Can not seek locally to offset 2149299200!
> >>nbd-client-server: writenet exits FAIL
> 
> >Well, it means what it says. You compiled without large file support, or 
> >your system does not have large file support. Did you run the configure 
> >script ("make config")?
> 
> In the make file, I do have the following line:
> 
> CFLAGS     = -O2 -Wall -D_LARGEFILE64_SOURCE

That's nice, but not in itself "the business". You have to run "make
config", so that the configure scripts can set up the Makefiles
in the lower directories. The data they look at comes from config.h, which
contains settings (like "HAVE_LSEEK64") that can only be determined by
probing your system.
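
To illustrate why the -D flag alone is not enough, here is a minimal
sketch of the usual pattern. It is not copied from the nbd sources:
seek_abs is a hypothetical name, and the assumption is that config.h
defines HAVE_LSEEK64 only when the probe finds lseek64. The flag merely
makes the 64-bit declarations visible; the code still has to choose
them.

```c
/* Must come before any #include so that glibc declares lseek64/off64_t. */
#define _LARGEFILE64_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper (not from the nbd sources): seek to an absolute
 * byte offset on fd.  Returns the resulting offset, or -1 with errno set.
 * HAVE_LSEEK64 is assumed to come from config.h after "make config". */
long long seek_abs(int fd, long long offset)
{
#ifdef HAVE_LSEEK64
    /* 64-bit path: offsets beyond 2GB are representable. */
    return (long long) lseek64(fd, (off64_t) offset, SEEK_SET);
#else
    /* Fallback: refuse offsets that do not survive conversion to off_t,
     * which is what happens when off_t is only 32 bits wide. */
    off_t narrow = (off_t) offset;
    if ((long long) narrow != offset) {
        errno = EOVERFLOW;
        return -1;
    }
    return (long long) lseek(fd, narrow, SEEK_SET);
#endif
}
```

Without HAVE_LSEEK64 the fallback branch is the one that gets compiled,
whatever CFLAGS says.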

> I thought that meant I had compiled with the support, or was I mistaken....? 

Most certainly you are mistaken. That flag is a prerequisite for large
file support, not a symptom of having it.

>   Or, how do I get large file support for my "system?"

You follow the instructions in the INSTALL file.
  
  It should be sufficient to type "make config all" in this directory.
  But check first

    0) that you have the kernel sources installed
    1) that the kernel source directory LINUXDIR in the Makefile is correct
    2) that you set SMP=1 in the Makefile if your target kernel is SMP.

  The make will build nbd.o, nbd-server, nbd-client in /tmp.  Change
  BUILD in the Makefile to change the build directory to somewhere else.

And may I add "please remove config.cache" before you run make config.
Since these are snapshots, taken automatically, I may well have
accidentally caught a local config.cache in them!

> >I'm also interested to hear that the 2.4.20 code is working on the
> >2.2.18 kernel .. I haven't had a chance to do the regression testing
> >on 2.2.18 since completing the port to 2.4.0 in nbd 2.4.20.
> >
> >Which kernel are you actually using?
> 
> I'm using kernel 2.2.18.

OK. Nice to know it still works on that.

> >>On the client side, the system just spit out some error messages then 
> >>froze up completely.  The only way to fix it was to restart/reboot the 
> >>machine using the reset button.
> >
> >This should not happen - if it receives out of range requests then they 
> >should be rejected quite early on. But that said, I do not yet know all the 
> >possible kernel interactions in the 2.4 kernels, if you are using that, or 
> >perhaps I accidently got rid of the range check in the do_nbd_request loop. 
> >I'll look and see if the range check is still active.
> 
> Well the mkfs didn't stop even after the server was spitting "out of range 
> message."  In fact, the machine actually completed "mkfs" successfully (or 

Mkfs will die when it receives error results back.

Yes, I am looking at the (2.4.22) driver code.  It will error
out-of-range requests before they ever get considered.  In the main loop
(do_nbd_request) that examines requests on the kernel queue and
transfers them to the driver queue there is:

                  if (req->sector + req->nr_sectors > (lo->size << 1)) {
                       NBD_FAIL ("overrange request");
                  }

so no request that is really too big can get into the driver.
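
The same test can be tried outside the kernel. The following is only a
sketch: struct nbd_dev_sketch and request_in_range are hypothetical
names, and it assumes that the size field counts 1KB blocks, so that
"size << 1" is the device size in 512-byte sectors.

```c
#include <stdint.h>

/* Hypothetical stand-in for the driver's per-device state.  Only the
 * size matters here; it is ASSUMED to be counted in 1KB blocks. */
struct nbd_dev_sketch {
    uint64_t size;
};

/* Returns 1 if the request lies entirely within the device, 0 if it
 * overruns the end.  Sectors are 512 bytes, so size << 1 converts
 * 1KB blocks to sectors, as in the driver test quoted above. */
int request_in_range(const struct nbd_dev_sketch *lo,
                     uint64_t sector, uint64_t nr_sectors)
{
    return sector + nr_sectors <= (lo->size << 1);
}
```

On a 4KB device (8 sectors), a one-sector request at sector 7 passes
and a one-sector request at sector 8 is rejected.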

That said, it seems to me that if the server negotiates the right size
but then finds it is unable to seek ... naw. Impossible. It has to seek
to the end of the resource to find out how big it is. Either it can or
it can't.

> so it said...), but the file system the client thought created cannot be 
> mounted.  On the server, it shows:
> 
> nbd-client-server: writenet exits FAIL
> file: Can not seek locally to offset 2147483648!

2 147 483 648 is 2GB. Exactly. That's clearly a symptom of an lseek with
a 32 bit return value (without large file support, lseek returns a 32 bit
signed int representing the offset, and the largest such value is
2^31 - 1, so lseek can't seek to 2GB = 2^31 or beyond).
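
The arithmetic can be checked with a few lines of C. This is a sketch
only, and fits_in_32bit_offset is a hypothetical helper, not part of
nbd.

```c
#include <stdint.h>

/* Returns 1 if offset is representable as a non-negative signed 32 bit
 * value, i.e. if a 32 bit lseek could return it, and 0 otherwise. */
int fits_in_32bit_offset(int64_t offset)
{
    return offset >= 0 && offset <= INT32_MAX;
}
```

The last offset that fits is 2147483647. Both offsets from the server
log, 2147483648 and 2149299200, are out of reach.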

I suspect that you really haven't run make config. At least look in config.h
and see if the configured values are appropriate for your system.

> nbd/fileserver: short read to buffer offset 0, wanted 4096 got -22
> 
> The client sometimes did stop.  It shows:
> 
> nbd-client-netserver: client (0) short read from net to buffer  offset 0, 
> wanted 4096 got -110

That's saying it couldn't even succeed in negotiating the handshake.
The mkfs would never have run.
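
If the negative "got" values in these messages are negated errno codes
(an assumption; the log format itself does not say so), they can be
decoded as below. errno_name is a hypothetical helper, and the numeric
values in the comments are those of Linux on x86.

```c
#include <errno.h>

/* Hypothetical helper: name the errno behind a negative "got" value.
 * ASSUMES the value is a negated errno code. */
const char *errno_name(int got)
{
    switch (-got) {
    case EINVAL:    return "EINVAL";     /* 22: invalid argument */
    case ETIMEDOUT: return "ETIMEDOUT";  /* 110: connection timed out */
    default:        return "unknown";
    }
}
```

On that reading, the server's -22 would be an invalid argument, which
fits a seek offset that does not fit in 32 bits, and the client's -110
would be a timeout while waiting on the net.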

That's often a symptom of completely mismatched systems, such as
compiling on a Red Hat 7 system and then trying to run on Red Hat 6
or older.

> nbd-client: client (0) fails in expect sequence
> nbd-client: client (0) negotiation bails out on port 1100

Indeed.

> When the connection broke for any reason, automatic reconnection never 
> worked.  I always needed to stop the client altogether and then restart it.
> 
> Please advise.

Fix your compile.

And please tell me something about your systems! 

Peter