[ENBD] Doesn't work for me (yet)

Peter T. Breuer ptb@it.uc3m.es
Thu, 2 Nov 2000 18:21:05 +0100 (MET)


"A month of sundays ago Dan Melomedman wrote:"
> Hi. I am new here, here's my problem. I have two SMP Intel systems that
> I would love to peruse ENBD on. I compiled 2.4.14 with 2.2.17 kernel
> with D__SMP__, however, on server load I get this warning : nbd-server:

What does that mean? Did you make sure that -D__SMP__ was actually passed
to the module compilation? A bug in the 2.4.14 makefile means that it
isn't passed, so you have to add it by hand.
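
A rough way to check, as a sketch only: count the compile lines in a dry
run of the build that carry the flag. The helper name count_smp is made
up for illustration, and I am assuming you run make from wherever the
module is built in your tree.

```shell
#!/bin/sh
# Hypothetical helper: read build commands on stdin and print how many
# of them mention -D__SMP__. The "--" stops grep from treating the
# pattern as an option.
count_smp() {
    grep -c -- '-D__SMP__'
}

# Typical use (not executed here): a dry run prints the commands make
# would run without running them.
#   make -n 2>/dev/null | count_smp
```

If that prints 0 for the module objects, the flag is not reaching the
compiler.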

(btw, I have no experience of smp machines, and still no smp testbed,
although I am expecting one daily)

> notice: setsockopt RCVTIMEO failed with Protocol not available. Is this
> relevant at all?

No.

> The client machine freezes when I do mkfs on /dev/nda (which is a 36 MB

Why would you want to? Do mkfs at the server end, please. It's crazy to
do anything else. Or do you enjoy passing random zeros over the net for
the fun of it? I rather suspect that mkfs, if not given a size on the
command line, also does a binary search for the size by issuing
out-of-bounds requests, which could well influence the result.
If you can, strace mkfs and let me know what unusual things it is
doing.
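
Something like the following is what I mean by doing it at the server
end. It is a sketch, not a recipe: the paths are invented, and it
assumes the resource is a plain 36 MB file and that mke2fs (and
optionally strace) are installed.

```shell
#!/bin/sh
# Hypothetical paths; substitute the file your server actually exports.
IMG="${IMG:-/tmp/enbd-resource.img}"
TRACE="${TRACE:-/tmp/mkfs.strace}"

# 36 MB backing file, created locally on the server
# (36 * 1024 blocks of 1 KB each).
dd if=/dev/zero of="$IMG" bs=1024 count=36864 2>/dev/null

# Make the filesystem on the file itself: -F lets mke2fs work on a
# regular file, -q keeps it quiet. Trace it if strace is available
# (-f follows children, -o writes the trace to a file).
if command -v mke2fs >/dev/null 2>&1; then
    if command -v strace >/dev/null 2>&1; then
        strace -f -o "$TRACE" mke2fs -q -F "$IMG"
    else
        mke2fs -q -F "$IMG"
    fi
else
    echo "mke2fs not found; skipping" >&2
fi
```

The same strace line run on the client, with /dev/nda in place of the
file, would give the trace I am asking for.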

> file on the server system). I also recompiled enbd for a non-SMP
> machine, and tried a non-SMP machine as the client, in that case there's
> no freezing, but here's what I get:
> 
> Superblock backups stored on blocks: 
>         8193, 24577
> 
> Writing inode tables: done                            
> Warning: could not erase block 0: Attempt to write block from filesystem
> resulted in short write
> Writing superblocks and filesystem accounting information: done
> 
> Where can I get 2.4.15 tarball, or what can I do to make 2.4.14 work?

There is no tarball. You have to snapshot the source at
http://www.it.uc3m.es/ptb/nbd/src/.

Tell me what doesn't work, in _detail_.  That means passing on the
output from /proc/nbdinfo.  Plus sufficient debugging information from
the logs at either side to make a diagnosis.
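
As a sketch of the sort of report that helps, run on each side: the
function name and the output path are made up, and how many kernel log
lines are enough is a guess on my part.

```shell
#!/bin/sh
# Hypothetical helper: gather what a bug report needs into one file.
REPORT="${REPORT:-/tmp/enbd-report.txt}"

collect_report() {
    {
        echo "== uname -a =="
        uname -a
        echo "== /proc/nbdinfo =="
        if [ -r /proc/nbdinfo ]; then
            cat /proc/nbdinfo
        else
            echo "(not readable: is the module loaded?)"
        fi
        echo "== last kernel messages =="
        # dmesg may be missing or restricted; an empty section is fine.
        dmesg 2>/dev/null | tail -n 50
    } > "$REPORT"
}

collect_report
echo "report written to $REPORT"
```

Add the relevant daemon lines from the system log by hand, since where
they end up depends on your syslog setup.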

For the moment, I'd just say "don't do that then".  If you run mkfs on a
machine that has less than 100MB of memory, say, and the mkfs must pass
100MB of zeros across the net, say, then it WILL freeze due to a vfs/tcp
deadlock.  That's because those bytes will be buffered first and later
passed through to the device, but the device needs more memory in order
to empty the buffers out onto the net ..  vfs/tcp deadlock.  The only
cure is to mount the filesystem synchronously, but you can't do that if
you haven't got a filesystem on it yet!  Or slow down your cpu.

In kernel 2.4.*, raw devices (like /dev/nda) are unbuffered, so you
can't get this problem.


Peter