[ENBD] ENBD with 2.4.2-kernel
Peter T. Breuer
ptb@it.uc3m.es
Fri, 23 Mar 2001 23:32:57 +0100 (MET)
Ooops, .. let's try again ...
"Adrian Turcu wrote:"
> I have a strange problem here:
> - when I tried to launch nbd-server and nbd-client on the same
> machine and make something with /dev/nda , like mkfs.ext2 -c /dev/nda or
I don't have 2.4.2, but the problem you describe is a typical vfs/vfs
lockup against localhost. It is probably curable by running with
sync_intvl=1 as a module parameter.
> something else, well, the utility is freezing and I will get a lot of errors in
> the server side. Here are what I do and after the error.
Current opinion among the memory management people is that there is no
way of avoiding vfs deadlock to localhost. It shows up on devices like
loop too. Don't Do That Then is the best advice I can give. If you are
dead set on trying your machine's resilience, at least run "while sync;
do : ; done" in the background! That should compete with the mkfs well
enough to stop vfs locking.
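
The background loop could be wrapped up roughly as follows. This is only a sketch: with_sync_loop is a made-up helper name, not part of the nbd tools, and it does not remove the deadlock risk; it just keeps sync competing with whatever command is writing.

```shell
#!/bin/sh
# with_sync_loop CMD [ARGS...]
# Hypothetical helper: runs CMD while a background loop calls sync(1)
# repeatedly, then stops the loop and returns CMD's exit status.
with_sync_loop() {
    # The "while sync; do : ; done" loop from the text, backgrounded.
    # Output is discarded so the loop never holds the caller's stdout open.
    while sync; do :; done >/dev/null 2>&1 &
    sync_pid=$!

    # Run the real command and remember how it ended.
    status=0
    "$@" || status=$?

    # Stop the sync loop once the command has finished.
    kill "$sync_pid" 2>/dev/null || true
    wait "$sync_pid" 2>/dev/null || true
    return "$status"
}
```

It would be used as, for example, "with_sync_loop mkfs.ext2 -c /dev/nda".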
Mind you, since you are on localhost, you can always mkfs the
underlying resource, THEN mount it, of course.
> -server (node1)
> nbd-server 4017 /dev/sdb -i "NBDabcdefNBD" -b 1024
> -client (node2)
> insmod -f /tmp/nbd.o show_errs=1 rahead=20 merge_requests=0 sync_intvl=10
> nbd-client localhost 4017 localhost -b 1024 -t 5 /dev/nda
> and after a mkfs.ext2 -c /dev/nda or raidstart /dev/md0 (which include the nda device)
> I'll get this in the log file (nbd-server file):
An interesting point would be if you knew this ONLY happened in the
2.4.2 kernel. Does it?
> nbd-server: server (-1) opened port #4017 on socket 1
> nbd-server: server (-1) read passwd ok
> nbd-server: server (-1) got cliserv magic ok
> nbd-server: server (-1) sent size 3784310784 ok
3.7GB. That's probably large enough for mkfs to be able to eat up
all your ram while running, hence deadlock when the kernel
asks nbd to flush some dirty buffers. They go to disk .. i.e. to
buffers. Deadlock.
> nbd-server: server (0) opened port #4018 on socket 5
> file: Can not seek locally to offset 2147500032!
Oh! Well, that's different. You have a compilation problem. There is
no support for >32 bit seeks in the seek you compiled in. Run
make distclean config.
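
To see why that particular offset fails, compare it with the largest value a signed 32-bit file offset can hold. The sketch below does only the arithmetic; MAX_OFF32, fits_in_off32 and overshoot are illustrative names, and it assumes a shell with 64-bit arithmetic.

```shell
#!/bin/sh
# Largest byte offset a signed 32-bit file offset can express: 2^31 - 1.
MAX_OFF32=2147483647

# fits_in_off32 OFFSET: succeeds if OFFSET is reachable with a 32-bit seek.
fits_in_off32() {
    [ "$1" -le "$MAX_OFF32" ]
}

# overshoot OFFSET: prints how many bytes OFFSET lies beyond 2^31.
overshoot() {
    echo $(( $1 - MAX_OFF32 - 1 ))
}

# The offset from the server log above.
if fits_in_off32 2147500032; then
    echo "offset fits in 32 bits"
else
    echo "offset is $(overshoot 2147500032) bytes past 2^31"
fi
```

The offset in the log is 2^31 + 16384, so the server fails on what looks like the first seek beyond the 2GB mark of the 3784310784-byte device.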
> nbd-client: client (0) begins main loop
> nbd-client-netserver: client (0) short read from net to buffer offset 0, wanted 4096 got -110
And other very fundamental compile problems. It got an error when it
tried to read. The socket is probably dead. What is error 110 anyway?
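
One way to look the number up is to ask the C library for its message text. On Linux/x86, errno 110 is ETIMEDOUT ("Connection timed out"); that the client's -110 is a negated errno is an assumption here. The helper name errno_text is made up for this sketch, and it relies on python3 or perl being installed.

```shell
#!/bin/sh
# errno_text N: prints the C library's message for errno N on this system.
# Tries python3 first, then perl; prints "unknown" if neither is installed.
errno_text() {
    if command -v python3 >/dev/null 2>&1; then
        python3 -c 'import os, sys; print(os.strerror(int(sys.argv[1])))' "$1"
    elif command -v perl >/dev/null 2>&1; then
        perl -e '$! = shift; print "$!\n"' "$1"
    else
        echo "unknown"
    fi
}

errno_text 110
```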
> nbd-client-netserver: net_recv_reply exits INVAL for req type 0 with handle 0x0 len 4096 offset 3784245248
Oh, is that all. INVAL.
> [a] Errored: 0 (0) 0+0
> [a] Pending: 0 (0) 0R/0W+0R/0W
> [a] Kthreads: 0 (0 waiting/0 running/0 max)
> [a] Cthreads: 1 (+)
At this moment it seemed to be still happy. Curious! I guess various
things were stuffed from that trail above. Why are you running with only
one daemon, by the way? You'd need two channels to even have a slight hope
of avoiding deadlock.
> [a] Cpids: 1 (1084)
>
>
> and then, on the error phase, I can see the "Sockets" like:
>
> [a] Sockets: 1 (.)
Ah, OK.
> By the way, everything ends with a core file where
> I could see a line like this one:
>
> nbd-cache/bitmap: client test of bitmap FAILED on mmaped NBD journal file for %d bytes from 0x%x
You are running with a cache enabled? Not according to your command
lines!
> Well, for my configuration it's vital to have the client and server running
> on the same machine at a time. If the server is running on a machine and
You can't. At least, not safely. It's a fundamental fact of the linux
kernel design. At least until someone tells me how to notice that vfs
is getting full and sync every OTHER device before we run out of spare
space. Well, it's the "getting full" bit that is hard. The rest I can
handle.
Still, setting sync_intvl=1 plug=0 merge_requests=0 in the module might
help.
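
Applied to the insmod line quoted earlier, that would look something like the sketch below. It only prints the command so it can be checked before running it as root; the module path and the show_errs and rahead values are taken from the quoted command line, and nbd_insmod_cmd is an illustrative name.

```shell
#!/bin/sh
# Module path as in the quoted insmod line; adjust to your own build.
NBD_MODULE=/tmp/nbd.o
# Original options, with sync_intvl lowered to 1 and plug=0 added.
NBD_PARAMS="show_errs=1 rahead=20 merge_requests=0 sync_intvl=1 plug=0"

# nbd_insmod_cmd: prints the insmod command line instead of running it,
# since actually loading the module needs root.
nbd_insmod_cmd() {
    echo "insmod -f $NBD_MODULE $NBD_PARAMS"
}

nbd_insmod_cmd
```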
> the client is running on another one, things look fine, I mean no errors until now.
What were you looking at before? Is there any difference of behaviour
between nbd 2.4.21, 2.4.22 and 2.4.23? Or between kernels 2.4.0, 2.4.1
and 2.4.2?
> I'm using 2.4.2 kernel and nbd-2.4.21 on RedHat.
Peter