[ENBD] Serious problem with nbd-2.4.21 and linux kernel 2.4.0
Peter T. Breuer
ptb@it.uc3m.es
Sat, 10 Mar 2001 02:11:11 +0100 (MET)
"A month of sundays ago Christof [K_chli] wrote:"
> I would like to use the nbd functionality in my school project, but
> there is a problem I can not handle. I use a SuSE 7.1 distribution with
> a self compiled kernel version 2.4.0 (very patched from SuSE).
I am using kernel 2.4.0 and it works fine. Don't use a SuSE-derived
kernel. I have heard reports in the past that they do something funny
that stops other modules - nbd in particular - from loading. But you
seem to have no problem.
> --------------------------------
> I load nbd module "insmod nbd":
> Using /lib/modules/2.4.0/kernel/drivers/block/nbd.o
OK.
> I start the server:
> nbd-server 2048 /tmp/delme -i abcdef654321
>
> Info about the file:
> -rwxrwxrwx 1 root root 191680512 Mar 7 08:01 /tmp/delme
191MB.
> I start the client:
> nbd-client localhost 2048 localhost /dev/nda
Well, a little more redundancy would be useful!
nbd-client localhost 2048 localhost localhost localhost localhost /dev/nda
> Info about the devices:
> brw-rw---- 1 root disk 43, 0 Mar 7 09:02 /dev/nda
> brw-rw---- 1 root disk 43, 1 Mar 7 09:02 /dev/nda1
fine.
> Output on the screen:
> nbd-client: client (-1) starts introduction sequence on port 2048
> nbd-client: client (-1) got session port 2049 ok
> nbd-client: client (-1) introduction sequence ends ok
> nbd-client: client (0) begins main loop
> nbd-server: server (-1) opened port #2048 on socket 1
> nbd-server: server (-1) read passwd ok
> nbd-server: server (-1) got cliserv magic ok
> nbd-server: server (-1) sent size 191680512 ok
> nbd-server: server (-1) sent sig ok
> nbd-server: server (-1) suggested ro flags 0 ok
> nbd-server: server (-1) received blksize 1024 ok
> nbd-server: server (-1) sent/negotiated blksize 1024 ok
> nbd-server: server (-1) received pulse_intvl 10 ok
> nbd-server: server (-1) sent/negotiated pulse interval 10 ok
> nbd-server: server (-1) agreed 1 channels ok
> nbd-server: server (-1) selected free port at 2049
> nbd-server: server (-1) posted port 2049 ok
> nbd-server: server (-1) manager started new process group 379
> nbd-server: server (0) opened port #2049 on socket 5
> nbd-server: server (-1) manager set CHLD USR1 USR2 HUP TERM signal
> handlers
OK.
> dmesg-output:
> NBD #2058[0]: nbd_set_sock device nda not signed yet!
No sweat. It was probably true.
> "cat /proc/nbdinfo"
> Device a: Open
> [a] State: verify, rw, enabled, last error 0
> [a] Queued: +0R/0W curr reqs =0R/0W real reqs +0R/0W max reqs
> [a] Buffersize: 86016 (sectors=168)
> [a] Blocksize: 1024 (log=10)
> [a] Size: 187188KB
> [a] Blocks: 187188
> [a] Sockets: 1 (*)
> [a] Requested: 0 (0) 0R/0W
> [a] Despatched: 0 (0) 0R/0W
> [a] Errored: 0 (0) 0+0
> [a] Pending: 0 (0) 0R/0W+0R/0W
> [a] Kthreads: 0 (0 waiting/0 running/0 max)
> [a] Cthreads: 1 (+)
> [a] Cpids: 1 (381)
> Device b-p: Closed
That shows a nice picture.
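
As a quick sanity check on the figures above (a sketch only, with
illustrative variable names): the file size reported by ls should agree
with the Size and Blocks lines of /proc/nbdinfo at the negotiated
1024-byte block size.

```python
# Figures copied from the ls listing and the server log above.
file_size_bytes = 191680512   # size of /tmp/delme
block_size = 1024             # blksize negotiated by client and server

# Size in KB, and any leftover bytes that do not fill a whole KB.
size_kb, remainder = divmod(file_size_bytes, 1024)

# Number of device blocks at the negotiated block size.
blocks = file_size_bytes // block_size

print(size_kb, remainder, blocks)
```

Both values come out at 187188 with nothing left over, which matches
what nbdinfo reports.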
> I try to format the device:
> mke2fs -b 1024 /dev/nda 30000
You are the second person to say there is a problem with mke2fs and nbd
over 2.4.x. I must say I haven't seen any! But at least you compiled it
without any trouble.
> Screen output:
> mke2fs 1.19, 13-Jul-2000 for EXT2 FS 0.5b, 95/08/09
> Filesystem label=
> OS type: Linux
> Block size=1024 (log=0)
> Fragment size=1024 (log=0)
> 7520 inodes, 30000 blocks
> 1500 blocks (5.00%) reserved for the super user
> First data block=1
> 4 block groups
> 8192 blocks per group, 8192 fragments per group
> 1880 inodes per group
> Superblock backups stored on blocks:
> 8193, 24577
> Writing inode tables: done
> Segmentation fault
OK. That's clearly a kernel oops. I wonder why!
> The output of /proc/nbdinfo has not changed,
> but there is an error in dmesg-output:
> Unable to handle kernel NULL pointer dereference at virtual address
> 00000530
at 0x530! That's a most unusual address. It's hard to see how the kernel
can get that. Usually low addresses like that come from null
pointers (indeed that's what it says!) and so the offset is the
offset of a field in a struct. But what struct is 0x530 bytes long?
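
To illustrate the reasoning: when code reads a field through a null
struct pointer, the faulting address is simply the field's offset
within the struct. A minimal sketch using Python's ctypes, with an
invented layout (this is NOT a real kernel structure) chosen so that
one field lands at 0x530:

```python
import ctypes

class Hypothetical(ctypes.Structure):
    # Invented layout for illustration only, not a kernel structure.
    _fields_ = [
        ("earlier_fields", ctypes.c_ubyte * 0x530),  # stand-in for whatever precedes
        ("field", ctypes.c_uint32),                  # the member being accessed
    ]

null_base = 0  # address held by a NULL struct pointer

# Address the CPU would touch when reading ptr->field with ptr == NULL.
fault_address = null_base + Hypothetical.field.offset

print(hex(fault_address))
```

So a fault at 0x530 would point at a member sitting 0x530 bytes into
some fairly large struct, which is why the address looks odd.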
> printing eip:
> e8994150
> *pde = 00000000
> Oops: 0000
> CPU: 0
> EIP: 0010:[<e8994150>]
> EFLAGS: 00010082
> eax: e353ff20 ebx: e353ff20 ecx: c0256f5c edx: 00000002
> esi: e38cdd28 edi: e38cc000 ebp: e38cdd08 esp: e38cdcf0
> ds: 0018 es: 0018 ss: 0018
> Process mke2fs (pid: 419, stackpage=e38cd000)
> Stack: 00000286 e38cdd28 e38cc000 00000000 c017298c 00000000 e38cde9c
> c0171ba4
> c0256f5c e38cdd28 c0115f74 c0256f5c e38cdd44 e3dd1da0 c0256fc0
> c0256fc0
> c012cfb6 c0205cec 00000001 00000001 e3dd1da0 00000000 e38cc000
> e3dd1dec
> Call Trace: [<c017298c>] [<c0171ba4>] [<c0115f74>] [<c012cfb6>]
> [<c0132526>] [<f [<ebf69ac5>] [<f724109b>] [<f3630fea>]
> [<c017f50b>] [<fffdd058>] [<c0126c ... ... .... (and a lot
> more)
You need to run that through ksymoops.
>
> I can kill nbd-server and nbd-client without a problem.
> Output "lsmod":
> Module Size Used by
> nbd 52592 1
That's the mkfs, I suppose.
> nls_cp437 4352 1 (autoclean)
>
> The command "echo -n 0 > /proc/nbdinfo" shows no change, several tries.
Well, it should show "disabled" in the nbdinfo output. Dmesg should
also be informative. What IS the output from nbdinfo?
> The command "echo -n 1 > /proc/nbdinfo" is successful.
> Output "lsmod":
> Module Size Used by
> nbd 52592 0
> nls_cp437 4352 1 (autoclean)
>
> I try to unload the module.
> Output "rmmod nbd":
> Segmentation fault
That's to be expected if mkfs is still around.
> It is impossible to shutdown the machine as usual. I have to press the
> power button.
Use magic sysreq.
> My hope is, that there is only a small mistake, and it will be easy to
> correct.
> I have tried loads of things and nothing has changed the described
> situation a lot.
> What can I do? Thank you a lot in advance.
Don't do what you are doing. You cannot run nbd to localhost without
being in danger of a VFS/VFS deadlock. You are guaranteed a deadlock if
you run async and write more than RAM in one go (in less than 30s,
anyway).
Try using merge_requests=0 and sync_intvl=1 in the module options.
The latter will force the device to sync itself every 1K requests
(or 1s, whichever is sooner), which might well prevent VFS buildup
- if you have at least 4MB of memory.
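
The "every 1K requests or 1s, whichever is sooner" rule can be sketched
roughly as below. This is only an illustration of the policy as
described, not the driver's code; the class name, the method name and
the reading of "1K" as 1024 are all assumptions.

```python
class SyncPolicy:
    """Say when a sync is due: after max_requests or max_seconds."""

    def __init__(self, max_requests=1024, max_seconds=1.0, start=0.0):
        self.max_requests = max_requests  # assumed meaning of "1K requests"
        self.max_seconds = max_seconds    # the 1s interval
        self.count = 0                    # requests since the last sync
        self.last_sync = start            # time of the last sync

    def on_request(self, now):
        """Record one request at time `now`; return True if a sync is due."""
        self.count += 1
        due = (self.count >= self.max_requests
               or now - self.last_sync >= self.max_seconds)
        if due:
            # Reset both triggers once a sync has been requested.
            self.count = 0
            self.last_sync = now
        return due

policy = SyncPolicy()

# 1024 requests arriving within about 0.1s: the request count triggers first.
results = [policy.on_request(now=0.0001 * i) for i in range(1024)]

# A single request long after the last sync: the timer triggers instead.
slow = policy.on_request(now=2.0)
```

Here only the last of the 1024 fast requests asks for a sync, and the
lone late request asks for one because more than a second has passed.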
I'd really like a method of detecting when VFS is nearly full.
Anyone know one?
Peter