[ENBD] nbd 2.4.27 recommended kernel
Peter T. Breuer
enbd@lists.community.tummy.com
Wed, 13 Mar 2002 02:54:46 +0100 (MET)
"A month of sundays ago Cicero Mota wrote:"
> Here is what I did, hope you can give some help
>
> dd if=/dev/zero of=/tmp/delme bs=1024 count=5000000
5GB?
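To put a number on that, the dd arguments quoted above can simply be multiplied out. A minimal sketch; the helper name dd_image_bytes is made up here, and it assumes dd wrote every block in full.

```python
def dd_image_bytes(bs: int, count: int) -> int:
    """Bytes produced by `dd bs=<bs> count=<count>`, assuming no short writes."""
    return bs * count

# Values taken from the quoted command: bs=1024 count=5000000
size = dd_image_bytes(1024, 5_000_000)
print(size)          # 5120000000
print(size / 10**9)  # 5.12 (decimal gigabytes)
print(size / 2**30)  # 4.76837158203125 (binary gigabytes)
```

So the file comes to a little over 5GB in decimal units.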
> # /tmp/nbd-server 1025 /tmp/delme
> fourier:/tmp # nbd-server 2597: main server (-2) cannot find nbd-cstatd in
> /etc/services
You might want to do something about that. Or not.
> # insmod /tmp/nbd.o
> # /tmp/nbd-client localhost:1025 -n 1 /dev/nda
>
> server output was ok, I think:
Looks fine.
> nbd-server 2608: server (-1) sent/negotiated pulse interval 10 ok
> nbd-server 2608: server (-1) agreed 1 channels ok
> nbd-server 2608: server (-1) selected free port at 1026
> nbd-server 2608: server (-1) posted port 1026 ok
> nbd-server 2608: server (-1) manager started new process group 2608
> nbd-server 2610: server (0) set default signal handlers for slave server 2610
> nbd-server 2610: server (0) opened port 1026 (socket 6) for client 127.0.0.1
> nbd-server 2610: server (0) set new signal handlers for slave server 2610
> nbd-server 2610: newproto net errored on packet. Breaking off.
Except here we get a basic communication problem.
That's possible with mismatched server/client. Did you compile both of
them on the same machine? They must match the kernel module code
and each other.
> nbd-server 2608: server (-1) set new signal handlers for session server 2608
> nbd-server 2610: slavesighandler server (0) activates slave sighandler for
> signal 15
> nbd-server 2610: server (0) sighandler terminates slave 2610 safely
> nbd-server 2608: server (-1) relaunches child after SIGCHLD
> nbd-server 2608: server (-1) slave pid 2610 is down, launching new
> nbd-server 2611: server (0) set default signal handlers for slave server 2611
> nbd-server 2608: server (-1) launched slave pid 2611
Well, doesn't look good.
> client output also was ok:
> nbd-client 2606: client (-1) manager opened NBD device /dev/nda (2b00)
> nbd-client 2606: client (-1) set kernel bdflush sync boundary to 80% from 60%
> nbd-client 2606: client (-1) set kernel bdflush async boundary to 25% from
> 30%
> fourier:~ # nbd-client 2607: client (-1) starts introduction sequence on
> port 1025
> nbd-client 2607: client (-1) got session port 1026 ok
> nbd-client 2607: client (-1) introduction sequence ends ok
> nbd-client 2609: ok
> nbd-client 2609: client (0) opened socket 5 to port 1026
> nbd-client 2609: client (0) read passwd ok from port 1026
> nbd-client 2609: client (0) got cliserv magic ok from port 1026
> nbd-client 2609: client (0) got a signature ok from port 1026
> nbd-client 2609: client (0) begins main loop
Well, the other end wasn't that happy with it. Doesn't it die?
> #mke2fs /dev/nda
But what does /proc/nbdinfo say? It's pointless proceeding until the
device has been set up.
> mke2fs 1.24a (02-Sep-2001)
> Filesystem label=
> OS type: Linux
> Block size=1024 (log=0)
> Fragment size=1024 (log=0)
> 123464 inodes, 493764 blocks
> 24688 blocks (5.00%) reserved for the super user
> First data block=1
> 61 block groups
> 8192 blocks per group, 8192 fragments per group
> 2024 inodes per group
> Superblock backups stored on blocks:
> 8193, 24577, 40961, 57345, 73729, 204801, 221185, 401409
>
> Writing inode tables: done
> Segmentation fault
> ^^^^^^^^^^^^^^^
Looks like mismatched protocols to me. But we're already at points of the
test where the results are invalid, because things never really got
set up properly. I suspect the client caused a kernel oops, and after
that all bets are off.
> Mar 12 19:41:34 fourier kernel: NBD #2765[0]: nbd_set_sock setting unsigned
> device nda! But harmless.
> Mar 12 19:41:34 fourier kernel: NBD #2824[0]: nbd_set_sock increased socket
> count to 1
That was the client registering. Note that that is the 0'th time
through that line. It never went through there again. That means that
only one client registered. So how come your server died once and
reconnected?
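If the kernel log is long, the bracketed counter can be pulled out mechanically. The sketch below is only an illustration: the layout "NBD #<number>[<count>]: <text>" is inferred from the two messages quoted above, the meaning of the number after "#" is an assumption, and the function name parse_nbd_message is invented. The log lines are the quoted ones, rejoined from their wrapped form.

```python
import re

# Assumed layout, inferred from the quoted messages:
#   "NBD #<number>[<count>]: <text>"
# <count> is read as "times through that line", per the reply above.
NBD_TAG = re.compile(r"NBD #(\d+)\[(\d+)\]: (.*)")

def parse_nbd_message(line):
    """Return (number, count, text) for an NBD kernel message, or None."""
    m = NBD_TAG.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2)), m.group(3)

log = [
    "Mar 12 19:41:34 fourier kernel: NBD #2765[0]: nbd_set_sock setting "
    "unsigned device nda! But harmless.",
    "Mar 12 19:41:34 fourier kernel: NBD #2824[0]: nbd_set_sock increased "
    "socket count to 1",
]

for line in log:
    print(parse_nbd_message(line))
```

A second registration would presumably show up as the same message with a count above 0.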
> Mar 12 19:41:34 fourier kernel: Unable to handle kernel NULL pointer
You'd have to decode the oops for me to be sure where it's from. But
it looks to me as though this happened at registration time. Please try
to confirm that the oops happens before any packets are exchanged. If
you find it happens on exchanging packets, see if you can tell me if
it's on read or write.
> dereference at virtual address 00000a33
That's 0 plus an offset of 2600 or so (0xa33 is 2611). I don't have any structure that big.
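The arithmetic behind that estimate, as a small sketch. The helper name fault_offset is hypothetical, and treating the base pointer as NULL (0) is the assumption being tested, not something the oops proves.

```python
def fault_offset(address_hex: str, base: int = 0) -> int:
    """Offset of a faulting address from an assumed base pointer (NULL by default)."""
    return int(address_hex, 16) - base

# Address taken from the quoted oops line.
print(fault_offset("00000a33"))  # 2611
```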
> Mar 12 19:41:34 fourier kernel: printing eip:
> Mar 12 19:41:34 fourier kernel: c0114528
> Mar 12 19:41:34 fourier kernel: *pde = 00000000
> Mar 12 19:41:34 fourier kernel: Oops: 0002
> Mar 12 19:41:34 fourier kernel: CPU: 0
> Mar 12 19:41:34 fourier kernel: EIP:
> 0010:[interruptible_sleep_on_timeout+44/104]
Could be anywhere.
> Mar 12 19:41:34 fourier kernel: EFLAGS: 00210086
> Mar 12 19:41:34 fourier kernel: eax: 00000a2f ebx: 00200286 ecx: c4ccac28
> edx: c5779f00
> Mar 12 19:41:34 fourier kernel: esi: 000003e8 edi: c4cca3b0 ebp: c5779f08
> esp: c5779ef0
It's the call trace, the part you haven't shown, that I need .. decoded.
So, in summary ... confirm that you compiled the module, server and client
on the same machine, from the same package. Then check when the oops
happens .. on registration or on read or on write. That'll tell me
more.
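One rough way to start on the "when" question is to look at the order of messages in the kernel log. The sketch below only compares line positions, since the quoted entries all carry the same timestamp; the function names are invented, and the two search strings are copied from the quoted log. It cannot tell whether any reads or writes had happened, only whether the oops was logged after the nbd_set_sock registration message.

```python
def first_index(lines, needle):
    """Index of the first line containing `needle`, or None if absent."""
    for i, line in enumerate(lines):
        if needle in line:
            return i
    return None

def oops_follows_registration(lines):
    """True if the first oops line is logged after the first nbd_set_sock line.

    Returns None when either message is missing from the excerpt.
    """
    reg = first_index(lines, "nbd_set_sock")
    oops = first_index(lines, "Unable to handle kernel")
    if reg is None or oops is None:
        return None
    return oops > reg

# Excerpt rejoined from the wrapped lines quoted in this thread.
log = [
    "Mar 12 19:41:34 fourier kernel: NBD #2824[0]: nbd_set_sock increased "
    "socket count to 1",
    "Mar 12 19:41:34 fourier kernel: Unable to handle kernel NULL pointer "
    "dereference at virtual address 00000a33",
]

print(oops_follows_registration(log))  # True
```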
Peter