[ENBD] Re: enbd patch for 2.6.8 kernel

P.T. Breuer ptb at it.uc3m.es
Mon Sep 20 10:12:52 MDT 2004


In article <414EFC5E.6020704 at pydo.org> you wrote:
> > I would be grateful if somebody would tell me if my dead-reckoning 
> > really makes the patch apply and compile, since I don't want to tie up
> > my home connection for some hours downloading the 2.6.8 kernel source.

> I'm testing this on 2.6.8 kernel and Debian Sarge (SMP with 1 P4 HT

OK, thanks! It compiled - I guess that's all I can ask of dead
reckoning.

> CPU). Compiled the binaries and applied the patch to the kernel.

> This is something i found strange. When i stop enbd-client the device
> is still open. Is there something i can do to close the device ?

I'm not quite sure what you mean by that, but yes, I remember noticing
some funnies when trying to shut down (or return from hibernation on my
laptop) under 2.6 kernels.  I sometimes saw the client stuck waiting
on a semaphore, not in my code.  An "echo 0 > /proc/nbdinfo" clears the
condition and allows everything to go on, but I don't know what causes
it. Only time will produce the reason and the cure ...

> This is what i really do :

> # mdadm -S /dev/md0

Stop md. Now you want to kill clients.

> # echo -n "0" > /proc/nbdinfo

Yes, that's a clear. It'll only turn the device off for a few seconds,
however.

> # killall enbd-client
> # cat /proc/nbdinfo
> Device a:       Open
> [a] State:      verify, rw, disabled, remote invalid, show_errs, md5sum, last error 0, lives 4, bp 0
> [a] Queued:     +0R/0W curr (check 0R/0W) +28R/27W max
> [a] Buffersize: 0       (sectors=512, blocks=64)
> [a] Blocksize:  4096    (log=12)
> [a] Size:       230460424KB
> [a] Blocks:     57615106
> [a] Sockets:    4       (-)     (-)     (-)     (.)
> [a] Requested:  56.444M (14.0M) (14.0M) (14.1M) (14.1M) 376.6KR/56.07MW max 16
> [a] Despatched: 56.444M (14.0M) (14.0M) (14.1M) (14.1M) 376.6KR/56.07MW md5 56.0MW (56.0M eq, 12 ne, 0 dn)
> [a] Errored:    48      (16)    (16)    (16)    (0)     48+0
> [a] Pending:    0       (0)     (0)     (0)     (0)     0R/0W+0R/0W
> [a] B/s now:    0       (0R+0W)
> [a] B/s ave:    3.99G   (228KR+3.99GW)
> [a] B/s max:    1.98G   (3.30GR+2.67GW)
> [a] Spectrum:   99%16
> [a] Kthreads:   0       (0 waiting/0 running/1 max)
> [a] Cthreads:   0       (-)     (-)     (-)     (-)
> [a] Cpids:      0       (0)     (0)     (0)     (0)
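
For anyone wanting to check this state from a script rather than by eye,
here is a minimal Python sketch that reads /proc/nbdinfo-style text and
reports whether a device still shows "Open" and which state flags are
set. The field layout is assumed from the single sample quoted above
(mail quoting and line wraps removed); the function names are
hypothetical and not part of enbd.

```python
import re

# Sample lines copied from the report above, with the "> " quoting removed
# and the wrapped State line rejoined.
SAMPLE = """\
Device a:       Open
[a] State:      verify, rw, disabled, remote invalid, show_errs, md5sum, last error 0, lives 4, bp 0
[a] Sockets:    4       (-)     (-)     (-)     (.)
[a] Pending:    0       (0)     (0)     (0)     (0)     0R/0W+0R/0W
[a] Cpids:      0       (0)     (0)     (0)     (0)
"""


def parse_nbdinfo(text):
    """Return {device: {field: value}} for /proc/nbdinfo-style text.

    The layout is an assumption based on the one sample in this thread.
    """
    devices = {}
    for line in text.splitlines():
        # Header line, e.g. "Device a:       Open"
        m = re.match(r"Device (\w+):\s+(\S+)", line)
        if m:
            devices.setdefault(m.group(1), {})["Device"] = m.group(2)
            continue
        # Per-device field line, e.g. "[a] Cpids:      0  (0) ..."
        m = re.match(r"\[(\w+)\] ([^:]+):\s+(.*)", line)
        if m:
            dev, field, value = m.groups()
            devices.setdefault(dev, {})[field] = value.strip()
    return devices


def still_open(info, dev):
    """True if the device header line reports Open."""
    return info.get(dev, {}).get("Device") == "Open"


def state_flags(info, dev):
    """Split the comma-separated State field into a list of flags."""
    return [flag.strip() for flag in info[dev]["State"].split(",")]


if __name__ == "__main__":
    info = parse_nbdinfo(SAMPLE)
    print("open:", still_open(info, "a"))
    print("flags:", state_flags(info, "a"))
```

On a live system the text would come from reading /proc/nbdinfo instead
of SAMPLE.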

> By the way, a second 'echo -n "0" > /proc/nbdinfo' oops the kernel a
> little bit. :)

Does it? That's a good clue! 

> kernel: c01517b9
> kernel: SMP
> kernel: Modules linked in: ohci1394 ieee1394 uhci_hcd ehci_hcd raid1 md enbd_ioctl enbd rtc unix
> kernel: CPU:    1
> kernel: EIP:    0060:[invalidate_bdev+16/41]    Not tainted
> kernel: EFLAGS: 00010286   (2.6.8)
> kernel: EIP is at invalidate_bdev+0x10/0x29

Well, why isn't that noted in the stack trace below?
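
As an aside on reading these lines: each location appears twice, once in
brackets with decimal numbers and once in hex, in the usual oops form
"symbol+offset/size" (offset into the function, then the function's
length). The small Python sketch below parses both spellings and
confirms they agree. The helper name is hypothetical. The EIP line names
the function executing at the fault, whereas a call trace is normally
built from return addresses found on the stack, which is presumably why
invalidate_bdev itself need not show up in the trace.

```python
import re

# Either "[name+16/41]" (decimal) or "name+0x10/0x29" (hex).
_SYM = re.compile(r"\[?(\w+)\+(0x[0-9a-fA-F]+|\d+)/(0x[0-9a-fA-F]+|\d+)\]?")


def parse_sym(text):
    """Return (name, offset, size) from an oops 'symbol+offset/size' token."""
    m = _SYM.fullmatch(text.strip())
    if m is None:
        raise ValueError("not a symbol+offset/size token: %r" % text)
    name, offset, size = m.groups()
    # int(..., 0) accepts plain decimal as well as 0x-prefixed hex.
    return name, int(offset, 0), int(size, 0)


if __name__ == "__main__":
    for token in ("[invalidate_bdev+16/41]", "invalidate_bdev+0x10/0x29"):
        print(token, "->", parse_sym(token))
```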

> kernel: eax: 00000000   ebx: 00000000   ecx: 00000000   edx: f76564dc
> kernel: esi: 00000000   edi: 00000000   ebp: 00000001   esp: f7abdec4
> kernel: ds: 007b   es: 007b   ss: 0068
> kernel: Process bash (pid: 1877, threadinfo=f7abc000 task=f79a5290)
> kernel: Stack: c011991e 00000000 c01680f4 00000000 00000000 00000000 f88e86d0 f88e6f60
> kernel:        f88d8468 00000000 00000000 00000004 f88ded8f f88e6f60 00000001 00000001
> kernel:        f88de9b4 f88e6f60 00000282 f7fd4c00 000000d0 00008241 c015090b f79e9280
> kernel: Call Trace:
> kernel:  [printk+325/374] printk+0x145/0x176
> kernel:  [__invalidate_device+85/113] __invalidate_device+0x55/0x71

Well, as you can see, it's in the generic kernel code. Nothing to do
with me, I think! It's actually in a printk in __invalidate_device! And
it oopsed!

> kernel:  [__crc_tcp_protocol+4651978/4879841] enbd_soft_reset+0xbc/0x149 [enbd]
> kernel:  [__crc_tcp_protocol+4677910/4879841] enbd_write_proc+0x369/0x384 [enbd]
> kernel:  [get_empty_filp+95/214] get_empty_filp+0x5f/0xd6
> kernel:  [dentry_open+235/473] dentry_open+0xeb/0x1d9
> kernel:  [proc_file_write+0/66] proc_file_write+0x0/0x42
> kernel:  [proc_file_write+55/66] proc_file_write+0x37/0x42
> kernel:  [vfs_write+176/281] vfs_write+0xb0/0x119
> kernel:  [sys_write+81/128] sys_write+0x51/0x80
> kernel:  [syscall_call+7/11] syscall_call+0x7/0xb

Yes, all perfectly innocuous. I have no idea what is going on.

> kernel: Code: 8b 43 04 8b 5c 24 04 8b 80 a0 00 00 00 89 44 24 0c 83 c4 08

Uh, I guess I ought to disassemble that. How did I do that ..  uh ..

Code;  00000000 Before first symbol
   0:   8b 43 04                  mov    0x4(%ebx),%eax
Code;  00000003 Before first symbol
   3:   8b 5c 24 04               mov    0x4(%esp,1),%ebx
Code;  00000007 Before first symbol
   7:   8b 80 a0 00 00 00         mov    0xa0(%eax),%eax
Code;  0000000d Before first symbol
   d:   89 44 24 0c               mov    %eax,0xc(%esp,1)
Code;  00000011 Before first symbol
  11:   83 c4 08                  add    $0x8,%esp

Completely unevocative. I have no idea what that is.
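
One way to get at the bytes without ksymoops is to turn the "Code:" line
into a raw blob and feed it to a disassembler, for example something
like "objdump -D -b binary -m i386" on the written file. The Python
sketch below does the conversion and also pulls the register values out
of the dump. It rests on an assumption that has not been checked for
this kernel version: that the dumped bytes start at the faulting EIP. If
so, the first instruction, mov 0x4(%ebx),%eax, would be the one that
faulted, and with ebx shown as 00000000 it would be reading address 0x4,
which looks like a NULL pointer dereference. The helper names are
hypothetical.

```python
import re

# Copied from the oops quoted above.
CODE_LINE = "Code: 8b 43 04 8b 5c 24 04 8b 80 a0 00 00 00 89 44 24 0c 83 c4 08"
REGS_LINE = "eax: 00000000   ebx: 00000000   ecx: 00000000   edx: f76564dc"


def code_bytes(line):
    """Convert an oops 'Code:' line into raw bytes."""
    hex_part = line.split("Code:", 1)[1]
    # Tolerate a byte wrapped in <> in case the oops format marks one that way.
    return bytes(int(tok.strip("<>"), 16) for tok in hex_part.split())


def parse_regs(line):
    """Return {register: value} from an oops register line."""
    return {name: int(val, 16)
            for name, val in re.findall(r"(\w+): ([0-9a-f]{8})", line)}


def write_blob(path, data):
    """Write the raw bytes to a file for a disassembler to read."""
    with open(path, "wb") as fh:
        fh.write(data)


if __name__ == "__main__":
    data = code_bytes(CODE_LINE)
    regs = parse_regs(REGS_LINE)
    print("bytes:", len(data))
    # Effective address of 0x4(%ebx), under the assumption stated above.
    print("0x4(%%ebx) -> 0x%x" % (regs["ebx"] + 0x4))
```

Calling write_blob("oops-code.bin", code_bytes(CODE_LINE)) produces the
file to disassemble; the file name is only an example.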

Yep. I am still in the dark. Any ideas?

Peter

