[ENBD] 2.5.46 update
Peter T. Breuer
enbd@lists.community.tummy.com
Tue, 12 Nov 2002 21:23:55 +0100 (MET)
"A month of sundays ago Tad Kollar wrote:"
> Using nosmp breaks things to the point where the kernel can't finish booting -
Weird. What about the maxcpus=0 option?
> it gets to the hard drive check and locks up. If you still want me to try a
> non-smp kernel, I'll recompile and see if that fares any better...
It would be worth it.
> > Yes. Try that. I'm not going to read the blkdev_dequeue function, but I
> > suspect that might be it.
>
> Okay, now there's an oops instead of a lockup:
Nice. Might give a clue ...
> NBD #4892[0]: nbd_init Network Block Device originally by pavel@elf.mj.gts.cz
> NBD #4893[0]: nbd_init Network Block Device port to 2.0 by ptb@it.uc3m.es
> NBD #4895[0]: nbd_init Network Block Device move networking to user space by
> amarin@it.uc3m.es
> NBD #4897[0]: nbd_init Enhanced Network Block Device 2.4.30 $Date: 2002/11/04
> 01:08:08 $ by ptb@it.uc3m.es
> NBD #4915[0]: nbd_init registered device at major 43
> NBD #3657[0]: nbd_find nbd_find called with part = 0x0
> NBD #3657[1]: nbd_find nbd_find called with part = 0x0
> NBD #3657[2]: nbd_find nbd_find called with part = 0x0
> NBD #2442[0]: nbd_set_sock increased socket count to 1
> NBD #2442[1]: nbd_set_sock increased socket count to 2
> NBD #3657[3]: nbd_find nbd_find called with part = 0x0
> NBD #3623[0]: nbd_media_changed nbd_media_changed called on nda
> NBD #3631[0]: nbd_revalidate revalidate called on nda
> NBD #3631[1]: nbd_revalidate revalidate called on nda
> nda:------------[ cut here ]------------
> kernel BUG at drivers/block/ll_rw_blk.c:1476!
Aha! 1476 ...
        if (rl) {
                int rw = 0;
  ****          BUG_ON(!list_empty(&req->queuelist));
                list_add(&req->queuelist, &rl->free);
The queuelist in the req itself is not empty. That's weird.
That should be a couple of fields with space for a prev and a next
pointer, and nonempty means that they don't point back at the same struct
(the trivial circle) but at something else, so the request thinks
it's on a list of some kind.
I'd like to know what list. I suppose one can't find out.
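
To make the "trivial circle" test concrete, here is a minimal userspace sketch. It is an assumption about how such a list node behaves, modeled loosely on the kernel's list_head rather than copied from list.h; the names node, node_init, node_empty, node_add and demo_empty_check are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct list_head. */
struct node {
    struct node *next, *prev;
};

/* An initialised node points at itself: the "trivial circle". */
static void node_init(struct node *n)
{
    n->next = n;
    n->prev = n;
}

/* "Empty" means the node's next pointer still points back at itself. */
static int node_empty(const struct node *n)
{
    return n->next == n;
}

/* Link entry in directly after head. */
static void node_add(struct node *entry, struct node *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Returns 1 if a fresh node reads as empty and a queued one does not. */
static int demo_empty_check(void)
{
    struct node queue, req;
    int fresh_empty, queued_empty;

    node_init(&queue);
    node_init(&req);
    fresh_empty = node_empty(&req);   /* req points at itself */
    node_add(&req, &queue);
    queued_empty = node_empty(&req);  /* req now points at queue */
    return fresh_empty && !queued_empty;
}
```

In this sketch the emptiness test looks only at the node's own pointers, so it cannot say which list the node believes it is on, only that it points somewhere other than itself.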
The req's been off our slot queue for a whole microsecond or so.
We just did "list_del (&req->queuelist);" immediately above it in
the nbd_commit code. Surely that makes the list empty!
I don't understand how list_del can immediately be followed by
!list_empty !!
If you can look at list.c too, maybe we might see how.
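
One possible explanation, offered as an assumption and not checked against the 2.5.46 headers: if list_del only joins up the neighbours and leaves the removed entry's own pointers untouched, the entry will still fail an emptiness test afterwards. The kernel also provides list_del_init, which reinitialises the entry after unlinking it. The userspace sketch below uses hypothetical names (node_del, node_del_init, stale_after_del, clean_after_del_init) to show the difference.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct list_head. */
struct node {
    struct node *next, *prev;
};

static void node_init(struct node *n)
{
    n->next = n;
    n->prev = n;
}

static int node_empty(const struct node *n)
{
    return n->next == n;
}

static void node_add(struct node *entry, struct node *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Unlink only: the neighbours are joined up, but the entry's own
 * pointers are left stale. Assumed to mirror a plain list_del. */
static void node_del(struct node *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
}

/* Unlink and then point the entry back at itself. */
static void node_del_init(struct node *entry)
{
    node_del(entry);
    node_init(entry);
}

/* Returns 1 if the entry still reads as non-empty after a plain delete. */
static int stale_after_del(void)
{
    struct node queue, req;

    node_init(&queue);
    node_init(&req);
    node_add(&req, &queue);
    node_del(&req);
    return !node_empty(&req) && node_empty(&queue);
}

/* Returns 1 if the entry reads as empty after delete-and-reinitialise. */
static int clean_after_del_init(void)
{
    struct node queue, req;

    node_init(&queue);
    node_init(&req);
    node_add(&req, &queue);
    node_del_init(&req);
    return node_empty(&req) && node_empty(&queue);
}
```

If the real list_del behaves like node_del here, then a BUG_ON(!list_empty(...)) on the removed entry would fire straight after the delete, and the reinitialising variant would avoid it.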
Oh. I see where the lockup came from before. blk_put_request
now takes the queue lock. So it was wrong to lock it also in
nbd_end_request_lock, or rather, it was wrong to call
nbd_end_request_lock. But I don't know if a lock is still required
around the other end_request operations. At any rate, it is now
impossible to call end_request with the queue lock held, so
all calls to nbd_end_request_lock should become nbd_end_request
instead.
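
The double-locking problem can be sketched in userspace as follows. This is an illustration under stated assumptions, not the nbd or block-layer code: toy_lock stands in for a non-recursive queue lock, and put_request, end_request_locked and end_request_unlocked are hypothetical stand-ins for blk_put_request, nbd_end_request_lock and nbd_end_request. A try-lock is used so that the sketch reports the conflict instead of hanging.

```c
#include <assert.h>

/* Toy non-recursive lock; hypothetical stand-in for a queue spinlock. */
struct toy_lock {
    int held;
};

/* Returns 1 on success, 0 if the lock is already held.
 * A real spinlock would spin forever in the second case. */
static int toy_trylock(struct toy_lock *l)
{
    if (l->held)
        return 0;
    l->held = 1;
    return 1;
}

static void toy_unlock(struct toy_lock *l)
{
    l->held = 0;
}

/* Stand-in for a put-request routine that takes the queue lock itself.
 * Returns 0 on success, -1 where a real spinlock would have locked up. */
static int put_request(struct toy_lock *q)
{
    if (!toy_trylock(q))
        return -1;
    /* ... the request would go back on the free list here ... */
    toy_unlock(q);
    return 0;
}

/* Old pattern: the caller holds the queue lock around the put. */
static int end_request_locked(struct toy_lock *q)
{
    int rc;

    if (!toy_trylock(q))
        return -1;
    rc = put_request(q);    /* inner acquire fails: lock already held */
    toy_unlock(q);
    return rc;
}

/* New pattern: the caller does not take the lock; the callee does. */
static int end_request_unlocked(struct toy_lock *q)
{
    return put_request(q);
}

/* Returns 1 if the locked path conflicts and the unlocked path succeeds. */
static int demo_double_lock(void)
{
    struct toy_lock q = { 0 };

    return end_request_locked(&q) == -1 && end_request_unlocked(&q) == 0;
}
```

Whether the other end_request operations still need a lock of their own is left open here, as it is in the text above.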
Peter