[ENBD] 2.5.46 update

Peter T. Breuer enbd@lists.community.tummy.com
Tue, 12 Nov 2002 21:23:55 +0100 (MET)


"A month of sundays ago Tad Kollar wrote:"
> Using nosmp breaks things to the point where the kernel can't finish booting - 
Weird. What about the maxcpus=0 option?

> it gets to the hard drive check and locks up. If you still want me to try a 
> non-smp kernel, I'll recompile and see if that fares any better...

It would be worth it.

> > Yes. Try that. I'm not going to read the blkdev_dequeue function, but I 
> > suspect that might be it.
> 
> Okay, now there's an oops instead of a lockup:

Nice. Might give a clue ...

> NBD #4892[0]: nbd_init Network Block Device originally by pavel@elf.mj.gts.cz
> NBD #4893[0]: nbd_init Network Block Device port to 2.0 by ptb@it.uc3m.es
> NBD #4895[0]: nbd_init Network Block Device move networking to user space by 
> amarin@it.uc3m.es
> NBD #4897[0]: nbd_init Enhanced Network Block Device 2.4.30 $Date: 2002/11/04 
> 01:08:08 $ by ptb@it.uc3m.es
> NBD #4915[0]: nbd_init registered device at major 43
> NBD #3657[0]: nbd_find nbd_find called with part = 0x0
> NBD #3657[1]: nbd_find nbd_find called with part = 0x0
> NBD #3657[2]: nbd_find nbd_find called with part = 0x0
> NBD #2442[0]: nbd_set_sock increased socket count to 1
> NBD #2442[1]: nbd_set_sock increased socket count to 2
> NBD #3657[3]: nbd_find nbd_find called with part = 0x0
> NBD #3623[0]: nbd_media_changed nbd_media_changed called on nda
> NBD #3631[0]: nbd_revalidate revalidate called on nda
> NBD #3631[1]: nbd_revalidate revalidate called on nda
>   nda:------------[ cut here ]------------
> kernel BUG at drivers/block/ll_rw_blk.c:1476!

Aha! 1476 ...

          if (rl) {
                int rw = 0;

****            BUG_ON(!list_empty(&req->queuelist));

                list_add(&req->queuelist, &rl->free);

The queuelist in the req itself is not empty. That's weird.
That should be a couple of fields with space for a prev and next
pointer, and nonempty means that they don't point to the same struct
(trivial circle), but instead at something else, so the request thinks
it's on a list of some kind.

I'd like to know what list. I suppose one can't find out. 

The req's been off our slot queue for a whole microsecond or so.
We just did "list_del (&req->queuelist);" immediately above it in
the nbd_commit code. Surely that makes the list empty!

I don't understand how list_del can immediately be followed by
!list_empty  !!

If you can look at list.c too, maybe we might see how.
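
One possible explanation, offered as a sketch rather than a diagnosis: in
the usual kernel list implementation, list_del only unlinks the entry from
its neighbours and does not point the entry back at itself, whereas
list_del_init does. The code below is a user-space model written for
illustration. It is an assumption that the 2.5.46 helpers behave this way,
and entry_looks_empty_after is a hypothetical name, not anything in the
driver.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-in for the kernel's struct list_head (illustrative only). */
struct list_head {
    struct list_head *next, *prev;
};

/* An initialised head points at itself: the "trivial circle". */
static void INIT_LIST_HEAD(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Insert entry right after head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Empty means the node points at itself. */
static int list_empty(const struct list_head *h)
{
    return h->next == h;
}

/*
 * Unlink entry from its neighbours. The entry's own pointers are left
 * stale here (some kernel versions overwrite them with NULL or poison
 * values instead); either way they do not point back at the entry.
 */
static void list_del(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
}

/* Unlink and then reinitialise, so list_empty(entry) becomes true. */
static void list_del_init(struct list_head *entry)
{
    list_del(entry);
    INIT_LIST_HEAD(entry);
}

/*
 * Hypothetical helper: put a request on a slot queue, remove it again,
 * and report what list_empty() says about the request's own list node.
 */
static int entry_looks_empty_after(int use_del_init)
{
    struct list_head slot, req;

    INIT_LIST_HEAD(&slot);
    INIT_LIST_HEAD(&req);
    list_add(&req, &slot);

    if (use_del_init)
        list_del_init(&req);
    else
        list_del(&req);

    return list_empty(&req);
}
```

In this model entry_looks_empty_after(0) returns 0 and
entry_looks_empty_after(1) returns 1, so a BUG_ON(!list_empty(...)) check
right after a plain list_del would fire, while one after list_del_init
would not.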

Oh. I see where the lockup came from before. blk_put_request
now takes the queue lock. So it was wrong to lock it also in
nbd_end_request_lock, or rather, it was wrong to call
nbd_end_request_lock. But I don't know if a lock is still required
around the other end_request operations.  At any rate, it is now
impossible to call end_request with the queue lock held, so 
all calls to nbd_end_request_lock should become nbd_end_request
instead.
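
To illustrate the double-locking problem described above, here is a toy
model. Everything in it is hypothetical: toy_lock stands in for a
non-recursive spinlock and reports a second acquire instead of spinning
forever, and the *_sketch functions only mimic the call pattern, not the
real blk_put_request or nbd_end_request code.

```c
#include <assert.h>

/*
 * Toy non-recursive lock. A real spinlock would spin forever if the
 * holder tried to take it again; this model returns -1 instead so the
 * problem can be observed.
 */
struct toy_lock {
    int held;
};

static int toy_acquire(struct toy_lock *l)
{
    if (l->held)
        return -1;      /* would deadlock on a real spinlock */
    l->held = 1;
    return 0;
}

static void toy_release(struct toy_lock *l)
{
    l->held = 0;
}

static struct toy_lock queue_lock;

/* Stand-in for a put-request routine that now takes the queue lock itself. */
static int put_request_sketch(void)
{
    if (toy_acquire(&queue_lock) != 0)
        return -1;
    /* ... hand the request back to the free list here ... */
    toy_release(&queue_lock);
    return 0;
}

/* Old pattern: the caller takes the queue lock, then the callee tries again. */
static int end_request_lock_sketch(void)
{
    int rc;

    if (toy_acquire(&queue_lock) != 0)
        return -1;
    rc = put_request_sketch();  /* fails: lock is already held */
    toy_release(&queue_lock);
    return rc;
}

/* New pattern: call without holding the lock and let the callee take it. */
static int end_request_sketch(void)
{
    return put_request_sketch();
}
```

In this model end_request_lock_sketch() returns -1 (the lockup case) and
end_request_sketch() returns 0, which matches the conclusion that the
locked variant should no longer be called.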


Peter