[ENBD] fr1 hangs when trying to access raid device..

Peter T. Breuer enbd@lists.community.tummy.com
Thu, 6 Feb 2003 21:26:11 +0100 (MET)


"A month of sundays ago [Arve Emil Myr_s] wrote:"
> Ok,, made some printk's and found the spot where it burns,,, still running on a no-smp kernel with smp disabled in bios,,

Where? (A non-SMP kernel should not be able to fail here. I will be
interested to see this.)

> Feb  6 16:54:19 vserv kernel: saw bh block 0 sector 0 size 1024 state 11d dev f000 rdev f000 on req f79113c0
> Feb  6 16:54:19 vserv kernel: submitting bh sector 0 size 1024 state 1e dev 700 rdev 700 on req f79113c0
> Feb  6 16:54:19 vserv kernel: serviced req f79113c0 on component 0
> Feb  6 16:54:19 vserv kernel: AEM: promote_req just after loop, req= f79113c0 , e= 0
> Feb  6 16:54:19 vserv kernel: AEM: promote_req just after atomic_set_mask, req= f79113c0 , e= 0
> Feb  6 16:54:19 vserv kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000004



>  printk (KERN_DEBUG "AEM: promote_req just after atomic_set_mask, req= %x , e= %d\n", req, e->index);
> 
>         // PTB comes off e
>  list_del (&req->queue);
>  printk (KERN_DEBUG "AEM: promote_req just after list_del, req= %x , e= %d\n", req, e->index);



Well, it dies in list_del. OK. But the req is not zero. OK. I think I
might understand this. The queue lock is still held too.
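
One way to read the fault address, as a user-space sketch only. It
assumes that the first store in __list_del is "next->prev = prev" (my
recollection of the 2.4-era list.h, not checked against this exact
tree). Under that assumption a NULL req->queue.next would fault at the
offset of prev inside struct list_head, which is one pointer width,
i.e. 4 on 32-bit x86. The helper name is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Same two-pointer layout as the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/*
 * Address that the store "next->prev = prev" would touch if next were
 * NULL: simply the offset of the prev member within the struct.
 */
static size_t fault_offset_if_next_is_null(void)
{
    return offsetof(struct list_head, prev);
}
```

If that reading is right, the req pointer itself is fine and it is the
link inside it that holds a NULL.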

The pointers on the req queue must be messed up. I think I noticed that
the kernel people had at some point stopped tidying pointers after a
list_del.


   /**
    * list_del - deletes entry from list.
    * @entry: the element to delete from the list.
    * Note: list_empty on entry does not return true after this, the
    * entry is in an undefined state.
    */
       static __inline__ void list_del(struct list_head *entry)
       {
               __list_del(entry->prev, entry->next);
       }

and we are missing an INIT_LIST_HEAD(&req->queue); afterwards.  Can you
add that after the list_del?  Or, equivalently, replace list_del with
list_del_init everywhere that list_del occurs.
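
A minimal user-space sketch of the difference, assuming a cut-down
re-implementation of the list helpers (these are stand-ins, not the
kernel header; list_unlink plays the role of __list_del, and
entry_empty_after_removal is a hypothetical test helper). It shows that
an entry removed with list_del still carries its old pointers, while
list_del_init leaves it pointing at itself so that list_empty is true.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }

static int list_empty(const struct list_head *h) { return h->next == h; }

/* Insert item right after head. */
static void list_add(struct list_head *item, struct list_head *head)
{
    item->next = head->next;
    item->prev = head;
    head->next->prev = item;
    head->next = item;
}

/* Stand-in for __list_del: join the two neighbours together. */
static void list_unlink(struct list_head *prev, struct list_head *next)
{
    next->prev = prev;
    prev->next = next;
}

/* Unlinks the entry but leaves its own next/prev untouched (stale). */
static void list_del(struct list_head *entry)
{
    list_unlink(entry->prev, entry->next);
}

/* Unlinks the entry and re-initialises it to an empty list. */
static void list_del_init(struct list_head *entry)
{
    list_unlink(entry->prev, entry->next);
    INIT_LIST_HEAD(entry);
}

/* Returns list_empty(entry) after removing it with the chosen variant. */
static int entry_empty_after_removal(int use_init)
{
    struct list_head queue, entry;

    INIT_LIST_HEAD(&queue);
    INIT_LIST_HEAD(&entry);
    list_add(&entry, &queue);
    if (use_init)
        list_del_init(&entry);
    else
        list_del(&entry);
    assert(list_empty(&queue));   /* the queue itself is empty either way */
    return list_empty(&entry);
}
```

With plain list_del the helper returns 0, because the removed entry
still points at the old queue head. With list_del_init it returns 1.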


This might be a simple thinko. Maybe we're not on a queue at all.
But try using list_del_init first.

If we're not on a queue at all, it might be wise to test with

        if (!list_empty(&req->queue)) list_del_init(&req->queue);

but see how the list_del_init helps first. I don't see the problem
here, but that must be good fortune. Or a different compiler.
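
A user-space sketch of the guarded delete, under the same assumption of
cut-down stand-ins for the list helpers. struct request_sketch,
dequeue_if_queued and guarded_unlink_count are hypothetical names for
illustration, not anything in the driver.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }

static int list_empty(const struct list_head *h) { return h->next == h; }

/* Insert item right after head. */
static void list_add(struct list_head *item, struct list_head *head)
{
    item->next = head->next;
    item->prev = head;
    head->next->prev = item;
    head->next = item;
}

/* Unlink the entry and leave it as an empty list of its own. */
static void list_del_init(struct list_head *entry)
{
    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
    INIT_LIST_HEAD(entry);
}

/* Hypothetical request: only the queue link matters for this sketch. */
struct request_sketch {
    struct list_head queue;
};

/* Returns 1 if the request was unlinked, 0 if it was on no queue. */
static int dequeue_if_queued(struct request_sketch *req)
{
    if (list_empty(&req->queue))
        return 0;
    list_del_init(&req->queue);
    return 1;
}

/* Tries the guard before queueing, after queueing, and once more. */
static int guarded_unlink_count(void)
{
    struct list_head pending;
    struct request_sketch req;
    int unlinked = 0;

    INIT_LIST_HEAD(&pending);
    INIT_LIST_HEAD(&req.queue);
    unlinked += dequeue_if_queued(&req);   /* never queued: skipped */
    list_add(&req.queue, &pending);
    unlinked += dequeue_if_queued(&req);   /* queued: unlinked */
    unlinked += dequeue_if_queued(&req);   /* already removed: skipped */
    return unlinked;
}
```

Note that the guard only helps if req->queue was set up with
INIT_LIST_HEAD at some point. A zero-filled link has next == NULL, so
list_empty reports it as non-empty and the delete is attempted anyway.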

Peter