[ENBD] fr1 hangs when trying to access raid device..

Arve Emil Myrås enbd@lists.community.tummy.com
Sun, 9 Feb 2003 14:55:04 +0100


>> Ok,, made some printk's and found the spot where it burns,,, still =
running on a no-smp kernel with smp disabled in bios,,
>
>Where? (no smp cannot fail - I will be interested to see this ..)
>
>> Feb  6 16:54:19 vserv kernel: saw bh block 0 sector 0 size 1024 state =
11d dev f000 rdev f000 on req f79113c0
>> Feb  6 16:54:19 vserv kernel: submitting bh sector 0 size 1024 state =
1e dev 700 rdev 700 on req f79113c0
>> Feb  6 16:54:19 vserv kernel: serviced req f79113c0 on component 0
>> Feb  6 16:54:19 vserv kernel: AEM: promote_req just after loop, =
req= f79113c0 , e= 0
>> Feb  6 16:54:19 vserv kernel: AEM: promote_req just after =
atomic_set_mask, req= f79113c0 , e= 0
>> Feb  6 16:54:19 vserv kernel: Unable to handle kernel NULL pointer =
dereference at virtual address 00000004



>>  printk (KERN_DEBUG "AEM: promote_req just after atomic_set_mask, =
req= %x , e= %d\n", req, e->index);
>> 
>>         // PTB comes off e
>>  list_del (&req->queue);
>>  printk (KERN_DEBUG "AEM: promote_req just after list_del, req= %x =
, e= %d\n", req, e->index);



>well, it dies in list_del. OK. But the req is not zero. OK. I think I
>might understand this. The queue lock is still held too.

>The pointers on the req queue must be messed up. I think I noticed that
>the kernel people had at some point stopped tidying pointers after a
>list del.


>   /**
>    * list_del - deletes entry from list.
>    * @entry: the element to delete from the list.
>    * Note: list_empty on entry does not return true after this, the
>    * entry is in an undefined state.
>    */
>       static __inline__ void list_del(struct list_head *entry)
>       {
>               __list_del(entry->prev, entry->next);
>       }

>and we miss INIT_LIST_HEAD(&req->queue); afterwards.  Can you add that
>after the list_del?  

tried that, but it burns on the same spot..

>Or equivalently replace list_del with list_del_init
>everywhere that list_del occurs.

here somthing happends,, it does not oops anymore but i cant seem to get the same data out of the device that i put in:


vpn:~ # ssh vserv
Warning: the RSA host key for 'vserv' differs from the key for the IP address '10.10.0.2'
Offending key for IP in /root/.ssh/known_hosts:1
Matching host key in /root/.ssh/known_hosts:3
Are you sure you want to continue connecting (yes/no)? yes
root@vserv's password:
Last login: Sun Feb  9 14:24:15 2003 from vpn.local
Have a lot of fun...
vserv:~ # lsmod
Module                  Size  Used by    Not tainted
e1000                  52300   2
fr1                    20272   0  (autoclean)
md                     59424   0  (autoclean)
pti_st                 22912   0
sd_mod                 10924   4
reiserfs              190768   1
FastTrak               96044   2
vserv:~ # rmmod md
vserv:~ # rmmod fr1
vserv:~ # lsmod
Module                  Size  Used by    Not tainted
e1000                  52300   2
pti_st                 22912   0
sd_mod                 10924   4
reiserfs              190768   1
FastTrak               96044   2
vserv:~ # insmod fr1
Using /lib/modules/2.4.20ctx-16nsmp/kernel/drivers/md/fr1.o
vserv:~ # modprobe fr1
vserv:~ # losetup /dev/loop0 /root/loop0
vserv:~ # losetup /dev/loop1 /root/loop1
vserv:~ # mkraid --really-force /dev/fr10
DESTROYING the contents of /dev/fr10 in 5 seconds, Ctrl-C if unsure!
/dev/fr10: not an MD device!
handling MD device /dev/fr10
analyzing super-block
vserv:~ # lsmod
Module                  Size  Used by    Not tainted
md                     59424   0  (autoclean)
loop                    9560   2  (autoclean)
fr1                    20240   0
e1000                  52300   2
pti_st                 22912   0
sd_mod                 10924   4
reiserfs              190768   1
FastTrak               96044   2
vserv:~ # rmmod md
vserv:~ # lsmod
Module                  Size  Used by    Not tainted
loop                    9560   2  (autoclean)
fr1                    20240   0
e1000                  52300   2
pti_st                 22912   0
sd_mod                 10924   4
reiserfs              190768   1
FastTrak               96044   2
vserv:~ # cat /proc/fr1stat
Personalities : [raid1]
read_ahead 4 sectors
fr10 : active fr1 [dev 07:00][0] [dev 07:01][1]
      16000 blocks

vserv:~ # dd if=/dev/zero of=/dev/fr10 count=2
2+0 records in
2+0 records out
vserv:~ # dd if=/root/testfil1 of=/dev/fr10
1+1 records in
1+1 records out
vserv:~ # ll testfil*
-rw-r--r--    1 root     root            0 Feb  4 19:28 testfil
-rw-r--r--    1 root     root          514 Feb  7 15:40 testfil1
-rw-r--r--    1 root     root          514 Feb  7 15:43 testfil2
vserv:~ # dd if=/dev/fr10 of=/root/testfil2 bs=514 count=1
1+0 records in
1+0 records out
vserv:~ # ll testfil*
-rw-r--r--    1 root     root            0 Feb  4 19:28 testfil
-rw-r--r--    1 root     root          514 Feb  7 15:40 testfil1
-rw-r--r--    1 root     root          514 Feb  9 14:30 testfil2
vserv:~ # cat testfil1
ARVE EMIL1JALLAARVEEMIL2
vserv:~ # cat testfil2
DÀ@'Àÿÿÿÿÿÿÿÿùãà(ÀàPñT`EñT`EñÀ'À                 õ õt5-À¸`Eñ¸`Eñê`EñÐÀë㥤ÿþÿÿÿþÿÿ
                                                                                  «'Àÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ
vserv:~ #

---

vserv:~ # tail -f /var/log/messages
Feb  9 14:20:40 vserv kernel: e1000: eth1 NIC Link is Up 1000 Mbps Full Duplex
Feb  9 14:20:40 vserv ntpdate[739]: sendto(192.168.250.2): Network is unreachable
Feb  9 14:20:41 vserv ntpdate[739]: no server suitable for synchronization found
Feb  9 14:20:41 vserv ntpd[751]: ntpd 4.1.1@1.786 Mon Sep  9 20:57:30 UTC 2002 (1)
Feb  9 14:20:41 vserv ntpd[751]: signal_no_reset: signal 13 had flags 4000000
Feb  9 14:20:41 vserv ntpd[751]: precision = 27 usec
Feb  9 14:20:41 vserv ntpd[751]: kernel time discipline status 0040
Feb  9 14:20:41 vserv /usr/sbin/cron[777]: (CRON) STARTUP (fork ok)
Feb  9 14:20:50 vserv kdm[740]: pam_unix2: session started for user root, service xdm
Feb  9 14:24:15 vserv sshd[986]: Accepted password for root from 10.10.0.1 port 32772 ssh2
Feb  9 14:24:34 vserv sshd[1006]: Accepted password for root from 10.10.0.1 port 32773 ssh2
Feb  9 14:25:54 vserv kernel: loop: loaded (max 8 devices)
Feb  9 14:26:22 vserv kernel: md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
Feb  9 14:26:27 vserv kernel: fr1 open device 240:0
Feb  9 14:26:27 vserv kernel: fr1 ioctl 40480923
Feb  9 14:26:27 vserv kernel: fr1 mkraid1 on device 0
Feb  9 14:26:27 vserv kernel: fr1 ioctl 40140921
Feb  9 14:26:27 vserv kernel: fr1 hotadd called to add for device 700
Feb  9 14:26:27 vserv kernel: fr1 create element 7:0 size 16000 blocks
Feb  9 14:26:27 vserv kernel: fr1 hotadd component 07:00[0] to device 0
Feb  9 14:26:27 vserv kernel: fr1 added new device 07:00 to f7ab7e00 with err 0
Feb  9 14:26:27 vserv kernel: fr1 ioctl 40140921
Feb  9 14:26:27 vserv kernel: fr1 hotadd called to add for device 701
Feb  9 14:26:27 vserv kernel: fr1 create element 7:1 size 16000 blocks
Feb  9 14:26:27 vserv kernel: fr1 hotadd component 07:01[1] to device 0
Feb  9 14:26:27 vserv kernel: fr1 added new device 07:01 to f7ab7e00 with err 0
Feb  9 14:26:27 vserv kernel: fr1 open device 240:0
Feb  9 14:26:27 vserv kernel: fr1 ioctl 400c0930
Feb  9 14:26:27 vserv kernel: fr1 calc component 7:0 size 16000 blocks
Feb  9 14:26:27 vserv kernel: fr1 calc component 7:1 size 16000 blocks
Feb  9 14:26:27 vserv kernel: fr1 calc size 16000 blocks
Feb  9 14:26:27 vserv kernel: fr1 close on device 240:0
Feb  9 14:26:27 vserv kernel: fr1 close on device 240:0
Feb  9 14:27:38 vserv kernel: fr1 open device 240:0
Feb  9 14:27:38 vserv kernel: got request f79111c0
Feb  9 14:27:38 vserv kernel: fr1 enqueue req f79111c0 for component 0
Feb  9 14:27:38 vserv kernel: scheduling get_req
Feb  9 14:27:38 vserv kernel: fr1 sched get_req OK
Feb  9 14:27:38 vserv kernel: fr1 service reqs routine will launch get_req on component 0
Feb  9 14:27:38 vserv kernel: fr1 get_req will dequeue req f79111c0
Feb  9 14:27:38 vserv kernel: saw bh block 0 sector 0 size 1024 state 11c dev f000 rdev f000 on req f79111c0
Feb  9 14:27:38 vserv kernel: submitting bh sector 0 size 1024 state 1e dev 700 rdev 700 on req f79111c0
Feb  9 14:27:38 vserv kernel: serviced req f79111c0 on component 0
Feb  9 14:27:38 vserv kernel: fr1 service reqs routine will launch get_req on component 1
Feb  9 14:27:38 vserv kernel: got request f79113c0
Feb  9 14:27:38 vserv kernel: fr1 enqueue req f79113c0 for component 0
Feb  9 14:27:38 vserv kernel: scheduling get_req
Feb  9 14:27:38 vserv kernel: fr1 sched get_req OK
Feb  9 14:27:38 vserv kernel: fr1 service reqs routine will launch get_req on component 0
Feb  9 14:27:38 vserv kernel: fr1 get_req will dequeue req f79113c0
Feb  9 14:27:38 vserv kernel: saw bh block 0 sector 0 size 1024 state 11d dev f000 rdev f000 on req f79113c0
Feb  9 14:27:38 vserv kernel: submitting bh sector 0 size 1024 state 1e dev 700 rdev 700 on req f79113c0
Feb  9 14:27:38 vserv kernel: serviced req f79113c0 on component 0
Feb  9 14:27:38 vserv kernel: AEM3: promote_req just after loop, req= f79113c0 , e= 0
Feb  9 14:27:38 vserv kernel: AEM3: promote_req just after atomic_set_mask, req= f79113c0 , e= 0
Feb  9 14:27:38 vserv kernel: AEM3: promote_req just after list_del_init, req= f79113c0 , e= 0
Feb  9 14:27:38 vserv kernel: AEM3: promote_req just after atomic_dec, req= f79113c0 , e= 0
Feb  9 14:27:38 vserv kernel: AEM3: promote_req just before unlock, req= f79113c0 , e= 0
Feb  9 14:27:38 vserv kernel: AEM3: promote_req just after unlock, req= f79113c0 , e= 0
Feb  9 14:27:38 vserv kernel: fr1 enqueue req f79113c0 for component 1
Feb  9 14:27:38 vserv kernel: AEM3: promote_req betveen fr1_enqueue and fr1_shed, req= f79113c0 , e= 0
Feb  9 14:27:38 vserv kernel: scheduling get_req
Feb  9 14:27:38 vserv kernel: fr1 sched get_req OK
Feb  9 14:27:38 vserv kernel: AEM3: promote_req just after fr1_shed, req= f79113c0 , e= 0
Feb  9 14:27:38 vserv kernel: promoted request f79113c0
Feb  9 14:27:38 vserv kernel: fr1 service reqs routine will launch get_req on component 1
Feb  9 14:27:38 vserv kernel: fr1 get_req will dequeue req f79113c0
Feb  9 14:27:38 vserv kernel: saw bh block 0 sector 0 size 1024 state 11d dev f000 rdev f000 on req f79113c0
Feb  9 14:27:38 vserv kernel: submitting bh sector 0 size 1024 state 1e dev 701 rdev 701 on req f79113c0
Feb  9 14:27:38 vserv kernel: serviced req f79113c0 on component 1
Feb  9 14:27:38 vserv kernel: fr1 service reqs routine will launch get_req on component 0
Feb  9 14:27:38 vserv kernel: fr1 service reqs routine will launch get_req on component 1
Feb  9 14:27:38 vserv kernel: fr1 close on device 240:0
Feb  9 14:27:38 vserv kernel: fr1 end_io will free bh sector 0 with 2 sectors left
Feb  9 14:27:38 vserv last message repeated 2 times
Feb  9 14:29:39 vserv kernel: fr1 open device 240:0
Feb  9 14:29:39 vserv kernel: got request f79111c0
Feb  9 14:29:39 vserv kernel: fr1 enqueue req f79111c0 for component 0
Feb  9 14:29:39 vserv kernel: scheduling get_req
Feb  9 14:29:39 vserv kernel: fr1 sched get_req OK
Feb  9 14:29:39 vserv kernel: fr1 service reqs routine will launch get_req on component 0
Feb  9 14:29:39 vserv kernel: fr1 get_req will dequeue req f79111c0
Feb  9 14:29:39 vserv kernel: saw bh block 0 sector 0 size 1024 state 11c dev f000 rdev f000 on req f79111c0
Feb  9 14:29:39 vserv kernel: submitting bh sector 0 size 1024 state 1e dev 700 rdev 700 on req f79111c0
Feb  9 14:29:39 vserv kernel: serviced req f79111c0 on component 0
Feb  9 14:29:39 vserv kernel: fr1 service reqs routine will launch get_req on component 1
Feb  9 14:29:39 vserv kernel: fr1 end_io will free bh sector 0 with 2 sectors left
Feb  9 14:29:39 vserv kernel: got request f79113c0
Feb  9 14:29:39 vserv kernel: fr1 enqueue req f79113c0 for component 0
Feb  9 14:29:39 vserv kernel: scheduling get_req
Feb  9 14:29:39 vserv kernel: fr1 sched get_req OK
Feb  9 14:29:39 vserv kernel: fr1 service reqs routine will launch get_req on component 0
Feb  9 14:29:39 vserv kernel: fr1 get_req will dequeue req f79113c0
Feb  9 14:29:39 vserv kernel: saw bh block 0 sector 0 size 1024 state 11d dev f000 rdev f000 on req f79113c0
Feb  9 14:29:39 vserv kernel: submitting bh sector 0 size 1024 state 1e dev 700 rdev 700 on req f79113c0
Feb  9 14:29:39 vserv kernel: serviced req f79113c0 on component 0
Feb  9 14:29:39 vserv kernel: AEM3: promote_req just after loop, req= f79113c0 , e= 0
Feb  9 14:29:39 vserv kernel: AEM3: promote_req just after atomic_set_mask, req= f79113c0 , e= 0
Feb  9 14:29:39 vserv kernel: AEM3: promote_req just after list_del_init, req= f79113c0 , e= 0
Feb  9 14:29:39 vserv kernel: AEM3: promote_req just after atomic_dec, req= f79113c0 , e= 0
Feb  9 14:29:39 vserv kernel: AEM3: promote_req just before unlock, req= f79113c0 , e= 0
Feb  9 14:29:39 vserv kernel: AEM3: promote_req just after unlock, req= f79113c0 , e= 0
Feb  9 14:29:39 vserv kernel: fr1 enqueue req f79113c0 for component 1
Feb  9 14:29:39 vserv kernel: AEM3: promote_req betveen fr1_enqueue and fr1_shed, req= f79113c0 , e= 0
Feb  9 14:29:39 vserv kernel: scheduling get_req
Feb  9 14:29:39 vserv kernel: fr1 sched get_req OK
Feb  9 14:29:39 vserv kernel: AEM3: promote_req just after fr1_shed, req= f79113c0 , e= 0
Feb  9 14:29:39 vserv kernel: promoted request f79113c0
Feb  9 14:29:39 vserv kernel: fr1 service reqs routine will launch get_req on component 1
Feb  9 14:29:39 vserv kernel: fr1 get_req will dequeue req f79113c0
Feb  9 14:29:39 vserv kernel: saw bh block 0 sector 0 size 1024 state 11d dev f000 rdev f000 on req f79113c0
Feb  9 14:29:39 vserv kernel: submitting bh sector 0 size 1024 state 1e dev 701 rdev 701 on req f79113c0
Feb  9 14:29:39 vserv kernel: serviced req f79113c0 on component 1
Feb  9 14:29:39 vserv kernel: fr1 end_io will free bh sector 0 with 2 sectors left
Feb  9 14:29:39 vserv kernel: fr1 end_io will free bh sector 0 with 2 sectors left
Feb  9 14:29:39 vserv kernel: fr1 service reqs routine will launch get_req on component 0
Feb  9 14:29:39 vserv kernel: fr1 service reqs routine will launch get_req on component 1
Feb  9 14:29:39 vserv kernel: fr1 close on device 240:0
Feb  9 14:30:39 vserv kernel: fr1 open device 240:0
Feb  9 14:30:39 vserv kernel: got request f79111c0
Feb  9 14:30:39 vserv kernel: fr1 enqueue req f79111c0 for component 0
Feb  9 14:30:39 vserv kernel: got request f7911340
Feb  9 14:30:39 vserv kernel: fr1 enqueue req f7911340 for component 0
Feb  9 14:30:39 vserv kernel: got request f79114c0
Feb  9 14:30:39 vserv kernel: fr1 enqueue req f79114c0 for component 0
Feb  9 14:30:39 vserv kernel: got request f79110c0
Feb  9 14:30:39 vserv kernel: fr1 enqueue req f79110c0 for component 0
Feb  9 14:30:39 vserv kernel: scheduling get_req
Feb  9 14:30:39 vserv kernel: fr1 sched get_req OK
Feb  9 14:30:39 vserv kernel: fr1 service reqs routine will launch get_req on component 0
Feb  9 14:30:39 vserv kernel: fr1 get_req will dequeue req f79111c0
Feb  9 14:30:39 vserv kernel: saw bh block 0 sector 0 size 1024 state 15c dev f000 rdev f000 on req f79111c0
Feb  9 14:30:39 vserv kernel: submitting bh sector 0 size 1024 state 1e dev 700 rdev 700 on req f79111c0
Feb  9 14:30:39 vserv kernel: serviced req f79111c0 on component 0
Feb  9 14:30:39 vserv kernel: fr1 get_req will dequeue req f7911340
Feb  9 14:30:39 vserv kernel: saw bh block 1 sector 2 size 1024 state 15c dev f000 rdev f000 on req f7911340
Feb  9 14:30:39 vserv kernel: submitting bh sector 2 size 1024 state 1e dev 700 rdev 700 on req f7911340
Feb  9 14:30:39 vserv kernel: serviced req f7911340 on component 0
Feb  9 14:30:39 vserv kernel: fr1 get_req will dequeue req f79114c0
Feb  9 14:30:39 vserv kernel: saw bh block 2 sector 4 size 1024 state 15c dev f000 rdev f000 on req f79114c0
Feb  9 14:30:39 vserv kernel: submitting bh sector 4 size 1024 state 1e dev 700 rdev 700 on req f79114c0
Feb  9 14:30:39 vserv kernel: serviced req f79114c0 on component 0
Feb  9 14:30:39 vserv kernel: fr1 get_req will dequeue req f79110c0
Feb  9 14:30:39 vserv kernel: saw bh block 3 sector 6 size 1024 state 15c dev f000 rdev f000 on req f79110c0
Feb  9 14:30:39 vserv kernel: submitting bh sector 6 size 1024 state 1e dev 700 rdev 700 on req f79110c0
Feb  9 14:30:39 vserv kernel: serviced req f79110c0 on component 0
Feb  9 14:30:39 vserv kernel: fr1 service reqs routine will launch get_req on component 1
Feb  9 14:30:39 vserv kernel: fr1 end_io will free bh sector 0 with 2 sectors left
Feb  9 14:30:39 vserv kernel: fr1 end_io will free bh sector 2 with 2 sectors left
Feb  9 14:30:39 vserv kernel: fr1 end_io will free bh sector 4 with 2 sectors left
Feb  9 14:30:39 vserv kernel: fr1 end_io will free bh sector 6 with 2 sectors left
Feb  9 14:30:39 vserv kernel: fr1 close on device 240:0
>This might be a simple thinko. Maybe we're not on a queue at all.
>But try using list_del _int first.

>If we're not on a queue at all, it might be wise to test with
>
>        if (!list_empty(&req->queue)) list_del_init(&req->queue);

and when i try this it burns on the same spot as before again,,,

>but see how the list_del_init helps first. I don't see the problem
>here, but that must be good fortune. Or a different compiler.