[ENBD] fr1 hangs when trying to access raid device..
Arve Emil Myrås
enbd@lists.community.tummy.com
Sun, 9 Feb 2003 14:55:04 +0100
>> Ok,, made some printk's and found the spot where it burns,,, still =
running on a no-smp kernel with smp disabled in bios,,
>
>Where? (no smp cannot fail - I will be interested to see this ..)
>
>> Feb 6 16:54:19 vserv kernel: saw bh block 0 sector 0 size 1024 state =
11d dev f000 rdev f000 on req f79113c0
>> Feb 6 16:54:19 vserv kernel: submitting bh sector 0 size 1024 state =
1e dev 700 rdev 700 on req f79113c0
>> Feb 6 16:54:19 vserv kernel: serviced req f79113c0 on component 0
>> Feb 6 16:54:19 vserv kernel: AEM: promote_req just after loop, =
req= f79113c0 , e= 0
>> Feb 6 16:54:19 vserv kernel: AEM: promote_req just after =
atomic_set_mask, req= f79113c0 , e= 0
>> Feb 6 16:54:19 vserv kernel: Unable to handle kernel NULL pointer =
dereference at virtual address 00000004
>> printk (KERN_DEBUG "AEM: promote_req just after atomic_set_mask, =
req= %x , e= %d\n", req, e->index);
>>
>> // PTB comes off e
>> list_del (&req->queue);
>> printk (KERN_DEBUG "AEM: promote_req just after list_del, req= %x =
, e= %d\n", req, e->index);
>well, it dies in list_del. OK. But the req is not zero. OK. I think I
>might understand this. The queue lock is still held too.
>The pointers on the req queue must be messed up. I think I noticed that
>the kernel people had at some point stopped tidying pointers after a
>list del.
> /**
> * list_del - deletes entry from list.
> * @entry: the element to delete from the list.
> * Note: list_empty on entry does not return true after this, the
> * entry is in an undefined state.
> */
> static __inline__ void list_del(struct list_head *entry)
> {
> __list_del(entry->prev, entry->next);
> }
>and we miss INIT_LIST_HEAD(&req->queue); afterwards. Can you add that
>after the list_del?
tried that, but it burns on the same spot..
>Or equivalently replace list_del with list_del_init
>everywhere that list_del occurs.
here somthing happends,, it does not oops anymore but i cant seem to get the same data out of the device that i put in:
vpn:~ # ssh vserv
Warning: the RSA host key for 'vserv' differs from the key for the IP address '10.10.0.2'
Offending key for IP in /root/.ssh/known_hosts:1
Matching host key in /root/.ssh/known_hosts:3
Are you sure you want to continue connecting (yes/no)? yes
root@vserv's password:
Last login: Sun Feb 9 14:24:15 2003 from vpn.local
Have a lot of fun...
vserv:~ # lsmod
Module Size Used by Not tainted
e1000 52300 2
fr1 20272 0 (autoclean)
md 59424 0 (autoclean)
pti_st 22912 0
sd_mod 10924 4
reiserfs 190768 1
FastTrak 96044 2
vserv:~ # rmmod md
vserv:~ # rmmod fr1
vserv:~ # lsmod
Module Size Used by Not tainted
e1000 52300 2
pti_st 22912 0
sd_mod 10924 4
reiserfs 190768 1
FastTrak 96044 2
vserv:~ # insmod fr1
Using /lib/modules/2.4.20ctx-16nsmp/kernel/drivers/md/fr1.o
vserv:~ # modprobe fr1
vserv:~ # losetup /dev/loop0 /root/loop0
vserv:~ # losetup /dev/loop1 /root/loop1
vserv:~ # mkraid --really-force /dev/fr10
DESTROYING the contents of /dev/fr10 in 5 seconds, Ctrl-C if unsure!
/dev/fr10: not an MD device!
handling MD device /dev/fr10
analyzing super-block
vserv:~ # lsmod
Module Size Used by Not tainted
md 59424 0 (autoclean)
loop 9560 2 (autoclean)
fr1 20240 0
e1000 52300 2
pti_st 22912 0
sd_mod 10924 4
reiserfs 190768 1
FastTrak 96044 2
vserv:~ # rmmod md
vserv:~ # lsmod
Module Size Used by Not tainted
loop 9560 2 (autoclean)
fr1 20240 0
e1000 52300 2
pti_st 22912 0
sd_mod 10924 4
reiserfs 190768 1
FastTrak 96044 2
vserv:~ # cat /proc/fr1stat
Personalities : [raid1]
read_ahead 4 sectors
fr10 : active fr1 [dev 07:00][0] [dev 07:01][1]
16000 blocks
vserv:~ # dd if=/dev/zero of=/dev/fr10 count=2
2+0 records in
2+0 records out
vserv:~ # dd if=/root/testfil1 of=/dev/fr10
1+1 records in
1+1 records out
vserv:~ # ll testfil*
-rw-r--r-- 1 root root 0 Feb 4 19:28 testfil
-rw-r--r-- 1 root root 514 Feb 7 15:40 testfil1
-rw-r--r-- 1 root root 514 Feb 7 15:43 testfil2
vserv:~ # dd if=/dev/fr10 of=/root/testfil2 bs=514 count=1
1+0 records in
1+0 records out
vserv:~ # ll testfil*
-rw-r--r-- 1 root root 0 Feb 4 19:28 testfil
-rw-r--r-- 1 root root 514 Feb 7 15:40 testfil1
-rw-r--r-- 1 root root 514 Feb 9 14:30 testfil2
vserv:~ # cat testfil1
ARVE EMIL1JALLAARVEEMIL2
vserv:~ # cat testfil2
DÀ@'Àÿÿÿÿÿÿÿÿùãà(ÀàPñT`EñT`EñÀ'À õ õt5-À¸`Eñ¸`Eñê`EñÐÀë㥤ÿþÿÿÿþÿÿ
«'Àÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ
vserv:~ #
---
vserv:~ # tail -f /var/log/messages
Feb 9 14:20:40 vserv kernel: e1000: eth1 NIC Link is Up 1000 Mbps Full Duplex
Feb 9 14:20:40 vserv ntpdate[739]: sendto(192.168.250.2): Network is unreachable
Feb 9 14:20:41 vserv ntpdate[739]: no server suitable for synchronization found
Feb 9 14:20:41 vserv ntpd[751]: ntpd 4.1.1@1.786 Mon Sep 9 20:57:30 UTC 2002 (1)
Feb 9 14:20:41 vserv ntpd[751]: signal_no_reset: signal 13 had flags 4000000
Feb 9 14:20:41 vserv ntpd[751]: precision = 27 usec
Feb 9 14:20:41 vserv ntpd[751]: kernel time discipline status 0040
Feb 9 14:20:41 vserv /usr/sbin/cron[777]: (CRON) STARTUP (fork ok)
Feb 9 14:20:50 vserv kdm[740]: pam_unix2: session started for user root, service xdm
Feb 9 14:24:15 vserv sshd[986]: Accepted password for root from 10.10.0.1 port 32772 ssh2
Feb 9 14:24:34 vserv sshd[1006]: Accepted password for root from 10.10.0.1 port 32773 ssh2
Feb 9 14:25:54 vserv kernel: loop: loaded (max 8 devices)
Feb 9 14:26:22 vserv kernel: md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
Feb 9 14:26:27 vserv kernel: fr1 open device 240:0
Feb 9 14:26:27 vserv kernel: fr1 ioctl 40480923
Feb 9 14:26:27 vserv kernel: fr1 mkraid1 on device 0
Feb 9 14:26:27 vserv kernel: fr1 ioctl 40140921
Feb 9 14:26:27 vserv kernel: fr1 hotadd called to add for device 700
Feb 9 14:26:27 vserv kernel: fr1 create element 7:0 size 16000 blocks
Feb 9 14:26:27 vserv kernel: fr1 hotadd component 07:00[0] to device 0
Feb 9 14:26:27 vserv kernel: fr1 added new device 07:00 to f7ab7e00 with err 0
Feb 9 14:26:27 vserv kernel: fr1 ioctl 40140921
Feb 9 14:26:27 vserv kernel: fr1 hotadd called to add for device 701
Feb 9 14:26:27 vserv kernel: fr1 create element 7:1 size 16000 blocks
Feb 9 14:26:27 vserv kernel: fr1 hotadd component 07:01[1] to device 0
Feb 9 14:26:27 vserv kernel: fr1 added new device 07:01 to f7ab7e00 with err 0
Feb 9 14:26:27 vserv kernel: fr1 open device 240:0
Feb 9 14:26:27 vserv kernel: fr1 ioctl 400c0930
Feb 9 14:26:27 vserv kernel: fr1 calc component 7:0 size 16000 blocks
Feb 9 14:26:27 vserv kernel: fr1 calc component 7:1 size 16000 blocks
Feb 9 14:26:27 vserv kernel: fr1 calc size 16000 blocks
Feb 9 14:26:27 vserv kernel: fr1 close on device 240:0
Feb 9 14:26:27 vserv kernel: fr1 close on device 240:0
Feb 9 14:27:38 vserv kernel: fr1 open device 240:0
Feb 9 14:27:38 vserv kernel: got request f79111c0
Feb 9 14:27:38 vserv kernel: fr1 enqueue req f79111c0 for component 0
Feb 9 14:27:38 vserv kernel: scheduling get_req
Feb 9 14:27:38 vserv kernel: fr1 sched get_req OK
Feb 9 14:27:38 vserv kernel: fr1 service reqs routine will launch get_req on component 0
Feb 9 14:27:38 vserv kernel: fr1 get_req will dequeue req f79111c0
Feb 9 14:27:38 vserv kernel: saw bh block 0 sector 0 size 1024 state 11c dev f000 rdev f000 on req f79111c0
Feb 9 14:27:38 vserv kernel: submitting bh sector 0 size 1024 state 1e dev 700 rdev 700 on req f79111c0
Feb 9 14:27:38 vserv kernel: serviced req f79111c0 on component 0
Feb 9 14:27:38 vserv kernel: fr1 service reqs routine will launch get_req on component 1
Feb 9 14:27:38 vserv kernel: got request f79113c0
Feb 9 14:27:38 vserv kernel: fr1 enqueue req f79113c0 for component 0
Feb 9 14:27:38 vserv kernel: scheduling get_req
Feb 9 14:27:38 vserv kernel: fr1 sched get_req OK
Feb 9 14:27:38 vserv kernel: fr1 service reqs routine will launch get_req on component 0
Feb 9 14:27:38 vserv kernel: fr1 get_req will dequeue req f79113c0
Feb 9 14:27:38 vserv kernel: saw bh block 0 sector 0 size 1024 state 11d dev f000 rdev f000 on req f79113c0
Feb 9 14:27:38 vserv kernel: submitting bh sector 0 size 1024 state 1e dev 700 rdev 700 on req f79113c0
Feb 9 14:27:38 vserv kernel: serviced req f79113c0 on component 0
Feb 9 14:27:38 vserv kernel: AEM3: promote_req just after loop, req= f79113c0 , e= 0
Feb 9 14:27:38 vserv kernel: AEM3: promote_req just after atomic_set_mask, req= f79113c0 , e= 0
Feb 9 14:27:38 vserv kernel: AEM3: promote_req just after list_del_init, req= f79113c0 , e= 0
Feb 9 14:27:38 vserv kernel: AEM3: promote_req just after atomic_dec, req= f79113c0 , e= 0
Feb 9 14:27:38 vserv kernel: AEM3: promote_req just before unlock, req= f79113c0 , e= 0
Feb 9 14:27:38 vserv kernel: AEM3: promote_req just after unlock, req= f79113c0 , e= 0
Feb 9 14:27:38 vserv kernel: fr1 enqueue req f79113c0 for component 1
Feb 9 14:27:38 vserv kernel: AEM3: promote_req betveen fr1_enqueue and fr1_shed, req= f79113c0 , e= 0
Feb 9 14:27:38 vserv kernel: scheduling get_req
Feb 9 14:27:38 vserv kernel: fr1 sched get_req OK
Feb 9 14:27:38 vserv kernel: AEM3: promote_req just after fr1_shed, req= f79113c0 , e= 0
Feb 9 14:27:38 vserv kernel: promoted request f79113c0
Feb 9 14:27:38 vserv kernel: fr1 service reqs routine will launch get_req on component 1
Feb 9 14:27:38 vserv kernel: fr1 get_req will dequeue req f79113c0
Feb 9 14:27:38 vserv kernel: saw bh block 0 sector 0 size 1024 state 11d dev f000 rdev f000 on req f79113c0
Feb 9 14:27:38 vserv kernel: submitting bh sector 0 size 1024 state 1e dev 701 rdev 701 on req f79113c0
Feb 9 14:27:38 vserv kernel: serviced req f79113c0 on component 1
Feb 9 14:27:38 vserv kernel: fr1 service reqs routine will launch get_req on component 0
Feb 9 14:27:38 vserv kernel: fr1 service reqs routine will launch get_req on component 1
Feb 9 14:27:38 vserv kernel: fr1 close on device 240:0
Feb 9 14:27:38 vserv kernel: fr1 end_io will free bh sector 0 with 2 sectors left
Feb 9 14:27:38 vserv last message repeated 2 times
Feb 9 14:29:39 vserv kernel: fr1 open device 240:0
Feb 9 14:29:39 vserv kernel: got request f79111c0
Feb 9 14:29:39 vserv kernel: fr1 enqueue req f79111c0 for component 0
Feb 9 14:29:39 vserv kernel: scheduling get_req
Feb 9 14:29:39 vserv kernel: fr1 sched get_req OK
Feb 9 14:29:39 vserv kernel: fr1 service reqs routine will launch get_req on component 0
Feb 9 14:29:39 vserv kernel: fr1 get_req will dequeue req f79111c0
Feb 9 14:29:39 vserv kernel: saw bh block 0 sector 0 size 1024 state 11c dev f000 rdev f000 on req f79111c0
Feb 9 14:29:39 vserv kernel: submitting bh sector 0 size 1024 state 1e dev 700 rdev 700 on req f79111c0
Feb 9 14:29:39 vserv kernel: serviced req f79111c0 on component 0
Feb 9 14:29:39 vserv kernel: fr1 service reqs routine will launch get_req on component 1
Feb 9 14:29:39 vserv kernel: fr1 end_io will free bh sector 0 with 2 sectors left
Feb 9 14:29:39 vserv kernel: got request f79113c0
Feb 9 14:29:39 vserv kernel: fr1 enqueue req f79113c0 for component 0
Feb 9 14:29:39 vserv kernel: scheduling get_req
Feb 9 14:29:39 vserv kernel: fr1 sched get_req OK
Feb 9 14:29:39 vserv kernel: fr1 service reqs routine will launch get_req on component 0
Feb 9 14:29:39 vserv kernel: fr1 get_req will dequeue req f79113c0
Feb 9 14:29:39 vserv kernel: saw bh block 0 sector 0 size 1024 state 11d dev f000 rdev f000 on req f79113c0
Feb 9 14:29:39 vserv kernel: submitting bh sector 0 size 1024 state 1e dev 700 rdev 700 on req f79113c0
Feb 9 14:29:39 vserv kernel: serviced req f79113c0 on component 0
Feb 9 14:29:39 vserv kernel: AEM3: promote_req just after loop, req= f79113c0 , e= 0
Feb 9 14:29:39 vserv kernel: AEM3: promote_req just after atomic_set_mask, req= f79113c0 , e= 0
Feb 9 14:29:39 vserv kernel: AEM3: promote_req just after list_del_init, req= f79113c0 , e= 0
Feb 9 14:29:39 vserv kernel: AEM3: promote_req just after atomic_dec, req= f79113c0 , e= 0
Feb 9 14:29:39 vserv kernel: AEM3: promote_req just before unlock, req= f79113c0 , e= 0
Feb 9 14:29:39 vserv kernel: AEM3: promote_req just after unlock, req= f79113c0 , e= 0
Feb 9 14:29:39 vserv kernel: fr1 enqueue req f79113c0 for component 1
Feb 9 14:29:39 vserv kernel: AEM3: promote_req betveen fr1_enqueue and fr1_shed, req= f79113c0 , e= 0
Feb 9 14:29:39 vserv kernel: scheduling get_req
Feb 9 14:29:39 vserv kernel: fr1 sched get_req OK
Feb 9 14:29:39 vserv kernel: AEM3: promote_req just after fr1_shed, req= f79113c0 , e= 0
Feb 9 14:29:39 vserv kernel: promoted request f79113c0
Feb 9 14:29:39 vserv kernel: fr1 service reqs routine will launch get_req on component 1
Feb 9 14:29:39 vserv kernel: fr1 get_req will dequeue req f79113c0
Feb 9 14:29:39 vserv kernel: saw bh block 0 sector 0 size 1024 state 11d dev f000 rdev f000 on req f79113c0
Feb 9 14:29:39 vserv kernel: submitting bh sector 0 size 1024 state 1e dev 701 rdev 701 on req f79113c0
Feb 9 14:29:39 vserv kernel: serviced req f79113c0 on component 1
Feb 9 14:29:39 vserv kernel: fr1 end_io will free bh sector 0 with 2 sectors left
Feb 9 14:29:39 vserv kernel: fr1 end_io will free bh sector 0 with 2 sectors left
Feb 9 14:29:39 vserv kernel: fr1 service reqs routine will launch get_req on component 0
Feb 9 14:29:39 vserv kernel: fr1 service reqs routine will launch get_req on component 1
Feb 9 14:29:39 vserv kernel: fr1 close on device 240:0
Feb 9 14:30:39 vserv kernel: fr1 open device 240:0
Feb 9 14:30:39 vserv kernel: got request f79111c0
Feb 9 14:30:39 vserv kernel: fr1 enqueue req f79111c0 for component 0
Feb 9 14:30:39 vserv kernel: got request f7911340
Feb 9 14:30:39 vserv kernel: fr1 enqueue req f7911340 for component 0
Feb 9 14:30:39 vserv kernel: got request f79114c0
Feb 9 14:30:39 vserv kernel: fr1 enqueue req f79114c0 for component 0
Feb 9 14:30:39 vserv kernel: got request f79110c0
Feb 9 14:30:39 vserv kernel: fr1 enqueue req f79110c0 for component 0
Feb 9 14:30:39 vserv kernel: scheduling get_req
Feb 9 14:30:39 vserv kernel: fr1 sched get_req OK
Feb 9 14:30:39 vserv kernel: fr1 service reqs routine will launch get_req on component 0
Feb 9 14:30:39 vserv kernel: fr1 get_req will dequeue req f79111c0
Feb 9 14:30:39 vserv kernel: saw bh block 0 sector 0 size 1024 state 15c dev f000 rdev f000 on req f79111c0
Feb 9 14:30:39 vserv kernel: submitting bh sector 0 size 1024 state 1e dev 700 rdev 700 on req f79111c0
Feb 9 14:30:39 vserv kernel: serviced req f79111c0 on component 0
Feb 9 14:30:39 vserv kernel: fr1 get_req will dequeue req f7911340
Feb 9 14:30:39 vserv kernel: saw bh block 1 sector 2 size 1024 state 15c dev f000 rdev f000 on req f7911340
Feb 9 14:30:39 vserv kernel: submitting bh sector 2 size 1024 state 1e dev 700 rdev 700 on req f7911340
Feb 9 14:30:39 vserv kernel: serviced req f7911340 on component 0
Feb 9 14:30:39 vserv kernel: fr1 get_req will dequeue req f79114c0
Feb 9 14:30:39 vserv kernel: saw bh block 2 sector 4 size 1024 state 15c dev f000 rdev f000 on req f79114c0
Feb 9 14:30:39 vserv kernel: submitting bh sector 4 size 1024 state 1e dev 700 rdev 700 on req f79114c0
Feb 9 14:30:39 vserv kernel: serviced req f79114c0 on component 0
Feb 9 14:30:39 vserv kernel: fr1 get_req will dequeue req f79110c0
Feb 9 14:30:39 vserv kernel: saw bh block 3 sector 6 size 1024 state 15c dev f000 rdev f000 on req f79110c0
Feb 9 14:30:39 vserv kernel: submitting bh sector 6 size 1024 state 1e dev 700 rdev 700 on req f79110c0
Feb 9 14:30:39 vserv kernel: serviced req f79110c0 on component 0
Feb 9 14:30:39 vserv kernel: fr1 service reqs routine will launch get_req on component 1
Feb 9 14:30:39 vserv kernel: fr1 end_io will free bh sector 0 with 2 sectors left
Feb 9 14:30:39 vserv kernel: fr1 end_io will free bh sector 2 with 2 sectors left
Feb 9 14:30:39 vserv kernel: fr1 end_io will free bh sector 4 with 2 sectors left
Feb 9 14:30:39 vserv kernel: fr1 end_io will free bh sector 6 with 2 sectors left
Feb 9 14:30:39 vserv kernel: fr1 close on device 240:0
>This might be a simple thinko. Maybe we're not on a queue at all.
>But try using list_del _int first.
>If we're not on a queue at all, it might be wise to test with
>
> if (!list_empty(&req->queue)) list_del_init(&req->queue);
and when i try this it burns on the same spot as before again,,,
>but see how the list_del_init helps first. I don't see the problem
>here, but that must be good fortune. Or a different compiler.