[ENBD] fr1 hangs when trying to access raid device..

Peter T. Breuer enbd@lists.community.tummy.com
Tue, 4 Feb 2003 13:46:03 +0100 (MET)


"A month of sundays ago [Arve Emil Myr_s] wrote:"
> I'm trying to get the fr1 module working on a dual-athlon system.

Which fr1 is that? 1.0 or 1.1? (I am on 1.3 here, so I need reminding.)

> My kernel is 2.4.20 with the vserver patch version ctx-16, compiled for Athlon CPUs with SMP.
> Everything goes fine until I try to access the RAID device.

Doesn't sound so fine!

> I have fr1 loaded on major 9 and had RAID working before I tried this,

You loaded it with major=9.

> using my old raidtab, which looks like:
> 
> # autogenerated /etc/raidtab by YaST2 
> 
> 
> raiddev /dev/md0
>    raid-level       1
>    nr-raid-disks    2
>    nr-spare-disks   0
>    persistent-superblock 1

Try it without the persistent superblock (persistent-superblock 0).
fr1 takes no account of superblocks, because I don't know their format
or their semantics.

>    chunk-size       4096 
>    device   /dev/sdb1
>    raid-disk 0
>    device   /dev/sdb2
>    raid-disk 1
> 
> I can do a "mkraid --really-force /dev/md0", and everything looks normal.

Sounds good.

> But when I try to "mke2fs /dev/md0", everything just freezes (well, not
> quite: keyboard, mouse and X hang; I can still ping the box, but ssh login
> is impossible). The only way out is a hardware reset.

This is bad. And it's not showing up in the list below either ...

> My syslog looks like this from loading the module until "mke2fs":
> 
> Feb  4 12:37:18 vserv kernel: fr1 ioctl 800c0910
> Feb  4 12:37:18 vserv kernel: klogd 1.4.1, ---------- state change ----------
> Feb  4 12:37:18 vserv kernel: Cannot find map file.
> Feb  4 12:37:18 vserv kernel: Loaded 563 symbols from 6 modules.
> Feb  4 12:37:18 vserv kernel: fr1 ioctl 800c0910
> Feb  4 12:37:33 vserv last message repeated 2 times
> Feb  4 12:37:38 vserv kernel: fr1 ioctl 40480923
> Feb  4 12:37:38 vserv kernel: fr1 ioctl 40140921
> Feb  4 12:37:38 vserv kernel: fr1 hotadd component 08:11[0] to device 0
> Feb  4 12:37:38 vserv kernel: fr1 added new device 08:11 to f3b62600 with err 0
> Feb  4 12:37:38 vserv kernel: fr1 ioctl 40140921
> Feb  4 12:37:38 vserv kernel: fr1 hotadd component 08:12[1] to device 0
> Feb  4 12:37:38 vserv kernel: fr1 added new device 08:12 to f3b62600 with err 0
> Feb  4 12:37:38 vserv kernel: fr1 ioctl 400c0930

And then no messages. At this point it has simply set up the device.
Tell me, did you compile the driver with the -D__SMP__ flag? I am
tempted to say that this is an SMP-specific lockup.
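
For reference, the ioctl numbers in the log can be decoded to check that
they are only the ordinary md setup calls. The following is a rough sketch,
assuming the standard Linux _IOC bit layout as used on x86 (8-bit number,
8-bit type, 14-bit size, 2-bit direction). The name table is an assumption
taken from the md ioctl definitions in linux/raid/md_u.h, and decode_ioctl
and decode_dev are hypothetical helper names, not part of fr1.

```python
# Decode Linux ioctl request numbers as they appear in the fr1 log lines.
# Assumed layout (x86): bits 0-7 nr, 8-15 type, 16-29 size, 30-31 direction.

# Assumed mapping of md ioctl numbers to names (from linux/raid/md_u.h).
MD_IOCTL_NAMES = {
    0x10: "RAID_VERSION",
    0x21: "ADD_NEW_DISK",
    0x23: "SET_ARRAY_INFO",
    0x30: "RUN_ARRAY",
}

DIRECTIONS = {0: "none", 1: "write", 2: "read", 3: "read/write"}


def decode_ioctl(code):
    """Split a 32-bit ioctl request number into its fields."""
    nr = code & 0xFF
    ioc_type = (code >> 8) & 0xFF
    size = (code >> 16) & 0x3FFF
    direction = (code >> 30) & 0x3
    return {
        "nr": nr,
        "type": ioc_type,          # 9 is the md major
        "size": size,              # size in bytes of the argument struct
        "direction": DIRECTIONS[direction],
        "name": MD_IOCTL_NAMES.get(nr, "unknown"),
    }


def decode_dev(text):
    """Turn a hex 'major:minor' pair such as '08:11' into integers."""
    major, minor = text.split(":")
    return int(major, 16), int(minor, 16)


if __name__ == "__main__":
    for code in (0x800C0910, 0x40480923, 0x40140921, 0x400C0930):
        print(hex(code), decode_ioctl(code))
    for dev in ("08:11", "08:12"):
        print(dev, decode_dev(dev))
```

Under those assumptions the sequence reads as a version query, an array
description, two disk additions and a run request, and 08:11 and 08:12 are
major 8, minors 17 and 18, which matches /dev/sdb1 and /dev/sdb2 in the
raidtab.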

> Hope someone could help me out of this deadlock...
> If you need any more info just ask...

Can you try on a non-SMP machine, or boot with "nosmp", and see if it
makes a difference? Your symptoms match what I would expect if the
kernel I/O lock had been taken and not released. That would only happen
if the kernel and the driver had mismatched expectations about locking
conventions.
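
As a user-space analogy only (not kernel code), the failure mode can be
sketched as follows. The lock object and the function names are
hypothetical stand-ins: one path takes a shared lock and returns without
releasing it, because it assumes its caller will do so, and every later
path that needs the lock then fails to get it. A kernel spinlock has no
timeout, so there the second CPU would simply spin forever.

```python
import threading


def buggy_request_path(io_lock):
    """Take the lock and return without releasing it.

    This models a driver that assumes its caller drops the lock, while
    the caller assumes the driver does.
    """
    io_lock.acquire()
    # ... the request would be handled here; no release follows.


def next_request(io_lock, timeout=0.1):
    """Any later path that needs the same lock.

    The timeout exists only so this demonstration terminates.
    """
    got = io_lock.acquire(timeout=timeout)
    if got:
        io_lock.release()
    return got


def demo():
    io_lock = threading.Lock()  # stand-in for a global I/O lock
    before = next_request(io_lock)   # lock is free, so this succeeds
    buggy_request_path(io_lock)      # lock is taken and never released
    after = next_request(io_lock)    # now times out
    return before, after


if __name__ == "__main__":
    print(demo())
```

On a single CPU the same mismatch may go unnoticed, which is why booting
with "nosmp" is a useful test.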

Peter