[ENBD] fr1 hangs when trying to access raid device..
Peter T. Breuer
enbd@lists.community.tummy.com
Sun, 9 Feb 2003 15:24:12 +0100 (MET)
"A month of sundays ago [Arve Emil Myr_s] wrote:"
> >Or equivalently replace list_del with list_del_init
> >everywhere that list_del occurs.
>
> Here something happens: it does not oops anymore, but I can't seem to get the same data out of the device that I put in:
Well, I can't debug this at long distance. All I can say is that it
works here. If you can cause a change by swapping list_del for
list_del_init then it is a heisenbug, probably exposed by compiler
differences (well, it's a bug, but one that needs hunting, and I don't
have the compiler that exposes it). I am using 2.95.2.
gcc version 2.95.2 20000220 (Debian GNU/Linux)
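For anyone wondering why the swap could matter at all, here is a minimal
userspace sketch. It is an imitation of the kernel list primitives, not the
kernel header and not the fr1 code, and it assumes a list_del that only
unlinks the entry and leaves the entry's own pointers stale. The helper
double_delete_loses_entry() is hypothetical and only shows one way a stray
second delete can behave differently; I am not claiming this is the bug in
question.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace imitation of the kernel's circular doubly linked list. */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Insert 'new' directly after 'head'. */
static void list_add(struct list_head *new, struct list_head *head)
{
    new->next = head->next;
    new->prev = head;
    head->next->prev = new;
    head->next = new;
}

static int list_empty(const struct list_head *h)
{
    return h->next == h;
}

/* Unlink only: the entry keeps pointing at its former neighbours. */
static void list_del(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
}

/* Unlink and make the entry point at itself again. */
static void list_del_init(struct list_head *e)
{
    list_del(e);
    init_list_head(e);
}

/*
 * Hypothetical demonstration: delete 'a', add 'b', then delete 'a' a
 * second time by mistake. Returns 1 if the list then looks empty,
 * i.e. 'b' has been lost, and 0 if 'b' is still on the list.
 */
static int double_delete_loses_entry(int use_init)
{
    struct list_head head, a, b;

    init_list_head(&head);
    list_add(&a, &head);

    if (use_init) {
        list_del_init(&a);
        assert(list_empty(&a));   /* 'a' is self-linked again */
    } else {
        list_del(&a);             /* 'a' still points at 'head' */
    }

    list_add(&b, &head);

    /* The stray second delete. */
    if (use_init)
        list_del_init(&a);        /* only touches 'a' itself */
    else
        list_del(&a);             /* writes through stale pointers */

    return list_empty(&head);
}
```

With plain list_del the second delete rewrites the head through the stale
pointers and b drops off the list; with list_del_init the second delete
only touches the entry itself. Either way the double delete is still a bug,
the second variant just hides it.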
I have also been working on inserting the technology into the standard
kernel raid1 driver, and I have it working, so there is no need to
battle with my emulation of the raid code in fr1 1.0. I wrote 2500
lines of kernel C code in one month, and it is to be expected that there
are bugs. I only wanted to get the principle right and then later work
on practical problems. It's cost me about the same amount of time to
insert the code into the existing driver, but there will be fewer bugs.
I was going to make the release later in the evening, but I'll put what I
have up for a quick look now.
ftp://oboe.it.uc3m.es/pub/Programs/fr1-2.0.tgz
This contains an amended md.o driver, and a fr1.o driver that replaces
raid1.o. In fact it's an amended raid1.o driver linked with the
bitmap.o object, which supplies the support.
Just edit the Makefile and type "make". I'm working on the make rule.
The change to md.o is required to allow hotadd directly after hotremove,
which signals a "hotrepair" and does the intelligent resync. It may be
possible to get the md.o code to recognize a repair by some permanent
identifier on the disk, but I haven't investigated. The kernel code is
confusing enough as it is. I'll ask Ingo Molnar, who at least replies
occasionally!
The changes to raid1.o are purely to insert the extra technology.
The only thing missing is that it doesn't wipe the bitmap as it syncs
but instead wipes the whole thing at the end, if it's successful. So
I suppose an aborted sync would start again. Or maybe not. The kernel
raid code alone knows. I think it probably starts where it left off,
which would be wrong.
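To make the missing piece concrete, here is a rough sketch of wiping the
bitmap as the sync proceeds, so that an aborted sync leaves only the
unsynced blocks marked. This is not the fr1 or raid1 code: the names
dirty_bitmap, NBLOCKS and resync_incremental are made up for illustration,
the bitmap is one bit per block, and the actual block copy is left out.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define NBLOCKS 64   /* hypothetical device size, in blocks */

/* One bit per block; a set bit means the block still needs syncing. */
struct dirty_bitmap {
    unsigned char bits[NBLOCKS / CHAR_BIT];
};

static void bitmap_mark(struct dirty_bitmap *bm, size_t blk)
{
    assert(blk < NBLOCKS);
    bm->bits[blk / CHAR_BIT] |= (unsigned char)(1u << (blk % CHAR_BIT));
}

static void bitmap_clear(struct dirty_bitmap *bm, size_t blk)
{
    assert(blk < NBLOCKS);
    bm->bits[blk / CHAR_BIT] &= (unsigned char)~(1u << (blk % CHAR_BIT));
}

static int bitmap_test(const struct dirty_bitmap *bm, size_t blk)
{
    assert(blk < NBLOCKS);
    return (bm->bits[blk / CHAR_BIT] >> (blk % CHAR_BIT)) & 1;
}

static size_t bitmap_count(const struct dirty_bitmap *bm)
{
    size_t n = 0;
    for (size_t blk = 0; blk < NBLOCKS; blk++)
        n += (size_t)bitmap_test(bm, blk);
    return n;
}

/*
 * Sync marked blocks in order, clearing each bit as soon as its block
 * is done. 'max_blocks' simulates an abort: the loop stops after that
 * many blocks. Returns the number of blocks synced in this pass.
 */
static size_t resync_incremental(struct dirty_bitmap *bm, size_t max_blocks)
{
    size_t copied = 0;

    for (size_t blk = 0; blk < NBLOCKS && copied < max_blocks; blk++) {
        if (!bitmap_test(bm, blk))
            continue;
        /* A real driver would copy block 'blk' to the mirror here. */
        bitmap_clear(bm, blk);
        copied++;
    }
    return copied;
}
```

With blocks 3, 10 and 40 marked, a pass aborted after two blocks leaves
only block 40 marked, and a second pass picks up exactly that block. In
this scheme restarting from the bitmap and continuing where the sync left
off come to the same thing.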
Peter