[ENBD] enbd + fr1 question/explanation

Peter T. Breuer ptb at inv.it.uc3m.es
Thu May 18 10:07:49 MDT 2006


"Also sprach Ste:"
> kernel 2.4.31
> fr1 2.17

OK.

> I think that fr1 is used:
> root@data:~# lsmod
> Module                  Size  Used by    Not tainted
> ...
> fr1                    16784   1
> md                     48960   1  [fr1]
> bitmap                  4976   0  [fr1]

Looks like the module is loaded, but what makes you think that it is
actually driving the device you are using? Check what the kernel says
(use dmesg).

> I followed the instructions: first I created a raid 1 with the superblock.
> So /dev/md0 is okay. When I create the raid for the first time there is a
> "resync".

That's correct. It's nothing to do with fr1.

> However, this resync can be bypassed with the --dangerous-no-resync option.
> So one problem could be solved.

Arrgh! You MUST sync the array when it is created. Why do you think that
two unequal array halves are OK? They aren't :(.

> Now, when I start the raid with "raidstart", there is no resync.
> This is okay.

You should revise your notions of true and false!  You seem to have them
backwards here :-).

> But, if one of the disks goes offline, there is a total "recovery"
> instead of a recovery of only the last changed blocks.

Check what the kernel says!  What makes you THINK there is?  When you
assert an opinion, you must back it with the evidence that leads you to
that opinion, and the reasoning you use to arrive at the judgment from
the observations, so that others may discuss it with you. Otherwise you
are saying "the moon is blue" and others are saying "no it isn't", and
nobody gets anywhere. If you expose your reasoning to scrutiny, you
will also become aware yourself of flaws that may have led you to an
erroneous belief (or possibly find your belief strengthened :).

In this case you haven't presented evidence for the opinion you
asserted above. The readme shows what the kernel should say:

   If you fault one device, then write to the device, then hotadd the
   faulted device back in, you should be able to see from the kernel
   messages (use "dmesg") that the resync is intelligent.  Here's some
   dmesg output:

     raid1 resync starts on device 0 component 1 for 1024 blocks
     raid1 resynced dirty blocks 0-9
     raid1 resync skipped clean blocks 10-1023
     raid1 resync terminates with 0 errs on device 0 component 1
     raid1 hotadd component 7:1[1] to device 0
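
To make the comparison concrete, here is a minimal sketch that tallies how
many blocks a resync really wrote and how many it skipped. It assumes the
kernel messages look exactly like the readme sample above; the real wording
may differ between fr1 versions, and the name summarize_resync is made up
for this illustration.

```python
import re

# Sample lines copied from the readme excerpt; the exact message format is an
# assumption and may differ between fr1 versions.
SAMPLE = """\
raid1 resync starts on device 0 component 1 for 1024 blocks
raid1 resynced dirty blocks 0-9
raid1 resync skipped clean blocks 10-1023
raid1 resync terminates with 0 errs on device 0 component 1
"""

# Matches either a "resynced dirty" or a "resync skipped clean" block range.
RANGE = re.compile(
    r"raid1 (resynced dirty|resync skipped clean) blocks (\d+)-(\d+)"
)


def summarize_resync(log_text):
    """Return (dirty, clean): blocks really written vs. blocks skipped."""
    dirty = clean = 0
    for match in RANGE.finditer(log_text):
        kind = match.group(1)
        first, last = int(match.group(2)), int(match.group(3))
        count = last - first + 1  # ranges are taken to be inclusive
        if kind.startswith("resynced"):
            dirty += count
        else:
            clean += count
    return dirty, clean


if __name__ == "__main__":
    dirty, clean = summarize_resync(SAMPLE)
    print(f"written: {dirty} blocks, skipped: {clean} blocks")
```

Feeding it the saved output of dmesg instead of SAMPLE should show whether
the skipped count is zero (a full resync) or covers most of the device (an
intelligent one).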


> But it doesn't seem to work:

What makes you think so?

> root@data:/mnt# cat /proc/mdstat
> Personalities : [raid1]
> read_ahead 1024 sectors
> md0 : active raid1 nda[2] ndb[1]
>       104320 blocks [2/1] [_U]
>       [>....................]  recovery =  0.0% (828/104320) finish=116.9min speed=13K/sec

This is what you should see (except for the speed!).  Why do you think
this is a resync that involves really writing instead of pretending to
write?  The problem that I see from that data is that whatever this is,
it is extremely slow!  You should be looking at why that is.  I'd be
seriously worried.  Have you set the resync maxima in /proc/sys/dev/blah
very low?
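
To put a number on "extremely slow", here is a rough sketch that pulls the
figures out of the progress line quoted above and works out a naive time to
completion. It assumes the counts are 1K blocks and that the speed stays
constant; parse_progress and the field names are invented for this example,
and the pattern only covers the format shown in this report.

```python
import re

# Progress line from the quoted /proc/mdstat output, joined onto one line.
LINE = ("[>....................]  recovery =  0.0% (828/104320) "
        "finish=116.9min speed=13K/sec")

PROGRESS = re.compile(
    r"(resync|recovery)\s*=\s*([\d.]+)%\s*\((\d+)/(\d+)\)\s*"
    r"finish=([\d.]+)min\s+speed=(\d+)K/sec"
)


def parse_progress(line):
    """Extract progress figures from an mdstat progress line, or None."""
    match = PROGRESS.search(line)
    if match is None:
        return None
    done, total = int(match.group(3)), int(match.group(4))
    speed = int(match.group(6))
    return {
        "kind": match.group(1),
        "done": done,
        "total": total,
        "finish_min": float(match.group(5)),
        "speed_k_per_sec": speed,
        # Naive estimate: remaining 1K blocks at a constant reported speed.
        "naive_min": (total - done) / speed / 60 if speed else None,
    }


if __name__ == "__main__":
    info = parse_progress(LINE)
    print(f"{info['kind']}: {info['done']}/{info['total']} blocks, "
          f"about {info['naive_min']:.0f} min left at "
          f"{info['speed_k_per_sec']}K/sec")
```

On these figures a device of about 100MB would need roughly two hours, which
is why the speed, not the progress bar, is the thing to investigate.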

You haven't responded to my question: "what does the kernel say"?
That's what you want to be looking at! Use dmesg.

Check the readme again - it shows you what you should be seeing. In
particular you will see there that you SHOULD see approximately what you
have reported seeing from where you are looking (I don't recall the
precise output format, but I can check the code ...  as can you!), but
that is not where you should be looking.  Check the kernel's messages
instead, with dmesg.

(I've just had a quick look at the patch, and I can't see any change
made by it to the output printed in /proc/mdstat, so I suppose what you
have reported is what should be seen.)


Peter

