[ENBD] enbd + fr1 question/explanation
Ste
ste at phx6a.ath.cx
Thu May 18 11:23:44 MDT 2006
Peter T. Breuer wrote:
> Looks like the module is loaded, but what makes you think that it is
> doing the driving of the device you are using? Check what the kernel
> says.
>
In fact, it seems that the driver isn't driving anything (read on). :-)
>
> Arrgh! You MUST sync the array when it is created. Why do you think that
> two unequal array halves are OK? They aren't :(.
>
Okay, I didn't know the meaning of "resync" in this context. I have now read
some documentation and I understand that resync is not the same as
recovery. Sorry. :-)
>
>> Now, when i start the raid with "raidstart" now there is not a resync.
>> This is okay.
>>
>
> You should revise your notions of true and false! You seem to have them
> backwards here :-).
>
I think that if you use the superblock, the kernel sees that the disks are
the same as before, so no resync is needed, since the kernel trusts that
you synchronized the disks when you first created the raid. :-)
>
>> But, if one of the disk goes offline, there is a total "recovery"
>> instead the recovery of the last changed blocks.
>>
>
> Check what the kernel says! What makes you THINK there is?
Look at this:
1) I created a raid 1 *without* a superblock, then used setfaulty and then
hot-added one device. The recovery took about 4 minutes for 40 GB.
2) I created a raid 1 *with* a superblock; apart from this, the
conditions were the same as in point 1. The recovery says that it will
take about 22 days.
However, in neither of these 2 cases does the dmesg show anything like
what you mentioned in the readme.
Still, I thought that the readme was obsolete, because in case 1)
the recovery took 4 minutes.
Here is the dmesg for case 2 (the dmesg for case 1 is identical):
raid1: device ndb operational as mirror 1
raid1: device nda operational as mirror 0
raid1: raid set md0 active with 2 out of 2 mirrors
md: updating md0 RAID superblock on device
md: ndb [events: 00000001]<6>(write) ndb's sb offset: 39118144
md: nda [events: 00000001]<6>(write) nda's sb offset: 39086016
raid1: bitmap c3c81b20 already active!
raid1: Disk failure on nda, disabling device.
Operation continuing on 1 devices
md: recovery thread got woken up ...
md: updating md0 RAID superblock on device
md: ndb [events: 00000002]<6>(write) ndb's sb offset: 39118144
md: (skipping faulty nda )
md0: no spare disk to reconstruct array! -- continuing in degraded mode
md: recovery thread finished ...
md: trying to remove nda from md0 ...
RAID1 conf printout:
--- wd:1 rd:2 nd:2
disk 0, s:0, o:0, n:0 rd:0 us:1 dev:nda
disk 1, s:0, o:1, n:1 rd:1 us:1 dev:ndb
disk 2, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
RAID1 conf printout:
--- wd:1 rd:2 nd:1
disk 0, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
disk 1, s:0, o:1, n:1 rd:1 us:1 dev:ndb
disk 2, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
md0: unnotifying dev 2b00
ENBD enbd.c #5618[4]: enbd_ioctl received BLKMDUNTFY, now out of raid 900
md: unbind<nda,1>
md: export_rdev(nda)
md: updating md0 RAID superblock on device
md: ndb [events: 00000003]<6>(write) ndb's sb offset: 39118144
md: trying to hot-add nda to md0 ...
ENBD enbd.c #5828[18]: enbd_media_changed change nda requested
ENBD enbd.c #5846[18]: enbd_media_changed REMOTE CHECK ioctl received
err 0 reply to 0.
nda (read) [events: 00000001]
md0: new disk 2b00 too old for repair (disk 1 < bitmap 0)
md: bind<nda,2>
md0: notifying dev 2b00
ENBD enbd.c #5594[6]: enbd_ioctl received BLKMDNTFY, am now in raid 900
RAID1 conf printout:
--- wd:1 rd:2 nd:1
disk 0, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
disk 1, s:0, o:1, n:1 rd:1 us:1 dev:ndb
disk 2, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
RAID1 conf printout:
--- wd:1 rd:2 nd:2
disk 0, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
disk 1, s:0, o:1, n:1 rd:1 us:1 dev:ndb
disk 2, s:1, o:0, n:2 rd:2 us:1 dev:nda
disk 3, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
md0: set repair bit to 0 on superblock
md: updating md0 RAID superblock on device
md: nda [events: 00000004]<6>(write) nda's sb offset: 39086016
md: ndb [events: 00000004]<6>(write) ndb's sb offset: 39118144
md: recovery thread got woken up ...
md0: resyncing spare disk nda to replace failed disk
RAID1 conf printout:
--- wd:1 rd:2 nd:2
disk 0, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
disk 1, s:0, o:1, n:1 rd:1 us:1 dev:ndb
disk 2, s:1, o:0, n:2 rd:2 us:1 dev:nda
disk 3, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
RAID1 conf printout:
--- wd:1 rd:2 nd:2
disk 0, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
disk 1, s:0, o:1, n:1 rd:1 us:1 dev:ndb
disk 2, s:1, o:1, n:2 rd:2 us:1 dev:nda
disk 3, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
md: syncing RAID array md0
md: minimum _guaranteed_ reconstruction speed: 100 KB/sec/disc.
md: using maximum available idle IO bandwith (but not more than 100000
KB/sec) for reconstruction.
md: using 124k window, over a total of 39086016 blocks.
md0: removed bitmap c3c81b20
As you can see, nothing mentions intelligent recovery. But if the
driver isn't working, how is it possible that the recovery took 4 minutes
in case 1)?
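As a rough sanity check on the two recovery times, the average throughput each one implies can be worked out from the block count in the dmesg above. This is only a sketch: it assumes that md counts in 1 KiB blocks and that "4 minutes" and "22 days" are exact, and the function name is made up for illustration.

```python
# Rough throughput implied by the two recovery times reported above.
# Assumption: md's "39086016 blocks" are 1 KiB each.

TOTAL_KIB = 39086016  # from "over a total of 39086016 blocks"


def implied_rate_kib_per_sec(total_kib, seconds):
    """Average rate needed to copy total_kib KiB in the given time."""
    return total_kib / seconds


case1 = implied_rate_kib_per_sec(TOTAL_KIB, 4 * 60)          # 4 minutes
case2 = implied_rate_kib_per_sec(TOTAL_KIB, 22 * 24 * 3600)  # 22 days

print(f"case 1: {case1:.0f} KiB/s")
print(f"case 2: {case2:.1f} KiB/s")
```

Under these assumptions case 1 works out to roughly 160 MiB/s, which would be a lot for a full copy through a network block device, and case 2 to about 20 KiB/s, which is below the 100 KB/sec minimum that the kernel prints. So neither figure looks like a plain full resync at a steady rate, and the 22 days may only be an early estimate.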
> When you
> assert an opinion, you must back it with the evidence that leads you to
> that opinion, and the reasoning you use to arrive at the judgment from
> the observations, so that others may discuss it with you. Otherwise you
> are saying "the moon is blue" and others are saying "no it isn't", and
> nobody gets anywhere. If you expose your reasoning to scrutiny, you
> will also become aware yourself of flaws that may have led you to an
> erroneous belief (or possibly find your belief strengthened :).
>
> In this case you haven't presented evidence for the opinion you
> asserted above.
You are right. Sorry.
> The readme shows what the kernel should say:
>
> If you fault one device, then write to the device, then hotadd the
> faulted device back in, you should be able to see from the kernel
> messages (use "dmesg") that the resync is intelligent. Here's some
> dmesg output:
>
> raid1 resync starts on device 0 component 1 for 1024 blocks
> raid1 resynced dirty blocks 0-9
> raid1 resync skipped clean blocks 10-1023
> raid1 resync terminates with 0 errs on device 0 component 1
> raid1 hotadd component 7:1[1] to device 0
>
Nothing like that, but, I repeat, without the superblock the recovery of
the 40 GB array took 4 minutes, versus 22 days for an array with a
superblock. So I thought that fr1 worked for the array without the
superblock. Probably I am wrong. :-)
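One way to check this from the kernel log rather than from the recovery time is to search the dmesg text for the two message formats in the readme excerpt quoted above. The sketch below does that. The regular expressions are copied from that excerpt, the function and variable names are made up for illustration, and the real messages may differ between fr1 versions.

```python
import re

# Patterns taken from the readme excerpt quoted above.
PATTERNS = {
    "dirty": re.compile(r"raid1 resynced dirty blocks (\d+)-(\d+)"),
    "clean": re.compile(r"raid1 resync skipped clean blocks (\d+)-(\d+)"),
}


def summarize_resync(dmesg_text):
    """Count blocks reported as resynced (dirty) and skipped (clean)."""
    totals = {"dirty": 0, "clean": 0}
    for line in dmesg_text.splitlines():
        for kind, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                first, last = int(match.group(1)), int(match.group(2))
                totals[kind] += last - first + 1  # ranges are inclusive
    return totals


readme_sample = """\
raid1 resync starts on device 0 component 1 for 1024 blocks
raid1 resynced dirty blocks 0-9
raid1 resync skipped clean blocks 10-1023
raid1 resync terminates with 0 errs on device 0 component 1
"""
my_sample = """\
md0: resyncing spare disk nda to replace failed disk
md: syncing RAID array md0
"""

print(summarize_resync(readme_sample))
print(summarize_resync(my_sample))
```

On a live system the text could come from `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`. If both totals are zero, as they are for the lines taken from my log, the kernel never reported an intelligent resync.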
>> But it seems to don't work:
>>
>
> What makes you think so?
>
Answered above. :-)
Thanks!!
Stefano.