[ENBD] RAID1 resync snapshots
Peter T. Breuer
enbd@lists.community.tummy.com
Mon, 6 Jan 2003 19:35:30 +0100 (MET)
I'm rather proud of this sequence, so I'll post it while it's still on
my screen (I'm running the testme script that might well be in the
enbd-2.4.31.tgz archive).
1) device fairly quiescent. Has file system on it that has just been
fscked. Mounted. Mirror with two components, each with two enbd
channels to their respective servers.
[b] Groups: 2 (0) (0) (1) (1)
[b] Sockets: 4 (*) (+) (+) (+)
[b] Requested: 6.1420K (1.98K) (2.30K) (1013) (917) 2.403KR/3.739KW max 34
[b] Despatched: 6.1410K (1.98K) (2.30K) (1013) (917) 2.403KR/3.738KW md5 981
2) set group #1 faulty ("temporary breakage") with
echo 'setfaulty[b]=1' > /proc/nbdinfo
and note the resulting "F" markup. One or two more writes came through
meanwhile. The device is still mounted.
[b] Groups: 2 (0) (0) (1F) (1F)
[b] Sockets: 4 (*) (+) (+) (+)
[b] Requested: 6.1430K (1.98K) (2.30K) (1014) (917) 2.404KR/3.739KW max 34
[b] Despatched: 6.1420K (1.98K) (2.30K) (1014) (917) 2.403KR/3.739KW md5 982
3) write a file of 1MB in the filesystem (blocks here are 1K), and sync.
The faulted group is unaffected; writes go to the surviving group.
[b] Groups: 2 (0) (0) (1F) (1F)
[b] Sockets: 4 (+) (*) (+) (+)
[b] Requested: 7.1527K (2.18K) (2.50K) (1014) (917) 2.404KR/4.748KW max 34
[b] Despatched: 6.4791K (2.14K) (2.47K) (1014) (917) 2.404KR/4.074KW md5 1.2
4) return the faulty group to normality with
echo 'hotadd[b]=1' > /proc/nbdinfo
The resync starts up at once. Reads are going to the first group of
channels and writes to the second group. Each request is only one
block (1K).
[b] Groups: 2 (0) (0) (1_) (1_)
[b] Resync: [>....................] recovery = 0.0% (2/4096) finish=9s
[b] Sockets: 4 (*) (+) (+) (+)
[b] Requested: 7.1566K (2.51K) (2.78K) (1015) (918) 2.406KR/4.750KW max 34
[b] Despatched: 7.1556K (2.51K) (2.78K) (1015) (918) 2.405KR/4.750KW md5 1.9
and continues ...
[b] Groups: 2 (0) (0) (1_) (1_)
[b] Resync: [=>...................] recovery = 6.5% (270/4096) finish=8s
[b] Sockets: 4 (*) (+) (+) (+)
[b] Requested: 7.4058K (2.58K) (2.83K) (1.06K) (967) 2.530KR/4.875KW max 34
[b] Despatched: 7.4048K (2.58K) (2.83K) (1.06K) (966) 2.530KR/4.874KW md5 2.0
and continues ...
[b] Groups: 2 (0) (0) (1_) (1_)
[b] Resync: [==>..................] recovery = 10.3% (426/4096) finish=13s
[b] Sockets: 4 (*) (+) (+) (+)
[b] Requested: 7.7107K (2.65K) (2.91K) (1.14K) (1.01K) 2.683KR/5.027KW max 3
[b] Despatched: 7.7097K (2.65K) (2.91K) (1.14K) (1.01K) 2.683KR/5.026KW md5 2
and continues ...
[b] Resync: [=====>...............] recovery = 26.7% (1095/4096) finish=9s
[b] Sockets: 4 (+) (+) (*) (+)
[b] Requested: 9.0156K (2.96K) (3.26K) (1.45K) (1.36K) 3.336KR/5.680KW max 3
[b] Despatched: 9.0146K (2.96K) (3.26K) (1.45K) (1.36K) 3.335KR/5.680KW md5 2
and then finishes in a hurry, without much more ever having been read
or written:
[b] Groups: 2 (0) (0) (1) (1)
[b] Sockets: 4 (+) (+) (*) (+)
[b] Requested: 9.1713K (3.00K) (3.29K) (1.49K) (1.40K) 3.413KR/5.758KW max 3
[b] Despatched: 9.1713K (3.00K) (3.29K) (1.49K) (1.40K) 3.413KR/5.758KW md5 2
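
The sequence in steps 2-4 can be modelled with a toy dirty bitmap. This is a minimal sketch, not the enbd.c code: the Mirror class, its method names and the starting block 100 are all made up for illustration. It assumes the bitmap exists only while a group is faulted, and it ignores writes that arrive during the resync itself.

```python
class Mirror:
    """Toy two-way mirror with a dirty bitmap (hypothetical model, not enbd.c)."""

    def __init__(self, nblocks):
        self.good = [0] * nblocks    # the surviving group
        self.other = [0] * nblocks   # the group that gets set faulty
        self.dirty = None            # bitmap exists only while a group is faulted

    def setfaulty(self):
        # Start tracking which blocks the faulted group misses.
        self.dirty = [False] * len(self.good)

    def write(self, block, value):
        self.good[block] = value
        if self.dirty is None:
            self.other[block] = value    # healthy mirror: write both groups
        else:
            self.dirty[block] = True     # faulted: only remember the block

    def hotadd(self):
        """Resync: copy marked blocks only, then drop the bitmap."""
        copied = 0
        for block, marked in enumerate(self.dirty):
            if marked:
                self.other[block] = self.good[block]  # read good, write other
                copied += 1
        self.dirty = None
        return copied


# A 1MB file with 1K blocks is 1024 blocks; block 100 is an arbitrary start.
m = Mirror(4096)
m.setfaulty()
for b in range(100, 100 + 1024):
    m.write(b, b)
copied = m.hotadd()
print(copied, m.good == m.other)
```

Only the 1024 marked blocks are copied, and the two groups are identical afterwards, which is the behaviour the snapshots above suggest.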
The dmesg output shows that all the dirty blocks were concentrated at
the front of the disk, in blocks 1-3, 6, 146-1174.
ENBD enbd.c #6618[2]: sf will setfaulty device 1 group 1
ENBD enbd.c #1948[2]: bitmap_checkpage made page 0 for bitmap
ENBD enbd.c #6553[2]: ha will hotadd device 1 group 1
ENBD enbd.c #4262[6]: nbd_resync skipped clean blocks 0-0
ENBD enbd.c #3900[2]: nbd_hotadd launched resync thread with pid 4517 for group 1
ENBD enbd.c #4238[2]: nbd_resync saw marked blocks 1-3
ENBD enbd.c #4262[7]: nbd_resync skipped clean blocks 4-5
ENBD enbd.c #4238[3]: nbd_resync saw marked blocks 6-6
ENBD enbd.c #4262[8]: nbd_resync skipped clean blocks 7-145
ENBD enbd.c #4238[4]: nbd_resync saw marked blocks 146-1174
ENBD enbd.c #4294[1]: nbd_resync skipped clean blocks 1175-4095
ENBD enbd.c #1926[2]: bitmap_remove removed bitmap
So only those blocks got copied across. The device remained mounted
(and working) throughout.
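
To check how much was actually copied, the nbd_resync lines from the dmesg output above can be tallied. A small sketch: the regular expression and the tally helper are illustrative, and they assume that the block ranges in the messages are inclusive.

```python
import re

# The nbd_resync lines, copied from the dmesg output above.
DMESG = """\
ENBD enbd.c #4262[6]: nbd_resync skipped clean blocks 0-0
ENBD enbd.c #4238[2]: nbd_resync saw marked blocks 1-3
ENBD enbd.c #4262[7]: nbd_resync skipped clean blocks 4-5
ENBD enbd.c #4238[3]: nbd_resync saw marked blocks 6-6
ENBD enbd.c #4262[8]: nbd_resync skipped clean blocks 7-145
ENBD enbd.c #4238[4]: nbd_resync saw marked blocks 146-1174
ENBD enbd.c #4294[1]: nbd_resync skipped clean blocks 1175-4095
"""

PATTERN = re.compile(r"nbd_resync (saw marked|skipped clean) blocks (\d+)-(\d+)")


def tally(log):
    """Sum the sizes of the marked and clean ranges (ranges taken as inclusive)."""
    totals = {"marked": 0, "clean": 0}
    for kind, first, last in PATTERN.findall(log):
        key = "marked" if kind == "saw marked" else "clean"
        totals[key] += int(last) - int(first) + 1
    return totals


totals = tally(DMESG)
print(totals)
```

That gives 1033 marked blocks and 3063 clean ones, 4096 in all. The 1033 is presumably the 1024 blocks of the 1MB file plus a handful of metadata blocks.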
Peter