[ENBD] RAID1 resync snapshots

Peter T. Breuer enbd@lists.community.tummy.com
Mon, 6 Jan 2003 19:35:30 +0100 (MET)


I'm rather proud of this sequence, so I'll post it while it's still on
my screen (I'm running the testme script that might well be in the
enbd-2.4.31.tgz archive).

1) device fairly quiescent. Has file system on it that has just been
   fscked.  Mounted. Mirror with two components, each with two enbd
   channels to their respective servers.

   [b] Groups:     2       (0)     (0)     (1)     (1)
   [b] Sockets:    4       (*)     (+)     (+)     (+)
   [b] Requested:  6.1420K (1.98K) (2.30K) (1013)  (917) 2.403KR/3.739KW max 34
   [b] Despatched: 6.1410K (1.98K) (2.30K) (1013)  (917) 2.403KR/3.738KW md5 981

2) set group #1 faulty ("temporary breakage") with

     echo 'setfaulty[b]=1' > /proc/nbdinfo

   and note the resulting "F" markup. One or two more writes came through
   meanwhile. The device is still mounted.

   [b] Groups:     2       (0)     (0)     (1F)    (1F)
   [b] Sockets:    4       (*)     (+)     (+)     (+)
   [b] Requested:  6.1430K (1.98K) (2.30K) (1014)  (917) 2.404KR/3.739KW max 34
   [b] Despatched: 6.1420K (1.98K) (2.30K) (1014)  (917) 2.403KR/3.739KW md5 982

3) write a file of 1MB in the filesystem (blocks here are 1K), and sync.
   The faulted group is unaffected; the writes go to the surviving group.

   [b] Groups:     2       (0)     (0)     (1F)    (1F)
   [b] Sockets:    4       (+)     (*)     (+)     (+)
   [b] Requested:  7.1527K (2.18K) (2.50K) (1014)  (917) 2.404KR/4.748KW max 34
   [b] Despatched: 6.4791K (2.14K) (2.47K) (1014)  (917) 2.404KR/4.074KW md5 1.2

4) return the faulty group to normality with

     echo 'hotadd[b]=1' > /proc/nbdinfo

   The resync starts up at once. Reads are going to the first group of
   channels and writes to the second group. They're each only a block
   (1K).

   [b] Groups:     2       (0)     (0)     (1_)    (1_)
   [b] Resync:     [>....................] recovery = 0.0% (2/4096) finish=9s
   [b] Sockets:    4       (*)     (+)     (+)     (+)
   [b] Requested:  7.1566K (2.51K) (2.78K) (1015)  (918) 2.406KR/4.750KW max 34
   [b] Despatched: 7.1556K (2.51K) (2.78K) (1015)  (918) 2.405KR/4.750KW md5 1.9

and continues ...

   [b] Groups:     2       (0)     (0)     (1_)    (1_)
   [b] Resync:     [=>...................] recovery = 6.5% (270/4096) finish=8s
   [b] Sockets:    4       (*)     (+)     (+)     (+)
   [b] Requested:  7.4058K (2.58K) (2.83K) (1.06K) (967) 2.530KR/4.875KW max 34
   [b] Despatched: 7.4048K (2.58K) (2.83K) (1.06K) (966) 2.530KR/4.874KW md5 2.0

and continues ...

   [b] Groups:     2       (0)     (0)     (1_)    (1_)
   [b] Resync:     [==>..................] recovery = 10.3% (426/4096) finish=13s
   [b] Sockets:    4       (*)     (+)     (+)     (+)
   [b] Requested:  7.7107K (2.65K) (2.91K) (1.14K) (1.01K) 2.683KR/5.027KW max 3
   [b] Despatched: 7.7097K (2.65K) (2.91K) (1.14K) (1.01K) 2.683KR/5.026KW md5 2

and continues ...

   [b] Resync:     [=====>...............] recovery = 26.7% (1095/4096) finish=9s
   [b] Sockets:    4       (+)     (+)     (*)     (+)
   [b] Requested:  9.0156K (2.96K) (3.26K) (1.45K) (1.36K) 3.336KR/5.680KW max 3
   [b] Despatched: 9.0146K (2.96K) (3.26K) (1.45K) (1.36K) 3.335KR/5.680KW md5 2

and then finishes in a hurry, without much more ever having been read
or written:

   [b] Groups:     2       (0)     (0)     (1)     (1)
   [b] Sockets:    4       (+)     (+)     (*)     (+)
   [b] Requested:  9.1713K (3.00K) (3.29K) (1.49K) (1.40K) 3.413KR/5.758KW max 3
   [b] Despatched: 9.1713K (3.00K) (3.29K) (1.49K) (1.40K) 3.413KR/5.758KW md5 2
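
For anyone scripting against these snapshots, here is a minimal Python
sketch that pulls the block counts out of a Resync line and recomputes
the percentage. The name parse_resync is hypothetical (it is not part of
enbd), and the regular expression assumes only the line layout seen in
the snapshots above.

```python
import re

# Matches the "(done/total)" pair in a Resync status line. The layout is
# assumed from the sample snapshots in this post, nothing more.
_RESYNC_RE = re.compile(r"Resync:.*\((\d+)/(\d+)\)")

def parse_resync(line):
    """Return (done, total) from a Resync line, or None if it does not match."""
    m = _RESYNC_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))

line = "[b] Resync:     [=>...................] recovery = 6.5% (270/4096) finish=8s"
done, total = parse_resync(line)
percent = 100.0 * done / total
print(f"{done}/{total} blocks = {percent:.2f}%")   # prints 270/4096 blocks = 6.59%
```

Lines without a Resync field (Groups, Sockets and so on) simply return
None, so the function can be run over a whole snapshot.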

The dmesg output shows that all the dirty blocks were concentrated at
the front of the disk, in blocks 1-3, 6, 146-1174.

   ENBD enbd.c #6618[2]: sf will setfaulty device 1 group 1
   ENBD enbd.c #1948[2]: bitmap_checkpage made page 0 for bitmap
   ENBD enbd.c #6553[2]: ha will hotadd device 1 group 1
   ENBD enbd.c #4262[6]: nbd_resync skipped clean blocks 0-0
   ENBD enbd.c #3900[2]: nbd_hotadd launched resync thread with pid 4517 for group 1
   ENBD enbd.c #4238[2]: nbd_resync saw marked blocks 1-3
   ENBD enbd.c #4262[7]: nbd_resync skipped clean blocks 4-5
   ENBD enbd.c #4238[3]: nbd_resync saw marked blocks 6-6
   ENBD enbd.c #4262[8]: nbd_resync skipped clean blocks 7-145
   ENBD enbd.c #4238[4]: nbd_resync saw marked blocks 146-1174
   ENBD enbd.c #4294[1]: nbd_resync skipped clean blocks 1175-4095
   ENBD enbd.c #1926[2]: bitmap_remove removed bitmap
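
The alternating "skipped clean" / "saw marked" lines are what one would
expect from a walk over a dirty-block bitmap. The following Python
sketch illustrates that idea only; it is not the code in enbd.c. A
Python set stands in for the bitmap, the function name runs is
hypothetical, and the actual copying of marked blocks is left out.

```python
def runs(dirty, nblocks):
    """Yield (is_marked, first, last) for each maximal run of blocks."""
    start = 0
    for blk in range(1, nblocks + 1):
        # Close the current run at the end of the device, or when the
        # marked/clean state changes between the run and this block.
        if blk == nblocks or (blk in dirty) != (start in dirty):
            yield (start in dirty, start, blk - 1)
            start = blk

# Dirty set reconstructed from the "saw marked blocks" lines in the log above.
dirty = set(range(1, 4)) | {6} | set(range(146, 1175))

for marked, first, last in runs(dirty, 4096):
    verb = "saw marked" if marked else "skipped clean"
    print(f"{verb} blocks {first}-{last}")
```

Run on the 4096-block device of this test, the loop prints the same
seven ranges as the log, in block order.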

So only those blocks got copied across. The device remained mounted
(and working) throughout.
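
As a quick check on how much was copied, the marked ranges in the log
can be added up. This is a small Python sketch; marked_total is a
hypothetical helper, and the 4096-block device size is taken from the
Resync lines above.

```python
import re

# The "saw marked blocks" lines from the dmesg output, copied verbatim.
LOG = """\
ENBD enbd.c #4238[2]: nbd_resync saw marked blocks 1-3
ENBD enbd.c #4238[3]: nbd_resync saw marked blocks 6-6
ENBD enbd.c #4238[4]: nbd_resync saw marked blocks 146-1174
"""

def marked_total(log):
    """Sum the sizes of all inclusive 'saw marked blocks A-B' ranges."""
    total = 0
    for first, last in re.findall(r"saw marked blocks (\d+)-(\d+)", log):
        total += int(last) - int(first) + 1
    return total

copied = marked_total(LOG)
print(copied, f"{100.0 * copied / 4096:.1f}%")   # prints 1033 25.2%
```

That is 1033 blocks of 1K out of 4096, about a quarter of the device.
The 1MB file accounts for 1024 of them; the remaining 9 are presumably
filesystem metadata, though the log does not say so.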

Peter