[ENBD] raid1 mirroring inside enbd

chacron1 enbd@lists.community.tummy.com
Fri, 03 Jan 2003 11:42:30 +0100


"Peter T. Breuer" wrote:

> "A month of sundays ago chacron1 wrote:"
> > >I've done an implementation of raid1 mirroring inside enbd.  In
> > >principle it uses a bitmap of dirty blocks in order to reduce resync
> > >time.
> >
> > > If I understand correctly, this implementation enables mirroring of data block
>
> BTW, if you want to give it a try, enbd-2.4.31 is quite usable right
> now, and available from the ftp site
>
>   ftp://oboe.it.uc3m.es/pub/Programs/
>

I'll try to use it on a 2.14.18. Is it easy to use? Do I have to use a specific
partition type on the disk ...?



>
> The makefile is set up to start a silly internal 2x2 raid mirror connection on
> localhost.  Here's a snapshot from it taken during a resync.
>
>   Device a:       Closed
>   Device b:       Open
>   [b] State:      verify, signed, rw, merge requests, enabled, validated, plug, last error 0, lives 0, bp 0
>   [b] Queued:     +0R/0W curr (check 0R/0W) +1R/31W max
>   [b] Buffersize: 262144  (sectors=512, blocks=256)
>   [b] Blocksize:  1024    (log=10)
>   [b] Size:       4096KB
>   [b] Blocks:     4096
>   [b] Groups:     2       (0)     (0)     (1F)    (1F)
>   [b] Resync:     ==========> (19%)
>   [b] Sockets:    4       (+)     (+)     (*)     (+)
>   [b] Requested:  11.775K (3.51K) (3.50K) (4.79K) (3.97K) 3.011KR/8.763KW max 34
>   [b] Despatched: 11.774K (3.51K) (3.50K) (4.79K) (3.97K) 3.011KR/8.762KW md5 3.08KW (3.08K eq, 0 ne, 0 dn)
>   [b] Errored:    0       (0)     (0)     (0)     (0)     0+0
>   [b] Pending:    1       (0)     (0)     (1)     (0)     0R/1W+0R/0W
>   [b] B/s now:    17.0K   (0R+17.0KW)
>   [b] B/s ave:    143K    (36.0KR+106KW)
>   [b] B/s max:    8.91M   (2.97MR+5.94MW)
>   [b] Spectrum:   98%1    1%34
>   [b] Kthreads:   0       (0 waiting/0 running/1 max)
>   [b] Cthreads:   2       (+)     (+)     (-)     (-)
>   [b] Cpids:      2       (2575)  (2576)  (2577)  (2578)
>   Device c-p:     Closed
>
> It is connected to two servers (actually the same localhost server in
> the test, but never mind), and thus reports two "groups" of channels.
> The second group (#1 !) has been marked faulty, removed, and re-added,
> and now the resync is going through.
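
For anyone who wants to watch the resync from a script, here is a minimal
sketch that pulls the percentage out of the "[b] Resync:" line shown in the
snapshot. It assumes the snapshot is what reading /proc/nbdinfo returns and
that the line keeps the "(NN%)" form shown above; resync_percent is a
hypothetical helper name, not part of enbd.

```shell
#!/bin/sh
# Sketch only: resync_percent is a hypothetical helper, not an enbd tool.
# Usage: resync_percent DEVICE [FILE]
# DEVICE is the letter in brackets (e.g. b). FILE defaults to /proc/nbdinfo,
# which is an assumption about where the status snapshot comes from.
resync_percent() {
    # Print only the digits inside "(NN%)" on the "[DEVICE] Resync:" line.
    sed -n "s/^ *\[$1] Resync:.*(\([0-9][0-9]*\)%).*/\1/p" "${2:-/proc/nbdinfo}"
}
```

Called as "resync_percent b", it prints nothing when there is no Resync line
for that device, so an empty result can be used as a "no resync reported" test.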
>
> To set up an internal mirror like this, you simply have to give a
> second server on the client command line, prefixed with a "+", like so:
>
>   enbd-client piano:1044 -n 2 +xilophone:2022 -n 2  /dev/ndb
>
> (the -n 2 each time specifies two channels in that "group" of channels).
>
> > > It's available in ftp://oboe.it.uc3m.es/pub/Programs/nbd-2.4.31.tgz .
>
> I haven't tested anything but the most basic things, such as the resync
> shown above. I've decided to make the whole device resync if you do a
> setfaulty, and to use the bitmap of notuptodate blocks if you do the
> hotremove without the setfaulty.

Then, upon node failure, enbd does a full resync, and upon disk replacement
a partial one.
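
To make the difference concrete, here is a toy model of the rule as quoted
above: setfaulty marks every block notuptodate, while hotremove alone leaves
only the blocks written in the meantime marked. This is an illustration in
plain shell, not enbd code; the 8-block device and the names BITMAP,
mark_block, mark_all and blocks_to_resync are all made up for the sketch.

```shell
#!/bin/sh
# Toy model only, not enbd code. One character per block of a hypothetical
# 8-block device: '1' means notuptodate, '0' means in sync.
BITMAP=00000000

# mark_block N: record a write to block N (0-based) while the mirror half
# is out of the array.
mark_block() {
    # Replace the (N+1)th character with '1'.
    BITMAP=$(printf '%s' "$BITMAP" | sed "s/./1/$(( $1 + 1 ))")
}

# mark_all: models the described effect of setfaulty, where the whole
# device is treated as notuptodate.
mark_all() {
    BITMAP=$(printf '%s' "$BITMAP" | tr 0 1)
}

# blocks_to_resync: how many blocks a later resync would have to copy.
blocks_to_resync() {
    printf '%s' "$BITMAP" | tr -d 0 | wc -c | tr -d ' '
}
```

In this model, writes to blocks 2 and 5 after a plain hotremove leave two
blocks to copy, whereas calling mark_all first leaves all eight.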


> Thus the sequence of commands would
> normally be
>
>       1) optional, forces whole device to be resynced later
>
>           echo 'setfaulty[b]=1' > /proc/nbdinfo
>
>         This affects group #1 of device b (i.e. ndb) and marks it
>         as completely notuptodate, but still active (i.e. able
>         to take further accesses). Any further writes may or may not
>         fail but they will be dealt with accordingly. I hope.
>
>       2) required (when managing raid by hand, as I'm talking about)
>
>            echo 'hotremove[b]=1' > /proc/nbdinfo
>
>         This takes group #1 of ndb out of the raid array. It won't be
>         written to. Writes to it will be marked on the notuptodate
>         bitmap instead.
>
>       3) required
>
>            echo 'hotadd[b]=1' > /proc/nbdinfo
>
>         This makes the device available for writes and starts the resync
>         of notuptodate blocks in a separate kernel thread.
>
> In addition, if the device drops out via a network fault, it will
> be treated as though (2) above had occurred. When the network comes
> back (3) will occur. Minor dropouts, such as losing a single channel
> temporarily, are tolerated, as always.
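
The manual sequence (1) to (3) above can be wrapped in a small script. This
is only a sketch: it assumes the value after "=" selects the group, as the
quoted description of group #1 suggests, and NBDINFO, nbd_ctl and
resync_group are hypothetical names introduced here, not part of enbd.

```shell
#!/bin/sh
# Sketch only: wraps the three echo commands quoted above.
# NBDINFO, nbd_ctl and resync_group are hypothetical names.
NBDINFO=${NBDINFO:-/proc/nbdinfo}

# nbd_ctl COMMAND DEVICE GROUP
# Writes a line such as "hotremove[b]=1" to the control file.
nbd_ctl() {
    printf '%s[%s]=%s\n' "$1" "$2" "$3" > "$NBDINFO"
}

# resync_group DEVICE GROUP [full]
# With "full", step 1 (setfaulty) is issued first, which is described as
# forcing a whole-device resync. Without it only steps 2 and 3 run, so the
# notuptodate bitmap decides what gets copied.
resync_group() {
    if [ "${3:-}" = "full" ]; then
        nbd_ctl setfaulty "$1" "$2" || return 1
    fi
    nbd_ctl hotremove "$1" "$2" || return 1
    nbd_ctl hotadd "$1" "$2"
}
```

For example, "resync_group b 1" would remove and re-add group #1 of ndb,
and "resync_group b 1 full" would mark it faulty first.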

What is your load-balancing policy when several channels are available?


>
>
> Recall that I haven't tested all things in all states. I would
> like to encourage testers to mention it when something doesn't work
> like they expect, or just plain doesn't work. I am sitting on a
> test rig myself, so the pilot is on board the same aeroplane.

Thanks,
Eric