[ENBD] raid1 mirroring inside enbd

Peter T. Breuer enbd@lists.community.tummy.com
Fri, 3 Jan 2003 14:32:14 +0100 (MET)


"chacron1 wrote:"
> "A long time ago Peter T. Breuer" wrote:
> > > Does the enbd driver update the bitmap on disk on every write
> >
> > There is no bitmap on disk at the moment. A "notuptodate blocks" bitmap
> > is kept in memory, and a resync thread will only update the blocks that
> > are marked in it. They're marked when a write to that component of
> > the device cannot be completed.
> >
> 
> Are there two memory bitmaps on both sides or only one on the side where the write()

There is only one bitmap, and it is client side. Well, there is one
bitmap per failed group of connections (i.e., per server).

> requests are written locally to disk and also sent through the network to the other side ?
> In the latter case what is the behavior whenever the primary side fails ?


Usually both sides are remote. If one is physically local, then it
currently has to be accessed via localhost. I am working to make direct
access on localhost to a local resource possible. It should be
possible, because the servers and clients have abstraction layers that
allow clients and servers of different "kinds" to talk to each other
without knowing what they are, but last time I tried it didn't work,
and I have had other things to look at first. Concretely, it should
not matter to the client whether the "server_stub" it talks to is
the interface to a "netserver" or to a "fileserver" (you will find
the source code for all of those in separate files in the nbd/ subdir).


> > Performance is currently quite degraded anyway, as I am trying to get
> > it right before I get it fast! Each write request from the kernel has to
> > be treated twice, once for each destination, which means two copies to
> > user space, and two network transfers.
> 
> It seems you don't use BH to avoid data copy ?

No, that's not the problem. In fact the same BH data buffer is used
for both writes. Well, the same request is used for both writes: the
first time end_io is run on it, it is simply repointed to a different
target, and the second time it is really reclaimed. (With more than
two targets, replace "second" by the appropriate number.) The request
carries a small bitmap of where it's been.
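
To make the repointing concrete, here is a minimal userspace sketch of
the idea, not the driver code. The names mirror_request, next_target
and mirror_end_io are hypothetical, and the real request structure in
the kernel looks different.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_TARGETS 8   /* assumption: the visited mask fits in one byte */

/* Hypothetical stand-in for a kernel request that is mirrored. */
struct mirror_request {
    const void *data;   /* the one data buffer shared by all writes */
    uint8_t visited;    /* bit i is set once target i has completed */
    int target;         /* target currently being written, -1 when done */
};

/* Lowest-numbered target that has not seen the request yet, or -1. */
static int next_target(uint8_t visited, int ntargets)
{
    for (int i = 0; i < ntargets; i++)
        if (!(visited & (1u << i)))
            return i;
    return -1;
}

/*
 * Called when the write to req->target completes.
 * Returns 1 if the request was repointed at another target and has to
 * be resubmitted, 0 if every target has seen it and it can really be
 * reclaimed.
 */
static int mirror_end_io(struct mirror_request *req, int ntargets)
{
    assert(ntargets > 0 && ntargets <= MAX_TARGETS);
    assert(req->target >= 0 && req->target < ntargets);

    req->visited |= (uint8_t)(1u << req->target);
    req->target = next_target(req->visited, ntargets);
    return req->target >= 0;
}
```

With two targets, the first call to mirror_end_io() returns 1 and
leaves the request pointing at target 1; the second call returns 0 and
the request can be given back.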

> Do you mean one data send and one acknowledgement message ?

Yes, each time the (same) request is handled by a daemon, it results in

   1 copy to userspace
   1 network transmission
   1 network ack
   1 ack to kernel

The repeated copy to userspace can be avoided if I am careful. I will
make the buffer area common to all client daemons (handed down from the
session master). The second daemon can be handed a pointer to the data
already in the shared userspace buffer, instead of waiting for a copy
in its own area. Umm .. but this implies that the original daemon that
handled the request must be blocked until the second daemon has sent
out the data. On second thoughts, maybe that's not such a good idea.
It's probably only solvable by interposing a stacking buffer.
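
One possible shape for such a shared buffer, purely as a sketch and not
something that exists in enbd: each slot carries a count of the daemons
that still have to send it, so the first daemon need not block, and
whoever sends last recycles the slot. The names shared_slot,
slot_publish and slot_release are hypothetical, the 4096-byte slot size
is an assumption, and the slot would have to live in memory shared
between the daemons.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical slot in a buffer area shared by all client daemons. */
struct shared_slot {
    atomic_int refs;            /* daemons that still have to send this data */
    size_t len;                 /* valid bytes in data[] */
    unsigned char data[4096];   /* assumed slot size */
};

/* Called once, after the single copy from the kernel into the slot. */
static void slot_publish(struct shared_slot *s, size_t len, int nsenders)
{
    assert(len <= sizeof s->data && nsenders > 0);
    s->len = len;
    atomic_store(&s->refs, nsenders);
}

/*
 * Called by each daemon after its network send has completed.
 * Returns 1 for the last sender, which may then recycle the slot,
 * and 0 for everybody else.
 */
static int slot_release(struct shared_slot *s)
{
    int prev = atomic_fetch_sub(&s->refs, 1);
    assert(prev > 0);
    return prev == 1;
}
```

This trades the blocked daemon for a slot that stays occupied until the
slowest server has been written to, so the buffer area needs enough
slots to ride that out.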

Designs please ...

> > It is becoming more usable, because I am using it myself in some test
> > setups. I have only just got it to set itself up to more than just
> > localhost, and now I will have to make the bitmap more sparing of
> > kernel memory by using a two or three level map. Otherwise terabyte
> > sized devices would use up something like 30MB of kernel memory as bitmaps!
> > Well, twice that, one for each component. Only a page at a time needs
> > allocating, really.
> 
> If 3 level bitmaps then you will maybe need a lock to protect access on it
> or not depending how many threads may have concurrent on it ?

There is a lock. I implemented it (two levels) and it works fine.
Each page of the bitmap is allocated only on demand, the first time
an attempt is made to mark a bit in it. If the page cannot be
allocated, then a "1" is left in place of the pointer to it, and
accesses assume that the whole page is marked, which is the correct,
overly conservative semantics.
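
For illustration, a minimal userspace sketch of that two-level scheme
follows. It is not the driver code: all names are hypothetical, a
4096-byte page and one bit per block are assumptions, calloc stands in
for the kernel allocator, and the lock is left out. (With 4KB blocks, a
1TB device has 2^28 blocks, so a flat bitmap would take 32MB, which is
where a figure of about 30MB comes from.)

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_BYTES    4096u                /* assumed page size */
#define BITS_PER_PAGE (PAGE_BYTES * 8u)    /* blocks covered by one page */

/* The "1" left in place of a page pointer when allocation failed. */
#define PAGE_ALL_MARKED ((unsigned char *)1)

struct twolevel_bitmap {
    size_t npages;           /* number of first-level slots */
    unsigned char **pages;   /* NULL = nothing marked, 1 = all marked */
};

/* Allocator hook, so that an allocation failure can be simulated. */
static void *(*page_alloc)(size_t, size_t) = calloc;

/* Helper for demonstrations only: an allocator that always fails. */
static void *always_fail_alloc(size_t n, size_t size)
{
    (void)n; (void)size;
    return NULL;
}

static int bitmap_init(struct twolevel_bitmap *bm, uint64_t nblocks)
{
    bm->npages = (size_t)((nblocks + BITS_PER_PAGE - 1) / BITS_PER_PAGE);
    bm->pages = calloc(bm->npages ? bm->npages : 1, sizeof *bm->pages);
    return bm->pages ? 0 : -1;
}

static void bitmap_mark(struct twolevel_bitmap *bm, uint64_t block)
{
    size_t pi = (size_t)(block / BITS_PER_PAGE);
    uint32_t bit = (uint32_t)(block % BITS_PER_PAGE);

    assert(pi < bm->npages);
    if (bm->pages[pi] == PAGE_ALL_MARKED)
        return;                            /* already as marked as it gets */
    if (bm->pages[pi] == NULL) {
        unsigned char *p = page_alloc(1, PAGE_BYTES);
        if (p == NULL) {                   /* no memory: mark the whole page */
            bm->pages[pi] = PAGE_ALL_MARKED;
            return;
        }
        bm->pages[pi] = p;
    }
    bm->pages[pi][bit / 8] |= (unsigned char)(1u << (bit % 8));
}

static int bitmap_test(const struct twolevel_bitmap *bm, uint64_t block)
{
    size_t pi = (size_t)(block / BITS_PER_PAGE);
    uint32_t bit = (uint32_t)(block % BITS_PER_PAGE);
    const unsigned char *p;

    assert(pi < bm->npages);
    p = bm->pages[pi];
    if (p == NULL)
        return 0;                          /* page never touched */
    if (p == PAGE_ALL_MARKED)
        return 1;                          /* conservative answer */
    return (p[bit / 8] >> (bit % 8)) & 1;
}

static void bitmap_free(struct twolevel_bitmap *bm)
{
    for (size_t i = 0; i < bm->npages; i++)
        if (bm->pages[i] != NULL && bm->pages[i] != PAGE_ALL_MARKED)
            free(bm->pages[i]);
    free(bm->pages);
}
```

Setting page_alloc to always_fail_alloc before a bitmap_mark() call
shows the fallback: the slot becomes "1" and every block in that page
then tests as marked, so a resync copies more than strictly needed but
never less.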

I might move the lock down into the bitmap struct, because there can be
more than one bitmap. At the moment a single lock covers all bitmap ops.

Peter