[ENBD] Re: 2 sided enbd
Peter T. Breuer
enbd@lists.community.tummy.com
Fri, 24 Jan 2003 00:27:56 +0100 (MET)
I'm sorry .. I missed this question -
"A month of sundays ago Chas Wareing wrote:"
> Greetings Peter - I'm getting ready to dig into ENBD; however, I have
> one question - Can I write to both ends of the mirror, i.e. server1
> write mirrors to server2; and server 2 write mirrors to server1; same
> device?
Errmm .. I don't follow. Let me answer something that you MAY be
asking ... You can't write to server and client at the same
time without fouling up.
That's because the client kernel will cache the responses it receives.
If you write to the resource on the server, the client will never dream
of asking about it.
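
A toy illustration of that caching problem, in Python. The names
Server and CachingClient are invented for the sketch and are not
part of ENBD; the dictionary merely stands in for the client kernel's
buffer cache.

```python
class Server:
    """Stand-in for the resource exported by the server."""
    def __init__(self, nblocks):
        self.blocks = [b"old"] * nblocks

    def read(self, n):
        return self.blocks[n]

    def write(self, n, data):
        self.blocks[n] = data


class CachingClient:
    """Stand-in for the client side: caches every block it has read."""
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def read(self, n):
        # Only go over the "net" on a cache miss.
        if n not in self.cache:
            self.cache[n] = self.server.read(n)
        return self.cache[n]


server = Server(4)
client = CachingClient(server)

first = client.read(0)      # fetched from the server and cached
server.write(0, b"new")     # somebody writes directly on the server
second = client.read(0)     # served from cache; the server is not asked
```

After the server-side write, the client still sees the old block,
because nothing tells it that its cached copy is out of date.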
OK, so you could make all caching on the client vanish, either by using
direct i/o (this really has to wait for kernel 2.6 to work, I seem to
recall) or by talking to a raw character device linked to the nbd device.
But you can't put a FS on a raw character device. Well, you can, but
you can't mount it (you can access it fine via an application that
understands the FS).
What's more, if you were to mount the direct i/o'd nbd-device, the
kernel would be caching certain metadata (dcache, free blocks
bitmap ..) about the FS and I'm pretty sure that writing to the server
won't affect those!
So no.
But what you can do is mount a pair of servers and clients in a ring
configuration. This may take some mods (I forget).
The idea is that you raid mirror the nbd device and a local resource
together. The server serves the md compound device as its resource.
The setup is symmetric at both ends.
So what happens if you write to a md compound device? The write
is mirrored to the local device and to the nbd client. The client sends
it to the server at the other end. That writes it to the md compound
device over there, which writes it to the local resource, and
to the nbd client there, which sends it back over the net to US.
And our server writes it to the md compound device over here.
OK, so you'd get ringing. Or you would if the kernel weren't in
the way. Writing twice to the same device is likely to hit the
buffer, not hit "disk" both times. The buffer gets overwritten
and only one write will (eventually) result when the buffer ages.
But this buffer now takes a long time to write to disk .. it's
constantly refreshed by being rewritten by the circulating
information. Or it would be, if nbd didn't replace writes with
reads by going into md5sum mode. It'll see that it's trying to write
the same information as is already there, and suppress the second
write.
Now, that's a pretty roundabout way of doing it. To write locally you
actually send a packet round the net. And then you set up a ringing
loop, and suppress the second packet by a calculated checksum.
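
The ring and the checksum suppression can be modelled roughly as
follows. This is a toy sketch in Python, not ENBD code: Node,
md_write and serve_write are invented names, the dictionary stands in
for the local half of the mirror, and comparing md5 digests in
serve_write is only an approximation of what md5sum mode does.

```python
import hashlib


def digest(data):
    return hashlib.md5(data).digest()


class Node:
    """One end of the ring: local resource, md mirror, nbd client, server."""
    def __init__(self, name):
        self.name = name
        self.local = {}       # block number -> data on the local resource
        self.peer = None      # the node our nbd client talks to
        self.received = 0     # write requests that arrived over the "net"
        self.suppressed = 0   # requests dropped because the data matched

    def md_write(self, block, data):
        # The mirror writes to the local resource ...
        self.local[block] = data
        # ... and to the nbd client, which sends it to the other server.
        self.peer.serve_write(block, data)

    def serve_write(self, block, data):
        self.received += 1
        current = self.local.get(block)
        # Assumed md5sum-mode behaviour: skip the write if the checksum
        # of what is already there matches the incoming data.
        if current is not None and digest(current) == digest(data):
            self.suppressed += 1
            return
        self.md_write(block, data)


a, b = Node("a"), Node("b")
a.peer, b.peer = b, a

a.md_write(0, b"hello")   # one local write on node a
```

In this model each node receives exactly one request: b writes the
block and forwards it, and a suppresses the packet that comes back.
Without the checksum test the two calls would recurse forever, which
is the ringing.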
What this really needs is a cache in nbd to suppress ringing packets.
But it'll suppress anyway if it is put in md5sum mode and kept there - it's
just silly.
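
For what such a cache might look like, here is a purely hypothetical
sketch in Python. Nothing like EchoFilter exists in nbd as far as this
message says; the class, its methods and the capacity parameter are
all assumptions made for illustration. The idea is to remember the
digest of each write recently sent out, and to treat an incoming write
with the same block number and digest as an echo.

```python
import hashlib
from collections import OrderedDict


class EchoFilter:
    """Hypothetical cache of recently sent writes, keyed by block number."""
    def __init__(self, capacity=1024):
        self.capacity = capacity
        self.recent = OrderedDict()   # block number -> md5 digest of data

    def note_outbound(self, block, data):
        # Record what we just sent; the newest entry goes to the end.
        self.recent[block] = hashlib.md5(data).digest()
        self.recent.move_to_end(block)
        # Forget the oldest entries once the cache is full.
        while len(self.recent) > self.capacity:
            self.recent.popitem(last=False)

    def is_echo(self, block, data):
        # True if this incoming write matches one we recently sent.
        return self.recent.get(block) == hashlib.md5(data).digest()


f = EchoFilter(capacity=2)
f.note_outbound(7, b"abc")
echo = f.is_echo(7, b"abc")       # same block, same data
fresh = f.is_echo(7, b"xyz")      # same block, different data
```

A write that is_echo() recognises could be dropped without touching
the device at all, instead of reading the block back to checksum it.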
Peter