[ENBD] Amended setup: -> what would happen if...
Peter T. Breuer
enbd@lists.community.tummy.com
Mon, 21 Apr 2003 13:51:25 +0200 (MET DST)
"A month of sundays ago Liam Helmer wrote:"
> Peter T. Breuer wrote:
> >I thought I explained in my previous message. There is no "local to
> >remote" mirroring in enbd, but 2.4.31 does have a raid1-like feature
> >that amounts to clientside raid1 amongst enbd connections. That support
> >has moved outside of the enbd driver and should now be done by
> >ordinary raid1 over enbd, and indeed must be done by ordinary raid1
> >if you want a local mirror. If you use the fr1 driver (freshmeat)
> >then disconnect and reconnect is automatic, and it will do an
> >intelligent (minimal) resync too.
> >
> But how intelligent? (this was the original question)
It recognizes network failures and network reconnects by way of the enbd
device (you'll have to load the enbd module with show_errs=1, but come
to think of it, I'll make that be set automatically for each enbd device
that detects it is part of a raid array). When the enbd device comes back
up again (as checked by the UUID and event count in the enbd device's
superblock), only the blocks changed since it last went away are
resynced by the raid device.
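The bookkeeping described above can be illustrated with a toy model. This is a minimal Python sketch, not the fr1 or md raid1 code: the class names, the in-memory block dictionaries, and the rule "same UUID and same event count as at departure means partial resync, anything else means full resync" are simplifying assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Component:
    """One mirror half: blocks plus a simplified superblock (UUID, event count)."""
    uuid: str
    events: int = 0
    blocks: dict[int, bytes] = field(default_factory=dict)


class ToyMirror:
    """Hypothetical two-way mirror that remembers which blocks were written
    while the remote (enbd) component was unreachable."""

    def __init__(self, uuid: str, nblocks: int) -> None:
        self.uuid = uuid
        self.nblocks = nblocks
        self.local = Component(uuid)
        self.remote: Component | None = Component(uuid)
        self.dirty: set[int] = set()   # blocks written while remote was away
        self.events_at_failure = 0

    def write(self, blk: int, data: bytes) -> None:
        self.local.blocks[blk] = data
        self.local.events += 1
        if self.remote is not None:
            self.remote.blocks[blk] = data
            self.remote.events = self.local.events
        else:
            self.dirty.add(blk)        # remember it for the later resync

    def remote_failed(self) -> Component | None:
        """Network failure: detach the remote half and note the event count."""
        gone, self.remote = self.remote, None
        self.events_at_failure = self.local.events
        return gone

    def remote_returned(self, comp: Component) -> int:
        """Reattach a component; return how many blocks were copied to it."""
        if comp.uuid == self.uuid and comp.events == self.events_at_failure:
            todo = sorted(self.dirty)          # same disk, untouched: partial
        else:
            todo = list(range(self.nblocks))   # history unknown: full resync
        for blk in todo:
            comp.blocks[blk] = self.local.blocks.get(blk, b"\0")
        comp.events = self.local.events
        self.remote, self.dirty = comp, set()
        return len(todo)


m = ToyMirror("uuid-A", nblocks=1000)
m.write(1, b"a")
away = m.remote_failed()        # e.g. the network to the enbd server drops
m.write(7, b"b")
m.write(9, b"c")
print(m.remote_returned(away))  # 2: only blocks 7 and 9 are copied
```

If the returning component's event count no longer matches, the sketch falls back to copying every block, which is the expensive case the event count is there to avoid.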
> The server doesn't necessarily care who the client is. So, the nbd
> server has a client disconnect. Then another client connects. The disk
Nobody's ever tried that.
> is changed. That client disconnects.
> What if the first and second clients are not the same? Now, the disk has
> changed, and the first nbd client reconnects. It has a local (or remote)
Well, that's not something that would ever be allowed by an
administrator. It's changing the device underneath without going
through the driver (which is in the enbd /client/, not the server).
> mirror that it's been using in the meantime, and it wants to resynch.
> Will the client:
>
> a) resynch the changed blocks regardless of whether the server has
> changed or not
The client will only know about the blocks that need to be written
from its kernel to the server. Whatever else has happened on the server
in the meantime is not in its sphere of consciousness.
It's possible that something could be done - such as recording the blocks
that have been written through the server daemon and playing them back
through the client later. I have the technology to do that. Ask me to
do it and I will ...
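The record-and-replay idea is only proposed here; it is not an existing enbd feature. A minimal sketch of the recording half, assuming a hypothetical `WriteJournal` kept by the server daemon that logs which client wrote which block, might look like this:

```python
from __future__ import annotations


class WriteJournal:
    """Hypothetical server-daemon log of which blocks each client wrote."""

    def __init__(self) -> None:
        self.seq = 0                                  # increasing write number
        self.log: list[tuple[int, str, int]] = []     # (seq, client, block)
        self.last_seen: dict[str, int] = {}           # client -> seq at disconnect

    def record_write(self, client: str, block: int) -> None:
        self.seq += 1
        self.log.append((self.seq, client, block))

    def disconnect(self, client: str) -> None:
        self.last_seen[client] = self.seq

    def changed_by_others(self, client: str) -> list[int]:
        """Blocks written by other clients since `client` last disconnected."""
        since = self.last_seen.get(client, 0)
        return sorted({blk for seq, who, blk in self.log
                       if seq > since and who != client})


j = WriteJournal()
j.record_write("A", 3)
j.disconnect("A")
j.record_write("B", 5)
j.record_write("B", 3)
print(j.changed_by_others("A"))   # [3, 5]
```

This only tells a returning client which blocks someone else changed; playing them back does nothing about that client's filesystem cache, which is the objection raised further down.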
> b) detect the changes between the server's state now versus when it
> disconnected, and perform a full resynch, using the part of the mirror
> that it's maintained connection to (local) as the master
That's an interesting question because now it's in the province of the
raid1 (or fr1) drivers, not enbd. Yes, the raid1 and enbd drivers will
likely detect the problem and therefore force a full resync from the
client through enbd to the server - but only if you have altered the
serverside resource in a way that can be detected by raid.
Essentially that means mounting the resource some other place, and/or
including it in some other raid device. The latter is sufficient. I'm
not sure if the former is. At any rate, if you try including the server
resource in some other raid device, it will protest loudly, and you
will have to force it, which will in turn be detected at the
reinclusion of the device in its home array, triggering a full resync.
Notice that even if the array detects the changed mirror component, it's
a question of preference which component of the array you now choose to
regard as faulty! And so you should force the direction of the resync to
suit yourself. That can mean playing with the raid tools.
> c) detect the changes between the server's state now versus when it
> disconnected, and perform a full resynch, using the server that it just
> reconnected to as the master
This is raid's business. If you change a mirror component in an
underhand way, you are essentially introducing corruption. Now it is up
to you to fix it.
> d) give an error and ask for user intervention
Anything that caused the superblock to change UUID would cause that.
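One plausible reading of how options a) to d) map onto the checks a raid layer can make is sketched below. The function name, its parameters and the exact mapping are assumptions for illustration only; real raid1/fr1 behaviour depends on the driver and the raid tools.

```python
from enum import Enum


class Rejoin(Enum):
    PARTIAL = "resync only the blocks dirtied while the component was away"
    FULL = "full resync; the administrator picks which side is master"
    REFUSE = "superblock UUID differs; stop and ask for user intervention"


def classify_rejoin(array_uuid: str, events_at_departure: int,
                    sb_uuid: str, sb_events: int) -> Rejoin:
    """Hypothetical policy for a mirror component that reappears.

    Only what is visible in the component's superblock can be checked;
    changes made on the server that leave the superblock alone go unseen.
    """
    if sb_uuid != array_uuid:
        return Rejoin.REFUSE      # roughly option d)
    if sb_events == events_at_departure:
        return Rejoin.PARTIAL     # looks untouched: effectively option a)
    return Rejoin.FULL            # detected change: option b) or c)


print(classify_rejoin("A", 10, "A", 10).name)   # PARTIAL
```

For the FULL case the sketch deliberately does not say which side is master: as noted above, which component to regard as faulty is the administrator's choice.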
> I don't see any other real choices.
I like the idea of getting the server to know what's changed on the
resource between connects from different clients, but I don't see the
point in it! A server can't force updates on its clients - they have to
ask for whatever they want. It sounds to me as though you want to
reverse the roles, and copy from the server to the client when the
client next reconnects after a time away. Well, you can do that. But
the simplest answer was "don't do that". It sounds like you want
some overlying mechanism for sharing changes between different
clients. Nobody can do that (well, they can but it wouldn't help)
without an overlying shared distributed filesystem such as gfs or coda
or ... it's helpful but somewhat incidental that you promise not to
connect two clients at the same time. The problem is still one of sharing
changes made between two different clients. Timing is not the issue
(well, it IS an issue, and thinking about it should help you see a
problem there too!).
Even if two clients were updated successfully at the block level so
that they shared the same information on "disk" (enbd), their file
system caches would still be out of sync, leading to immediate file
system corruption.
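A small simulation makes the cache problem concrete. Two hypothetical clients share one block store and each keeps a private cache with no coherence protocol between them; the block store ends up consistent, yet one client keeps reading stale data. This is an illustration, not a model of the Linux page cache.

```python
class SharedDisk:
    """The enbd resource as the server sees it: just numbered blocks."""

    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}


class Client:
    """A client kernel with its own block cache (no cross-client coherence)."""

    def __init__(self, disk: SharedDisk) -> None:
        self.disk = disk
        self.cache: dict[int, bytes] = {}

    def read(self, blk: int) -> bytes:
        if blk not in self.cache:               # only go to "disk" on a miss
            self.cache[blk] = self.disk.blocks.get(blk, b"")
        return self.cache[blk]

    def write(self, blk: int, data: bytes) -> None:
        self.cache[blk] = data
        self.disk.blocks[blk] = data            # write-through to the device


disk = SharedDisk()
a, b = Client(disk), Client(disk)
b.write(0, b"old inode")
print(a.read(0))        # b'old inode' (now cached in a)
b.write(0, b"new inode")
print(disk.blocks[0])   # b'new inode': the block level is consistent
print(a.read(0))        # b'old inode': a's cache is stale
```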
> >>-> this is no longer an enbd question. Which should make answers more
> >>clear. I'd probably have to do a userspace synch: you're right, that I
> >>
> >>
> >Userspace sync should not be necessary - that is what raid1 does.
> >
> But raid-1 isn't going to be adequate for doing the types of file synch
> that are necessary for this application, where there's multiple clients,
> etc. What I need is either a full-time file-system level mirror, or a
You need a distributed shared file system if you have different clients
writing to the same resource.
> caching filesystem. Or a userspace process that is run on reconnect to
> resynch the files and ask the user what to do about conflicts.
Whatever, it's a file-level problem.
You would have precisely the same problem if several computers
connected (one at a time, of course :-) to the same IDE disk. This
is why I urged you to think of an nbd just as an ordinary disk.
You can do that with scsi.
> >>should be just looking at this as a hard disc. What I was most
> >>interested in is what behaviour the raid would exhibit... From this
> >>case, what I think is that the raid is going to take the most recently
> >>
> >>
> >I'm not sure what you mean. A raid1 mirror of two components, in which
> >one has failed, continues using the other component. When the failed
> >component is replaced, then the replacement is synced in the background
> >from the good component. In fr1 (as opposed to raid1) that resync is
> >intelligent, only consisting of blocks that have been updated locally
> >but not remotely. In fr1 the replacement is automatic too. In no
> >scenario do you do any resync manually.
> >
> So, that would point to option b, or option a above.
The automatic resync is in the wrong direction for you. You will have
to force it in the opposite direction, and it still won't update your
filesystem cache. You will have to have the raid device dismounted when
you do this.
> >Raid is always resynced by the raid driver. That's much of the point!
> >
> Yes, the driver does the resynch, but it waits for the superblock to be
> written onto the failed disk before it will use it, thus requiring
I don't understand what you mean. It "uses" the disk even as it is
resyncing it, and terminates by writing the superblock when it is now
up to date. I don't see how that fits with your statement above.
> manual intervention. You can't take out the failed disk, put the new one
> in, and have it go -> You have to tell the kernel to use it.
No. That's not true at all. You can exactly do that - at least the fr1
driver detects an old disk and brings it up to date. OTOH, if you put
in a random disk you don't want the kernel to start writing to it! How
is it supposed to know what you are putting it in for?
> >I'm not sure what you mean... you can either mirror at the filesystem
> >level or at the block level. There are advantages and disadvantages to
> >both. Omirr is the standard fs mirroring solution, I think.
> >
> I looked up this one, 'cause I hadn't heard of it before. It looks like
> it might have been the functionality that I'm looking for... however, it
> looks like it got dropped from the kernel (kernel traffic quote from 1999):
>
> According to Matthew Kirkwood, with omirr "two filesystems could be
> kept in sync with one being the canonical version and the other
> (possibly remote, or slow) as a backup or similar." He added, "It
> went into the kernel at about 2.1.3x and out again at about 5x, where it
> became obvious that it was causing too many complications and was better
> left to userspace."
But it is a userspace solution, I believe.
> There are 3 jobs that I need: failover, backup, and file-level
> synchronization. 2 out of 3 ain't bad, but I'm holding out for a way to
> get the third.
It's not clear WHY you want what you want. There are standard
solutions, for example, for maintaining a realtime mirror on another
machine, via enbd, to which you fail over as soon as your primary
goes down. When the primary comes up again, it is detected and resynced
from the secondary, automatically. That is what the heartbeat scripts
do. That's failover.
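The failover sequence just described can be written as a small state machine. This sketch is hypothetical and does not reproduce the heartbeat scripts; in particular, whether service moves back to the primary after the resync (`failback`) is treated here as a configurable choice, which the post does not specify.

```python
from enum import Enum, auto


class State(Enum):
    PRIMARY = auto()      # primary serves; secondary holds a realtime enbd mirror
    FAILED_OVER = auto()  # primary down; secondary serves alone
    RESYNC = auto()       # primary back; being resynced from the secondary
    SECONDARY = auto()    # secondary serves; primary holds the up-to-date mirror


def step(state: State, primary_alive: bool, resync_done: bool,
         failback: bool = False) -> State:
    """One tick of a hypothetical heartbeat-style monitor."""
    if state is State.PRIMARY and not primary_alive:
        return State.FAILED_OVER                 # fail over immediately
    if state is State.FAILED_OVER and primary_alive:
        return State.RESYNC                      # copy secondary -> primary
    if state is State.RESYNC and not primary_alive:
        return State.FAILED_OVER                 # primary died again mid-resync
    if state is State.RESYNC and resync_done:
        return State.PRIMARY if failback else State.SECONDARY
    return state


state = State.PRIMARY
for alive, done in [(True, False), (False, False), (True, False), (True, True)]:
    state = step(state, alive, done)
    print(state.name)   # prints PRIMARY, FAILED_OVER, RESYNC, SECONDARY (one per line)
```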
Peter