[Linux-ha-dev] Hi list!!
Stephen C. Tweedie
sct@redhat.com
Thu, 21 Oct 1999 12:08:46 +0100 (BST)
Hi,
On Wed, 20 Oct 1999 11:00:24 -0600, Sean Reifschneider
<jafo@tummy.com> said:
> On Wed, Oct 20, 1999 at 05:46:10PM +0100, Stephen C. Tweedie wrote:
>> Yes --- or rather, journaling is one method by which we can make a
>> filesystem transactional.
> As I understand it, the journaling in ext3 (which is a fantastic
> design, BTW), only handles journaling of file-system meta-data,
> not of actual file data.
Actually, it journals both for now. That simplifies the journaling
core quite a bit, and I'm keeping the metadata-only journaling
disabled until I'm sure that the core journal code is rock-solid.
Full journaling will still be an option in the long term, though. In
particular, for systems such as NFS servers, fast commit of written
data to disk is necessary for low latency, and by allowing data to be
journaled to a separate journal device, we can improve NFS write
latencies enormously while retaining synchronous writes.
> A common misconception I've run into is that it handles the
> file data as well.
It does.
> The problem is that in the event of a crash, there's little a file-system
> can do to prevent the crashed application's data from being left in an
> unknown state.
Absolutely. Data journaling is not application journaling. You can
ensure that the data is intact on disk, but that doesn't mean that it
represents a consistent state for the application. Only the
application can ever have a hope of understanding what its own
transaction semantics are.
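
To illustrate the point that consistency has to come from the application,
here is a minimal sketch of one common application-level technique: write
the new state to a temporary file, fsync() it, then rename it over the old
file. The helper name atomic_replace is hypothetical and not something from
this thread. The sketch assumes a POSIX-style filesystem where rename is
atomic, and it omits the fsync of the containing directory that a careful
implementation would probably add.

```python
import os
import tempfile


def atomic_replace(path, data):
    """Replace the file at `path` with `data` (bytes) in one step.

    Hypothetical helper for illustration: after a crash, `path` should
    hold either the complete old contents or the complete new contents.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the final
    # rename stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Ask the kernel to push the new contents to stable storage
            # before the new name becomes visible.
            os.fsync(tmp.fileno())
        # Atomically swap the new file into place.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, remove the temporary file and leave the
        # original untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The filesystem only guarantees that each step reaches disk intact. Deciding
that "the whole file" is the unit of consistency is the application's choice.
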
What filesystem data journaling _can_ offer in this situation is fast
commit of filesystem transactions to disk. This is especially true if
you have a separate journal disk, but even without that, committing
data to disk (i.e. O_SYNC writes or fsync()) becomes a simple matter of
writing a single sequential record to the journal. The filesystem can
propagate the in-place updates to the main on-disk structures later on
at leisure, but the initial synchronous data write can be done much
more quickly with filesystem data journaling.
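
The commit-then-propagate idea above can be modelled roughly as follows.
This is a toy in-memory sketch, not ext3's actual code or on-disk format:
the class ToyJournalFS and its method names are invented for illustration,
and a Python list stands in for the sequential journal area.

```python
class ToyJournalFS:
    """Toy model of data journaling (illustrative only, not ext3)."""

    def __init__(self):
        self.journal = []  # sequential log of committed records
        self.home = {}     # "in-place" locations: block number -> data

    def commit(self, updates):
        # A single sequential append makes the whole transaction
        # durable in this model; nothing is written in place yet.
        self.journal.append(dict(updates))

    def checkpoint(self):
        # Later, at leisure: copy journaled blocks to their home
        # locations, then free the journal space.
        for record in self.journal:
            self.home.update(record)
        self.journal.clear()

    def read(self, block):
        # The newest committed copy wins, so look in the journal
        # (newest record first) before the home location.
        for record in reversed(self.journal):
            if block in record:
                return record[block]
        return self.home.get(block)


fs = ToyJournalFS()
fs.commit({7: b"new data", 8: b"more data"})
committed_before_checkpoint = fs.read(7)   # served from the journal
home_before_checkpoint = dict(fs.home)     # in-place copy not yet updated
fs.checkpoint()                            # in-place update happens later
```

In this model the synchronous cost of a commit is one append, while the
scattered in-place writes are deferred to checkpoint().
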
--Stephen