[Linux-ha-dev] [PATCH 08 of 10] cl_log: Restore old logfile open/seek/write/close behaviour
bernd.schubert at fastmail.fm
Mon Nov 15 11:25:07 MST 2010
On Monday, November 15, 2010, Lars Ellenberg wrote:
> On Mon, Nov 15, 2010 at 03:52:05PM +0100, Bernd Schubert wrote:
> > On Friday, November 12, 2010, Lars Ellenberg wrote:
> > > On Fri, Nov 12, 2010 at 05:04:43PM +0100, bs_lists at aakef.fastmail.fm
> > > > # HG changeset patch
> > > > # User Bernd Schubert <bernd.schubert at fastmail.fm>
> > > > # Date 1289577717 -3600
> > > > # Node ID 750c5a016d8135a7170d4a2fbe0de0f98478c572
> > > > # Parent 0324966049c1e90b9d18c9c63041d2c73964d42f
> > > > cl_log: Restore old logfile open/seek/write/close behaviour.
> > > >
> > > > This patch actually does not completely revert commit
> > > > 2435:ada347da564d, but adds a layer to support both,
> > > > open/write/close and
> > > > open-once, write, close/open-for signal handlers
> > > >
> > > > It also changes a macro into a static function. And also uses
> > > > system IO (open/close/write) instead of libc IO
> > > > (fopen/fclose/fwrite). Libc IO has a buffer, which is not suitable
> > > > for log files (in case of a stonith, the whole buffer, which might
> > > > be large, will be missing from the log files).
> > >
> > > You are aware of this thread?
> > > > http://firstname.lastname@example.org/msg05590.html
> > No, sorry I did not notice that one.
> > > Basically I suggest to fflush every time the inner message logging loop
> > > is done, and add an fsync for pri ERR and worse.
> > > we can make that warning and worse, or even make it configurable.
> > >
> > > It should perform better than the always unbuffered approach, while
> > > keeping similar guarantees about not losing (important) messages.
> > I think it is a good idea to flush the kernel buffer, but do we really
> > want to have a libc buffer? There is no way to flush the libc buffer
> > from kernel space, e.g. with sysrq. The problem is that a remote system
> > might stonith that system for whatever reason and it would be good to
> > have the latest logs then.
> My approach was:
> fflush makes glibc buffer flush to kernel page cache.
> fflush is done whenever the current message backlog was processed.
> it will "do itself" if the message backlog is so huge that the
> glibc buffers (usually in the order of 8k, iirc) fill up anyways.
> We could of course add in another explicit fflush every so often,
> or for every ERROR or worse (or warning or worse).
> It will usually still perform better, as we do the write() syscall only
> for full glibc buffers.
> fsync will (ok: should) make kernel page cache go to stable storage.
> fsync is done whenever the current message backlog was processed,
> if at least one ERROR or worse was counted.
> Again we can (and probably should) add an extra fsync every X messages
> in case the message backlog is huge, but only for severe messages.
Yes, I understood everything so far. But let's say something after the last
message (with an uncritical priority) caused all other daemons to die? There is
simply nothing left that would make logd flush the last message. I admit
that this has a very low chance and is a corner case, but shouldn't we try to
make it work for corner cases as well? So that leaves the question of whether
we should rather use a timer for flushes.
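The policy discussed above (fflush once the backlog is drained, fsync only if an ERR or worse was seen, plus a timer as a safety net) could look roughly like the following. This is only a sketch under assumptions: struct log_sink, sink_write(), sink_flush() and sink_flush_due() are hypothetical names and not part of cl_log or logd, and the caller is assumed to invoke sink_flush_due() from whatever periodic timer it already has.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <syslog.h>   /* LOG_ERR, LOG_INFO, ... */
#include <time.h>
#include <unistd.h>   /* fsync */

/* Hypothetical bookkeeping for one buffered log file. */
struct log_sink {
    FILE  *fp;
    int    dirty;        /* data written since the last flush */
    int    need_sync;    /* an ERR-or-worse message is pending */
    time_t dirty_since;  /* time of the first unflushed write */
};

/* Buffer one message; remember whether it was severe. */
static int sink_write(struct log_sink *s, int priority,
                      const char *msg, time_t now)
{
    assert(s != NULL && s->fp != NULL && msg != NULL);
    if (fputs(msg, s->fp) == EOF)
        return -1;
    if (!s->dirty) {
        s->dirty = 1;
        s->dirty_since = now;
    }
    if (priority <= LOG_ERR)   /* lower value means more severe */
        s->need_sync = 1;
    return 0;
}

/* Call when the backlog is drained, or when the timer says so.
 * Returns 1 if something was flushed, 0 if idle, -1 on error. */
static int sink_flush(struct log_sink *s)
{
    if (!s->dirty)
        return 0;
    if (fflush(s->fp) != 0)                 /* libc buffer -> page cache */
        return -1;
    if (s->need_sync && fsync(fileno(s->fp)) != 0)  /* -> stable storage */
        return -1;
    s->dirty = 0;
    s->need_sync = 0;
    return 1;
}

/* Timer check: has unflushed data been waiting for max_age seconds? */
static int sink_flush_due(const struct log_sink *s, time_t now,
                          time_t max_age)
{
    return s->dirty && (now - s->dirty_since) >= max_age;
}
```

With something like this, a last low-priority message would still reach the page cache after at most max_age seconds, even if nothing else is ever logged.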
> > I noticed that even IPMI resets nowadays use acpi to trigger a reboot, I
> > have not checked the kernel code yet, but in principle that allows the
> > kernel to do an emergency flush of its buffers, but no way to do that for
> > libc...
> > > Also, if you want to comment on the issues raised in
> > > http://email@example.com/msg05598.html
> > > about log rotation and files potentially being in use anyways,
> > > still?
> > Anything not yet answered by those patches here?
> As long as we still do the open(); write(); close(); unless the library
> client explicitly enables SIGHUP handling, and the log rotate does not
> immediately gzip the stuff, we are good.
Yes, that is what patch 08/10 does. We have no influence on the behaviour of
logrotate, though ;)
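For illustration, the per-message open(); write(); close(); pattern could be sketched as below. This is not the code from the patch: append_log_line() is a hypothetical name, and O_APPEND is used here in place of an explicit seek to the end of the file.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Open the log file, append one line, close it again.
 * Because the path is re-resolved on every call, a file that was
 * renamed away by log rotation is simply recreated.
 * Returns 0 on success, -1 on error. */
static int append_log_line(const char *path, const char *line)
{
    assert(path != NULL && line != NULL);

    int fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    const char *p = line;
    size_t len = strlen(line);
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted, try again */
            close(fd);
            return -1;
        }
        p += n;                  /* handle short writes */
        len -= (size_t)n;
    }
    return close(fd);
}
```

No libc buffer is involved, so after the call returns the data is at least in the kernel page cache.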
> If you think it buys performance, we could weaken it still to
> (pseudo code)
> if (tmp != persistent)
Good idea. It should be rather fast for local filesystems and should even help
for NFS with an attribute cache. Just comparing the pointer will not help, will
it? st_ino would be optimal, if there weren't the risk of recycled inodes...
And I'm complaining for at least the 1000th time that Linux does not provide
st_gen (the inode generation number) yet. So the only reliable information left
is st_size, and we would need to do our own accounting.
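One possible way to do such a check, sketched here under assumptions: compare st_dev/st_ino of the descriptor that is still open with those of the path. log_was_rotated() is a hypothetical helper, not something from the patch series. The recycled-inode concern applies to a cached st_ino value; as long as the old descriptor stays open, a local filesystem should not hand its inode number to a new file, though that has not been checked here for NFS.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if 'path' no longer names the file open on 'fd'
 * (renamed, removed or replaced), 0 if it is still the same file,
 * and -1 if fstat() on the descriptor fails. */
static int log_was_rotated(int fd, const char *path)
{
    struct stat open_st, path_st;

    assert(path != NULL);
    if (fstat(fd, &open_st) != 0)
        return -1;
    if (stat(path, &path_st) != 0)
        return 1;   /* path is gone: treat as rotated */
    return open_st.st_dev != path_st.st_dev
        || open_st.st_ino != path_st.st_ino;
}
```

A caller would reopen the log file whenever this returns 1, at the cost of one stat() per check.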
> > Hmm, well, we need to add the kill action to the logrotate scripts,
> > but those scripts are in the heartbeat package.
> > Maybe not that optimal that heartbeat (without enabled logd) and
> > ha-logd use the very same log files by default.
> Should we change the internal defaults of logd,
> or just provide an "example logd.cf" with different settings?
Well, once the last patch of the series has been applied, we want logrotate to
send the SIGHUP, but we don't have control over that from the cluster-glue
package (yet). Changing the default filenames would work, but might upset users.
Not doing that will also upset users. So the only choice that avoids upsetting
users would be to change it in the heartbeat package and to create a dependency
on the heartbeat version.