[Linux-ha-dev] 2.0.8 bug in handling of clock wrap in longclock.c
David Lee
t.d.lee at durham.ac.uk
Mon Jun 18 07:54:42 MDT 2007
On Sat, 16 Jun 2007, Graham, Simon wrote:
> [...]
> After much struggling, I finally found the issue today - inside
> time_longclock(0 we log the fact that a wrap occurred; IF you are using
> the log daemon (as recommended), then in the guts of this, we end up
> calling time_longclock() recursively which goes through the same path
> and increments the wrapcount a second time (but does it silently because
> the recursive log is ignored) - gdb session attached which proves this.
>
> I think the fix is to update the static data in time_longclock() BEFORE
> logging - proposed patch attached. Assuming it's OK, what's the right
> way to get this applied to the sources?
Coming at this as a relative outsider but with a deep concern for code
quality and maintainability...
Is that really the best fix? Doesn't that continue to mask a deeper, core
problem? Won't this still leave in place a "bug waiting to happen"?
A logging routine such as "cl_log()" surely shouldn't change any other
data. Surely it shouldn't have any side-effects. Surely shouldn't it
conceptually be purely read-only on the data it uses (apart from the
direct filehandle-related stuff of course)? All the data it uses should
be regarded as being "const".
But it sounds as though "time_longclock()" DOES change data; that it has a
"write" side-effect. If so, then "cl_log()" should not be calling an
anticipated read-only function (such as "time_longclock()") in the first
place, should it?
There is ambiguity about "time_longclock()". The declaration in
"include/clplumbing/longclock.h" simply says:
Returns current time as a longclock_t.
That, to me, sounds "read-only". But clearly, it has "write" side-effects
of some sort.
The description and action of "time_longclock()" are thus mutually
contradictory.
Shouldn't THAT be the issue that we try to fix?
Glancing through the overall source code suggests that time_longclock()'s
many callers are generally expecting an innocent, read-only behaviour, not
something that has data-change side-effects.
Perhaps "time_longclock()" should become truly read-only. If some sort of
data-change is required by a subset of its callers, then perhaps a
parallel function (or a flag to the same function) should be designed, for
use specifically by that subset of its current callers which really do
need to effect some sort of change.
--
: David Lee I.T. Service :
: Senior Systems Programmer Computer Centre :
: UNIX Team Leader Durham University :
: South Road :
: http://www.dur.ac.uk/t.d.lee/ Durham DH1 3LE :
: Phone: +44 191 334 2752 U.K. :
More information about the Linux-HA-Dev
mailing list