[Linux-ha-dev] 2.0.8 bug in handling of clock wrap in longclock.c

David Lee t.d.lee at durham.ac.uk
Mon Jun 18 07:54:42 MDT 2007


On Sat, 16 Jun 2007, Graham, Simon wrote:

> [...]
> After much struggling, I finally found the issue today - inside
> time_longclock(0 we log the fact that a wrap occurred; IF you are using
> the log daemon (as recommended), then in the guts of this, we end up
> calling time_longclock() recursively which goes through the same path
> and increments the wrapcount a second time (but does it silently because
> the recursive log is ignored) - gdb session attached which proves this.
>
> I think the fix is to update the static data in time_longclock() BEFORE
> logging - proposed patch attached. Assuming it's OK, what's the right
> way to get this applied to the sources?

Coming at this as a relative outsider but with a deep concern for code
quality and maintainability...

Is that really the best fix?  Doesn't that continue to mask a deeper, core
problem?  Won't this still leave in place a "bug waiting to happen"?

A logging routine such as "cl_log()" surely shouldn't change any other
data.  Surely it shouldn't have any side-effects.  Surely shouldn't it
conceptually be purely read-only on the data it uses (apart from the
direct filehandle-related stuff of course)?  All the data it uses should
be regarded as being "const".

But it sounds as though "time_longclock()" DOES change data; that it has a
"write" side-effect.  If so, then "cl_log()" should not be calling an
anticipated read-only function (such as "time_longclock()") in the first
place, should it?

There is ambiguity about "time_longclock()".  The declaration in
"include/clplumbing/longclock.h" simply says:
       Returns current time as a longclock_t.

That, to me, sounds "read-only".  But clearly, it has "write" side-effects
of some sort.

The description and action of "time_longclock()" are thus mutually
contradictory.

Shouldn't THAT be the issue that we try to fix?

Glancing through the overall source code suggests that time_longclock()'s
many callers are generally expecting an innocent, read-only behaviour, not
something that has data-change side-effects.

Perhaps "time_longclock()" should become truly read-only.  If some sort of
data-change is required by a subset of its callers, then perhaps a
parallel function (or a flag to the same function) should be designed, for
use specifically by that subset of its current callers which really do
need to effect some sort of change.

-- 

:  David Lee                                I.T. Service          :
:  Senior Systems Programmer                Computer Centre       :
:  UNIX Team Leader                         Durham University     :
:                                           South Road            :
:  http://www.dur.ac.uk/t.d.lee/            Durham DH1 3LE        :
:  Phone: +44 191 334 2752                  U.K.                  :


More information about the Linux-HA-Dev mailing list