[Linux-ha-dev] make --enable-libc-alloc the default?

Lars Marowsky-Bree lmb at suse.de
Thu Feb 22 17:02:05 MST 2007


On 2007-02-22T16:05:51, Alan Robertson <alanr at unix.sh> wrote:

> > Has it caught bugs recently?
> Andrew's been writing most of the newer code.  Newer code has more bugs
> than older code.  Andrew has it disabled.  What a surprise that it isn't
> finding any bugs in his code.

That comment is not appropriate. Andrew has only been testing with it
disabled for a few weeks, because disabling it greatly increased the
value of valgrind for him. Coverity, too, has better models of
malloc/free, and promptly spotted a bunch more issues.

And it turned out to have a performance benefit, as well as using
memory more efficiently.

So, it seems to be a win to have tested with it disabled. And I've NEVER
had a report where it caught any issue in the field. Have you? Any case
where it _would_ have helped, but was disabled?

> They are intimately tied together - and probably the cause of the
> inefficiency you're complaining about.

The bucket allocator is not inherently tied to the safeguards. The two
are entangled in the current implementation, but that could be
restructured readily using inline functions calling either the bucket
allocator or libc malloc/free.
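
To illustrate the kind of restructuring meant here, a minimal sketch
follows. It is not the existing code: USE_BUCKET_ALLOC, bucket_alloc(),
bucket_free(), GUARD_MAGIC and the guarded_*() names are all made up for
this example, and the "bucket allocator" is only a placeholder that
forwards to libc. The point is that the guard checks sit in one layer and
reach the back end only through two inline functions, so either allocator
can be selected without touching the checks.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical build-time switch; 0 selects plain libc malloc/free. */
#ifndef USE_BUCKET_ALLOC
#define USE_BUCKET_ALLOC 0
#endif

#if USE_BUCKET_ALLOC
/* Placeholders standing in for a real bucket allocator (assumption). */
static void *bucket_alloc(size_t n) { return malloc(n); }
static void bucket_free(void *p) { free(p); }
#endif

/* Back-end selection is confined to these two inline functions. */
static inline void *backend_alloc(size_t n)
{
#if USE_BUCKET_ALLOC
    return bucket_alloc(n);
#else
    return malloc(n);
#endif
}

static inline void backend_free(void *p)
{
#if USE_BUCKET_ALLOC
    bucket_free(p);
#else
    free(p);
#endif
}

/* Guard layer: knows nothing about which back end is in use. */
static const unsigned long GUARD_MAGIC = 0xA55AA55AUL; /* arbitrary value */

typedef union {
    struct { size_t size; unsigned long magic; } h;
    max_align_t align; /* keeps the user pointer suitably aligned */
} guard_hdr;

/* Allocate n bytes with a header guard in front and a guard word behind. */
static void *guarded_alloc(size_t n)
{
    guard_hdr *hdr;

    if (n > SIZE_MAX - sizeof(*hdr) - sizeof(GUARD_MAGIC))
        return NULL; /* total size would overflow */
    hdr = backend_alloc(sizeof(*hdr) + n + sizeof(GUARD_MAGIC));
    if (hdr == NULL)
        return NULL;
    hdr->h.size = n;
    hdr->h.magic = GUARD_MAGIC;
    /* memcpy because the trailing guard may not be aligned */
    memcpy((char *)(hdr + 1) + n, &GUARD_MAGIC, sizeof(GUARD_MAGIC));
    return hdr + 1;
}

/* Return 1 if both guards around p are intact, 0 otherwise. */
static int guarded_ok(const void *p)
{
    const guard_hdr *hdr = (const guard_hdr *)p - 1;

    if (hdr->h.magic != GUARD_MAGIC)
        return 0;
    return memcmp((const char *)p + hdr->h.size,
                  &GUARD_MAGIC, sizeof(GUARD_MAGIC)) == 0;
}

/* Free p; return -1 and leave the block alone if a guard was damaged. */
static int guarded_free(void *p)
{
    if (p == NULL)
        return 0;
    if (!guarded_ok(p))
        return -1;
    backend_free((guard_hdr *)p - 1);
    return 0;
}
```

With something along these lines, a debugging build would call
guarded_alloc()/guarded_free(), while a production build could call
backend_alloc()/backend_free() directly and skip the guard layer
altogether.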

Yes, MARK_PRISTINE verification on alloc doesn't really work too well.
But, strangely enough, that's disabled because of the horrible
performance penalty anyway ...

For production systems, I do not see the value of this allocator. I do
not even see the value of the checks, because they simply don't catch
anything in practice; so I conclude they are nice for debugging, but for
production systems, I'd rather use the much better tested libc
allocators.

> The relevant acronym is RAS:
> 	Reliability - aided by using this during debugging
> 	Availability - (improving R improves A)
> 	Serviceability - the ability to debug things in the field
> 
> So, the patches help all three letters of RAS - for a small performance
> penalty.

This is your reading. I disagree. They do not help RAS in a way which
offsets the runtime penalty and the code complexity.

The coredump feature is great. That helps a lot.

> If Linux-HA consumed lots of CPU, and the first paragraph of the web
> site said "enhance performance" or something similar, I'd certainly
> agree.  This isn't about it being my code.  It's about something much
> simpler - the reason the project exists.
> 
> It's about the right perspective for the task at hand.

Yes. The right perspective, of course, being subjective. And you, having
started the project, of course get to comment on why it exists.

If this were the right perspective, for only a few additional percent of
runtime overhead, one might consider linking against a debugging version
of glibc, or a debugging kernel. Strangely enough, people don't do that
for production systems.

If reliability and code quality were your utmost concerns, you'd have a
much stronger point. Your personal bugzilla history doesn't reflect
that, nor does the amount of review you do before accepting a patch or
while proofreading other people's code. So, I can but conclude you're
arguing in this case because it's your code, and probably I bother
arguing with you for the same reason instead of just doing my thing on
SLES ;-)


However, design arguments are _impossible_ to win by rational argument,
otherwise there'd be only one kind of art. So I will stop this futile
argument and apologize for wasting our time: there are better ways to
resolve such disagreements.


Regards,
    Lars

-- 
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
