[LinuxFailSafe] FailSafe Activity - what's happening with FailSafe today?
Lars Marowsky-Bree
lmb@suse.de
Tue, 28 Jan 2003 12:12:40 +0100
On 2003-01-22T19:12:15,
Kashif Shaikh <kshaikh@consensys.com> said:
> Nope. Failsafe needs a lot of improvements. One of the most apparent
> problems I have found is failsafe complexity -- it's 300,000+ lines of
> code (not counting the GUI component) with 7 daemons * number of nodes
> trying to communicate with each other. As you can imagine, the smallest
> problems quickly propagate to all the subsystems causing things to
> malfunction.
It also makes it difficult to understand, because even if you have some of the
design documents you'd find that there is a certain gap between the whiteboard
design and the actual code base. Refactoring the FailSafe code base is
something I personally deem nearly impossible without active collaboration
with SGI, and well, this isn't happening for a variety of reasons ;)
Our (SuSE's) decision has thus been to jump into cold water and rescue the
best of FailSafe's ideas, of which there are indeed plenty (the cdbd is a
magnificent piece of code and functionality), and rebuild them on top of
heartbeat, which seems to have received more community attention both in the
past and at present.
I've in fact posted a design proposal for such a reworked Resource/Recovery
Manager to the linux-ha-dev list two weeks ago; people familiar with FailSafe
will notice a few similarities.
Sincerely,
Lars Marowsky-Brée <lmb@suse.de>
--
Principal Squirrel
SuSE Labs - Research & Development, SuSE Linux AG
"If anything can go wrong, it will." -- Capt. Edward A. Murphy
"Chance favors the prepared (mind)." -- Louis Pasteur