[LinuxFailSafe] FailSafe Activity - what's happening with
FailSafe today?
Kashif Shaikh
kshaikh@consensys.com
22 Jan 2003 19:12:15 -0500
On Mon, 2003-01-20 at 19:22, JW wrote:
>
> Was it really completely ported? There was nothing left to do?
> I find that a little hard to believe. Is there no one interested in "improving"
> upon what's already available? Perhaps it's a perfect system and needs no
> updates? ;-)
Nope. FailSafe needs a lot of improvements. One of the most apparent
problems I have found is its complexity -- it's 300,000+ lines of
code (not counting the GUI component), with 7 daemons on every node
(so 7 x the number of nodes in total) all trying to communicate with
each other. As you can imagine, even the smallest problems quickly
propagate to all the subsystems, causing things to malfunction.
But then again -- nothing else like FailSafe is out there. It has a
uniform replicated database, a cluster membership manager, group process
membership and communication, a very decent GUI, a lot of very modular
parts, etc.
Of course, since it's open source, it can be improved. Things that I
would like to improve (off the top of my head):
- removing the cruft between the bootstrap configuration process and the
actual HA daemons. In other words, move cdbd higher up so it can use
the membership layer from cms and use gcs for communication. gcs can
then be modified to use hardware multicast for efficient, reliable,
ordered messaging. And give cdbd a SQL engine with a Perl access API,
so that a simple query doesn't take 100 lines of code.
- test cases/regression testing, i.e. more integration testing needs to
be done to prevent errors from rippling through the system.
- making cdbd an actual Linux filesystem (not so easy to do... ever
heard of "distributed locking"?). This would help us poor folks trying
to keep password databases in sync and a 200-option Samba configuration
file consistent ;)
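
To make the SQL point in the first item a bit more concrete, here is a
rough sketch of what a "simple query" could look like if the cluster
configuration sat behind a SQL engine. It uses Python's built-in sqlite3
module purely as a stand-in; it is not the cdbd API. The table name, the
column names and the sample rows are all made up for illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical 'resources' table.

    The schema and rows are invented for this sketch; they do not
    reflect how cdbd actually stores cluster configuration.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE resources ("
        "name TEXT PRIMARY KEY, type TEXT, owner_node TEXT)"
    )
    conn.executemany(
        "INSERT INTO resources VALUES (?, ?, ?)",
        [
            ("web-ip", "IP_address", "node1"),
            ("web-vol", "volume", "node1"),
            ("db-ip", "IP_address", "node2"),
        ],
    )
    conn.commit()
    return conn


def resources_on(conn, node):
    """Return the names of all resources owned by the given node."""
    rows = conn.execute(
        "SELECT name FROM resources WHERE owner_node = ? ORDER BY name",
        (node,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    print(resources_on(conn, "node1"))
```

The whole lookup is one SELECT plus a couple of lines of glue, which is
the kind of brevity I have in mind. A Perl binding would presumably look
much the same.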
I dunno, seems to me there are a lot of things that can be improved :)
Kashif