[Linux-ha-dev] adding/deleting nodes from the system

bmartin@penguincomputing.com bmartin@penguincomputing.com
Wed, 7 Mar 2001 22:38:51 -0800


On Wed, Mar 07, 2001 at 07:40:09PM -0700, Alan Robertson wrote:
> bmartin@penguincomputing.com wrote:
> > 
> > Ok,
> > 
> > So I've hit another dilemma ;(
> > 
> > This time I was trying to get my heartbeat client to add / delete nodes
> > from the system.
> > 
> > My initial approach was to have my client rewrite the ha.cf on each
> > machine and then call heartbeat -r.  For some reason I thought that
> > my client would be able to stay connected across a reload and that
> > everything would be hunky-dory :)
> > 
> > But alas, this is not the case.  After some investigation, it seems that
> > this approach is a really bad idea, for a number of reasons.
> > 
> > So I was racking my (now better functioning) brain for another solution
> > and I happened across the add_node() func in config.c.
> > 
> > It seems this is exactly what I need, except there is no external interface.
> > 
> > Now maybe I'm trying to have a client do something that clients shouldn't do.
> > 
> > I can see how having an interface for clients to add (and hopefully remove)
> > nodes could really wreak havoc.
> > 
> > It is also one of the intended features of my software, so I need to figure
> > out what to do about it.
> > 
> > So what do people think?  Should clients be able to mess with the cluster
> > in this respect?  Certainly it would be reasonable to restrict this to named
> > clients, as well as possibly have some field in /etc/ha.d/ha.cf that lists
> > what named clients are able to perform such 'dangerous' actions.
> > 
> > I don't mind implementing this, but wanted to run the idea past the list
> > first to see what people think.
> 
> Several things come to mind...
> 
> First, the current heartbeat cluster management code gets confused by more
> than two nodes in the cluster.  If you're planning on rolling your own
> cluster manager, this is no problem.  Caveat emptor!
> 
Well, yes, I failed to mention my hacks to the ResourceManager and
mach_down scripts.

Basically I changed haresources to list a standby node, so that when
mach_down is called only the standby picks up the resources.

Crude but effective.  

> All further discussion presumes that this has somehow been dealt with...
> 
> Second, your idea isn't necessarily a bad one, but I've been thinking about
> the whole thing off and on and have several related thoughts I'll go into
> below...
> 
> The "dangerous" aspect could be dealt with by saying such operations are
> only available to certain client names as you said, or to certain user ids,
> or whatever.
> 
> We could just let anyone who knows the shared secret join the cluster, and
> eliminate the administration of node names altogether.  You'll see some
> vestigial code for that purpose in the base.  It's #ifdefed out right now.
> 
Yeah, I saw that.  It would seem that the only reason not to do this would
be security.  And once the secret has been compromised, we've already said
goodbye to that :)

> Regarding having clients change configuration...
> 
> I believe that the configuration module should either be a loadable module
> like lots of the others or be controlled by a client, or some combination of
> the two.
> 
> A loadable module would certainly be flexible... Then you could use
> duplicated flat configuration files like now, or a distributed database, or
> a file on a shared filesystem, or...
> 
This sounds like a very good idea, but I think clients should be able to 
muck things up too :) 

> Client configuration also has advantages.  It would allow it to be
> arbitrarily flexible.  However, certain things need to be known from the
> beginning (probably via command line), or you can't bootstrap.  Things like
> knowing the network topology are probably best gotten via a loadable
> module.  But things like knowing the resource configuration may be best done
> by the client.
> 
Yes I agree.  

So is a machine part of the network topology or a resource?

Along these lines, I have managed to do a bit of a dance around the 
current ResourceManager script to allow me to change the resources on 
the fly, through explicitly telling heartbeat to grab or release
the resources at the right time.  

Now this too is a hack that is meant for the time being.

If there were a set of extensions to the heartbeat client api that
allowed clients to add/remove nodes from the system, and also add/remove
resources from the system, I wouldn't need such hackery.
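
To make that concrete, here is a minimal sketch of what a node add/remove
extension might look like, with the 'dangerous' calls restricted to named
clients.  None of this is heartbeat's actual client API: struct cluster,
client_add_node(), client_del_node() and the privileged-client list are all
made-up names for illustration, and the list is simply assumed to have been
read from ha.cf somehow.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_NODES 16
#define NAME_LEN  64

/* Hypothetical cluster view; heartbeat's real structures differ. */
struct cluster {
    char   nodes[MAX_NODES][NAME_LEN];
    size_t nnodes;
    /* Client names allowed to change membership (assumed to come from ha.cf). */
    const char *const *privileged;
    size_t nprivileged;
};

/* Return 1 if the named client may perform 'dangerous' operations. */
static int client_is_privileged(const struct cluster *c, const char *client)
{
    for (size_t i = 0; i < c->nprivileged; i++)
        if (strcmp(c->privileged[i], client) == 0)
            return 1;
    return 0;
}

/* Return the index of a node, or -1 if it is not a member. */
static int find_node(const struct cluster *c, const char *node)
{
    for (size_t i = 0; i < c->nnodes; i++)
        if (strcmp(c->nodes[i], node) == 0)
            return (int)i;
    return -1;
}

/* Add a node on behalf of a client.  Returns 0 on success, -1 on refusal. */
static int client_add_node(struct cluster *c, const char *client, const char *node)
{
    assert(c != NULL && client != NULL && node != NULL);
    if (!client_is_privileged(c, client))
        return -1;                      /* client not allowed to do this */
    if (strlen(node) >= NAME_LEN || c->nnodes == MAX_NODES)
        return -1;                      /* name too long or table full */
    if (find_node(c, node) >= 0)
        return -1;                      /* already a member */
    strcpy(c->nodes[c->nnodes++], node);
    return 0;
}

/* Remove a node on behalf of a client.  Returns 0 on success, -1 on refusal. */
static int client_del_node(struct cluster *c, const char *client, const char *node)
{
    assert(c != NULL && client != NULL && node != NULL);
    if (!client_is_privileged(c, client))
        return -1;
    int i = find_node(c, node);
    if (i < 0)
        return -1;                      /* not a member */
    /* Move the last entry into the hole; node order is not preserved. */
    c->nnodes--;
    if ((size_t)i != c->nnodes)
        strcpy(c->nodes[i], c->nodes[c->nnodes]);
    return 0;
}
```

The same shape would presumably work for adding and removing resources.  The
sketch only covers the bookkeeping on one machine; getting every node to
agree on the new membership is the hard part and is not shown.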

> Another option would be to have a named client register to give permission
> for nodes to join the cluster.  Then a client would have to approve any
> given node joining the cluster before it was allowed to join.
> 
This would be good for the really paranoid types.

> I tend towards just letting any node that knows the shared secret and is
> talking on our network/port join the cluster.  This is *way* simple. 
> Simplicity is a great virtue when one looks at very inexpensive clusters. 
> Inexpensive clusters don't generally have expensive sysadmins running them.
> 
Yes I think this is a good idea as well.  Is there really any reason not to?
Like I said above, if someone has the shared secret, it's all over anyways.

I suppose with the current ResourceManager in place, adding another node
can really break things.  

But that's the only reason I can see for this being a bad idea.    
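
For illustration, the "anyone with the secret may join" policy could be
sketched roughly as below, guarded by an on/off flag.  This is not
heartbeat's code: struct membership, on_heartbeat() and the autojoin field
are hypothetical names, and the 'authenticated' argument stands in for
whatever check heartbeat already does against the shared secret.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_MEMBERS   16
#define NODE_NAME_LEN 64

enum join_result { JOIN_KNOWN, JOIN_ADDED, JOIN_REJECTED };

/* Hypothetical membership table. */
struct membership {
    char   names[MAX_MEMBERS][NODE_NAME_LEN];
    size_t count;
    int    autojoin;    /* nonzero: unknown but authenticated nodes may join */
};

/*
 * Handle a heartbeat packet from 'sender'.  'authenticated' is assumed to be
 * the result of verifying the packet against the shared secret elsewhere.
 */
static enum join_result on_heartbeat(struct membership *m, const char *sender,
                                     int authenticated)
{
    assert(m != NULL && sender != NULL);
    if (!authenticated)
        return JOIN_REJECTED;           /* does not know the secret */
    for (size_t i = 0; i < m->count; i++)
        if (strcmp(m->names[i], sender) == 0)
            return JOIN_KNOWN;          /* already a member */
    if (!m->autojoin)
        return JOIN_REJECTED;           /* new nodes must be configured by hand */
    if (m->count == MAX_MEMBERS || strlen(sender) >= NODE_NAME_LEN)
        return JOIN_REJECTED;           /* table full or name too long */
    strcpy(m->names[m->count++], sender);
    return JOIN_ADDED;
}
```

With autojoin off this behaves like a fixed node list; with it on, the
membership simply grows as authenticated nodes show up.  Whatever manages
resources still has to cope with the extra node.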

> Of course, one can always have a flag which says whether this behavior is
> permitted.
> 
> By the way, it is probably the case that the clients ought to get informed
> when heartbeat restarts, so they can reconnect and restart their interface. 
> It might even be the case that they should ACK this message before the
> restart is allowed to proceed.  This is generally a good idea, as Brian can
> probably attest.
> 
Yes. This would be good.  Although just so you know, I'm not planning any
tricks involving restarting anymore :)
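
For what it's worth, the ACK-before-restart idea could be tracked with
something as small as the sketch below.  The names (struct restart_gate,
gate_announce_restart(), gate_ack(), gate_may_restart()) are invented for
illustration and say nothing about how heartbeat actually tracks its
clients.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CLIENTS 8

/* Hypothetical per-daemon state: which client slots are in use and have ACKed. */
struct restart_gate {
    int connected[MAX_CLIENTS];
    int acked[MAX_CLIENTS];
    int pending;                /* nonzero once a restart has been announced */
};

/* Announce a restart: clear old ACKs and start waiting for new ones. */
static void gate_announce_restart(struct restart_gate *g)
{
    assert(g != NULL);
    for (size_t i = 0; i < MAX_CLIENTS; i++)
        g->acked[i] = 0;
    g->pending = 1;
}

/* Record an ACK from the client in 'slot'; ignored if no restart is pending. */
static void gate_ack(struct restart_gate *g, size_t slot)
{
    assert(g != NULL && slot < MAX_CLIENTS);
    if (g->pending && g->connected[slot])
        g->acked[slot] = 1;
}

/* Return 1 only if a restart is pending and every connected client has ACKed. */
static int gate_may_restart(const struct restart_gate *g)
{
    assert(g != NULL);
    if (!g->pending)
        return 0;
    for (size_t i = 0; i < MAX_CLIENTS; i++)
        if (g->connected[i] && !g->acked[i])
            return 0;
    return 1;
}
```

A real version would also need a timeout, so that one hung client cannot
block the restart forever; that is left out here.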

Brian

BTW - I won't break anything if I compile with MITJA defined, will I?