hot-standby services: WAS How do I set up a resource as "secondary only"?
David Lang
david.lang@digitalinsight.com
Tue, 15 Oct 2002 17:36:42 -0700 (PDT)
I am missing why this isn't possible with the current heartbeat code (I'm
basically doing this with my firewalls).

It would take a little more smarts in the resource scripts, but you have
the following transitions.
1. boot -> standby
   run from init scripts before starting heartbeat
2. standby -> active
   resource scripts perform the change when run with the 'start' parameter
   (a more robust resource script would check whether the service is running
   and, if not, get it into standby mode before upgrading it to active)
3. active -> standby
   resource scripts perform the change when run with the 'stop' parameter
4. standby -> shutdown
   performed by the system shutdown scripts
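
The four transitions above can be sketched as a small state machine. This is
only a toy model in Python, not heartbeat code: the class and method names
(HotStandbyService, boot, start, stop, shutdown) are made up for illustration,
and a real resource script would be a shell script acting on the actual
service. Demoting an active service before shutting it down is my assumption;
the list above only covers standby -> shutdown.

```python
from enum import Enum


class Mode(Enum):
    SHUTDOWN = "shutdown"
    STANDBY = "standby"
    ACTIVE = "active"


class HotStandbyService:
    """Toy model of the service on one node (hypothetical, not heartbeat code)."""

    def __init__(self):
        self.mode = Mode.SHUTDOWN
        self.log = []  # (old, new) transitions taken, oldest first

    def _move(self, new_mode):
        self.log.append((self.mode.value, new_mode.value))
        self.mode = new_mode

    def boot(self):
        # 1. boot -> standby: done by an init script before heartbeat starts.
        if self.mode is Mode.SHUTDOWN:
            self._move(Mode.STANDBY)

    def start(self):
        # 2. standby -> active: what the resource script does on 'start'.
        # The "more robust" variant: if the service is not running at all,
        # bring it up in standby first, then promote it.
        if self.mode is Mode.SHUTDOWN:
            self._move(Mode.STANDBY)
        if self.mode is Mode.STANDBY:
            self._move(Mode.ACTIVE)

    def stop(self):
        # 3. active -> standby: 'stop' demotes instead of stopping the service.
        if self.mode is Mode.ACTIVE:
            self._move(Mode.STANDBY)

    def shutdown(self):
        # 4. standby -> shutdown: done by the system shutdown scripts.
        # Assumption: an active service is demoted to standby first.
        if self.mode is Mode.ACTIVE:
            self._move(Mode.STANDBY)
        if self.mode is Mode.STANDBY:
            self._move(Mode.SHUTDOWN)
```

Calling start() on a freshly created HotStandbyService passes through standby
on the way to active, and calling stop() twice leaves it in standby, so
heartbeat can run 'start' and 'stop' without caring which state it finds.
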

Status is a possible issue, but from what I have seen of the status return
values in scripts it doesn't seem to be used for much of anything.

canprimary doesn't make sense to me: we only fail over when we have no
choice; we are down anyway.

cansecondary makes a little sense if you don't want to run nice_failback,
but realistically I would think that if a system is that critical you will
have your failover box be equal to the primary in terms of capacity (in
which case the only reason not to run nice_failback is if you are running
two sets of resources and having two boxes act as failover for each other,
and in that case you don't want to use the standby feature anyway).

David Lang
On Tue, 15 Oct 2002, Alan Robertson wrote:
> Date: Tue, 15 Oct 2002 14:55:23 -0600
> From: Alan Robertson <alanr@unix.sh>
> To: Lars Marowsky-Bree <lmb@suse.de>, linux-ha@muc.de,
> Ragnar Kjørstad <ragnar@bigstorage.com>
> Cc: Phillip Reisner <philipp.reisner@linbit.com>
> Subject: hot-standby services: WAS How do I set up a resource as
> "secondary only"?
>
> This discussion is slightly academic, but will hopefully be somewhat
> helpful.
>
> We currently implement failover services in linux-ha.
>
> I define the term failover service to be a "cold standby" application. That
> is, the service runs on exactly one machine at a time.
>
> There is a second kind of service that one can define (telecom folks do
> this) called hot-standby services.
>
> A hot standby service tries to always run on every "eligible" node in the
> cluster at once. But it does not run in the same mode on every node in the
> cluster.
>
> One node is called the active node, and the other nodes are called the
> standby nodes.
>
> Such services are started in standby mode when the cluster service comes up.
> Rather than ensuring that no more than one node in the cluster is running
> the service, the cluster manager must ensure that no more than one node is
> running as primary for the given service.
>
> Instead of stopping the service, the cluster manager switches the service to
> standby.
>
> The operations on such a resource might be:
>
> standby - begin service in / switch service to standby mode
> active - begin service in / switch service to active mode
> stop - stop running service completely
> status - return active, standby or stopped
> monitor - is service running correctly in the current mode
> canprimary - can this service become primary now?
> cansecondary - can this service become secondary now?
>
> As I understand it, such a service would meet Nathan's need. It would also
> work nicely for DRBD, rsync, and other replication-type services.
>
> I believe that such services are useful, and if implemented this way could
> be completely understood by the cluster manager, and not be a kludge or
> mystery of any kind as it is now.
>
> LMB, Ragnar: I think we should consider this model for the OCF.
>
> -- Alan Robertson
> alanr@unix.sh
>