[Linux-ha-dev] RFC: "migrate" semantics
sergeyfd at gmail.com
Wed Oct 4 10:13:42 MDT 2006
1. XEN supports two kinds of migration. You are talking about "live"
migration, but the default one for XEN (at least in the current official
3.0.2) isn't live.
2. XEN resources are atomic: a virtual machine is a single resource
that is considered as a whole. A Heartbeat resource, by contrast, can
be, and in most cases is, a complex structure: IP Address +
DiskDevice + FileSystem + DataBase, etc. That complicates the "migrate"
idea a lot. Being able to move just the IP address in such a group
doesn't gain you much.
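
To make point 2 concrete, here is a minimal sketch of the check a group would need. It assumes that a group can only be migrated as a whole when every member defines a "migrate" operation (the rule from section 2 of the RFC, applied per member). The Resource class, the helper names and the operation sets are hypothetical illustrations, not Heartbeat's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical stand-in for a cluster resource and its defined operations."""
    name: str
    operations: set = field(default_factory=set)


def non_migratable(members):
    """Return the names of group members that lack a "migrate" operation."""
    return [r.name for r in members if "migrate" not in r.operations]


def group_supports_migrate(members):
    """Assumption: a group is migratable only if every member is."""
    return bool(members) and not non_migratable(members)


# Example group: only the IP address knows how to migrate (made-up data).
group = [
    Resource("IPaddr", {"start", "stop", "monitor", "migrate"}),
    Resource("DiskDevice", {"start", "stop", "monitor"}),
    Resource("FileSystem", {"start", "stop", "monitor"}),
    Resource("DataBase", {"start", "stop", "monitor"}),
]

if not group_supports_migrate(group):
    print("fall back to stop/start; blocked by:", ", ".join(non_migratable(group)))
```

With a group like this one, the check fails on three of the four members, so the whole group would still go through a stop-start cycle.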
On 10/4/06, Lars Marowsky-Bree <lmb at suse.de> wrote:
> some food for feedback. Primary goal is obviously to support Xen, but
> this should be applicable to other containers et cetera as well.
> 1. Problem statement
> To migrate a resource from one node (source) to another (target),
> without service interruption, to enhance resource allocation.
> This is to be achieved in a way resilient to failures of one of the
> involved nodes and/or service during the operation.
> A migrate differs from a stop-start cycle in so far as the service is
> never down in-between; a stop-start cycle is still necessary in case
> of failures.
> 2. Pre-requisites
> - A resource shall be assumed to support "migrate" if it has a "migrate"
> operation defined.
> - Any resources it is bound to by mandatory collocation constraints
> must be clones running on both nodes.
> 3.1. Usual process
> - A constraint change requests that a resource be migrated away from
> its current location. (crm_resource -M, for example.)
> - Notifications are issued, if configured.
> - The "migrate" command is run on both the source and the target.
> (As soon as one of the two LRMs has received this action, the
> operation is recorded as in progress and can be recovered if we find a
> pending migration. If the failure occurs before this step, the
> command will have to be given again.)
> - After the "migrate" has returned successfully on both, the resource
> shall be considered running on the target.
> - If both sides return with a "nothing changed" return code, the
> resource shall be considered active and healthy on the source still.
> On the target, though, this should be considered a "start" failure.
> (This is necessary if a resource does not support migrate, or if the
> target does not have adequate resources to complete the request.)
> - Any other result shall invoke recovery.
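
The outcome rules in 3.1 can be sketched as a small decision function. This is only an illustration of the three cases listed above; the Result and Outcome names are hypothetical and do not correspond to actual LRM or resource agent return codes.

```python
from enum import Enum


class Result(Enum):
    """Hypothetical per-node result of the "migrate" operation."""
    SUCCESS = "success"
    NOTHING_CHANGED = "nothing changed"
    FAILED = "failed"


class Outcome(Enum):
    """What the cluster should conclude from the two results."""
    RUNNING_ON_TARGET = "resource considered running on the target"
    STILL_ON_SOURCE = "healthy on source, start failure recorded on target"
    RECOVER = "invoke recovery"


def classify_migration(source: Result, target: Result) -> Outcome:
    # Both sides succeeded: the resource now lives on the target.
    if source is Result.SUCCESS and target is Result.SUCCESS:
        return Outcome.RUNNING_ON_TARGET
    # Both sides report "nothing changed": the source keeps the resource,
    # and the target is treated as having failed a "start".
    if source is Result.NOTHING_CHANGED and target is Result.NOTHING_CHANGED:
        return Outcome.STILL_ON_SOURCE
    # Any other combination, including mixed results, goes to recovery.
    return Outcome.RECOVER


print(classify_migration(Result.SUCCESS, Result.NOTHING_CHANGED).value)
```

Note that a mixed result (success on one side, "nothing changed" on the other) falls through to recovery, since the rules only define the two matching cases.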
> 3.2. Recovery
> Proposal A: In case any step of the migration fails, we re-probe the
> resource on both the source and the target; then we can proceed as
> Proposal B: Stop the resource on both nodes.
> (Note: I've considered the case of N:M migrations between clones, in
> case one wants to scale down or up. It's an interesting idea for
> discussion, though.)
> High Availability & Clustering
> SUSE Labs, Research and Development
> SUSE LINUX Products GmbH - A Novell Business
> "Ignorance more frequently begets confidence than does knowledge"
>  -- Charles Darwin