[Linux-ha-dev] Comments on the cib-schema.txt
lmb at suse.de
Fri Jan 16 04:07:36 MST 2004
Thanks to Andrew for writing up the description of the CIB.
I agree with most of it, but that's not fun to discuss, so let's start
with the differences ;-)
- The version string + timestamp + flag _are_ the generation counter.
(Though I don't see the point in having a flag for whether it was
generated or read from a file - where is that used?)
- The id field is a UUID.
- We need to agree on the timestamp granularity; one second is good, ms
  timestamps would be even better.
  I'm wary of timestamps, though; we need to make clear that we strongly
  suggest that the cluster runs NTP to sync the times to <1s.
  However, wherever possible, we should use generation counters instead,
  based on the membership serial number, the update counter for that
  particular field etc., and mostly use the timestamp as an informational
  aid (see the first sketch after this list).
- The paragraph regarding the max_instances, op_timeout and priority
  fields should go into a specific section about the resource data
  description. It doesn't belong in the overview section, and discusses
  the resource description and the behaviour of the PE in too much
  detail; to the CIB, the resource data is opaque.
- Same for the node / constraints / dependency sections. The CIB doesn't
mess with them.
- Note that the resource & constraint sections are 'static', i.e.
  admin-controlled only; they are the configuration data, and thus are
  the ones to be tagged with a combined generation counter, so that the
  CIB can identify the latest copy available in the current cluster.
  This is different from the 'status' section, which is dynamic and
  merges the status data from the nodes as of now.
- The matter of debate, namely what the status information is (and
  whether it contains node status information to a certain extent), is
  one I want to touch on too.
  What we need to keep in the CIB is the result of STONITH operations.
  I.e., if node A has successfully been able to reset node B, and we
  haven't heard from node B since, we can assume it is still cleanly
  fenced. And we also need to remember if node A has been
  _unsuccessful_ in fencing B. So some node status needs to be kept in
  the CIB (see the second sketch after this list).
  However, this follows naturally if the STONITH operations are
  effectively carried out by the LRM as explained in the other mail;
  the LRMs would still know the fencing results, and the CIB status
  section would again become the simple merge of the current LRM data.
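To make the generation counter point concrete, here is a minimal sketch
in Python; the field names are invented for illustration, not taken from
any actual schema. The membership serial number and the per-field update
counter decide which copy is newer; the wall-clock timestamp is carried
along as information only:

    # Hypothetical generation tuple; field names are illustrative.
    def newer(gen_a, gen_b):
        # Compare the membership serial first, then the update counter
        # within that membership; the timestamp never decides.
        return ((gen_a["membership_serial"], gen_a["update_count"]) >
                (gen_b["membership_serial"], gen_b["update_count"]))

    a = {"membership_serial": 7, "update_count": 3,
         "timestamp": 10600000.0}
    b = {"membership_serial": 7, "update_count": 5,
         "timestamp": 10599990.0}
    assert newer(b, a)  # b is newer despite its older timestamp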
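And a similarly minimal sketch of the STONITH bookkeeping (again Python,
with invented names, not a real interface): a node only counts as
cleanly fenced if the last fencing attempt against it succeeded and we
haven't heard from it since:

    # Last known fencing result per target node; purely illustrative.
    stonith_results = {
        "nodeb": {"by": "nodea", "success": True,  "at": 10600100.0},
        "nodec": {"by": "nodea", "success": False, "at": 10600200.0},
    }

    def cleanly_fenced(node, last_heard):
        # Safe only if the last reset succeeded and the node has been
        # silent ever since; a failed reset must never count as fenced.
        r = stonith_results.get(node)
        return bool(r and r["success"]
                    and last_heard.get(node, 0.0) < r["at"])

    assert cleanly_fenced("nodeb", {})      # successful reset, silent
    assert not cleanly_fenced("nodec", {})  # failed reset: not safe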
On to the DTD.
Again, I think the description of the actual _content_ of these elements
should not be part of the overall layout. They are opaque to the CIB.
Here is my sample for the CIB, as I'd imagine it:
<configuration generation_counter="12345" timestamp="10600000.000">
    <!-- List of node attributes, resources, dependencies. The CIB
         doesn't care. -->
</configuration>
<node id="nodea" status="up" timestamp="...">
    <!-- Potentially more information regarding node health or
         whatever. Again, the CIB shouldn't care. -->
    <lrm node_id="nodea" timestamp="...">
        <!-- Once more, the CIB doesn't need to know what's inside
             here. I imagine it would contain the currently active
             resources on the LRM on nodea, their status, and a
             'history' of which resources have failed to start up on
             that node etc., as well as the node status. -->
    </lrm>
</node>
<node id="nodeb" status="down" timestamp="..." />
<node id="nodec" status="failed" timestamp="..." />
The job of the cibd is to retrieve the most recent configuration part
(as identified by the generation counter thingy), retrieve the current
status from each node alive and potentially write an empty record for
each node not present; this is then passed into the Policy Engine and
also redistributed to all nodes (to spread the most current
configuration around for redundancy).
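Spelled out as pseudocode (Python again; the fetch_* callables are
hypothetical stand-ins for however cibd would actually query the nodes,
not its real API), that job looks roughly like this:

    def assemble_cib(members, all_known_nodes, fetch_config,
                     fetch_status):
        # The most recent configuration wins, identified purely by its
        # generation counter.
        config = max((fetch_config(n) for n in members),
                     key=lambda c: (c["membership_serial"],
                                    c["update_count"]))
        # Current status from each node alive; an empty record for
        # each node not present.
        status = {n: fetch_status(n) for n in members}
        for n in set(all_known_nodes) - set(members):
            status[n] = {"status": "down"}
        # The merged result is what gets handed to the Policy Engine
        # and redistributed to all nodes for redundancy.
        return {"configuration": config, "status": status}

    cib = assemble_cib(
        members=["nodea"],
        all_known_nodes=["nodea", "nodeb", "nodec"],
        fetch_config=lambda n: {"membership_serial": 7,
                                "update_count": 5},
        fetch_status=lambda n: {"status": "up"})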
Lars Marowsky-Brée <lmb at suse.de>
High Availability & Clustering \ ever tried. ever failed. no matter.
SUSE Labs | try again. fail again. fail better.
Research & Development, SUSE LINUX AG \ -- Samuel Beckett