[Linux-ha-dev] Comments on the cib-schema.txt

Lars Marowsky-Bree lmb at suse.de
Fri Jan 16 04:07:36 MST 2004


Morning all,

thanks to Andrew for writing up the description of the CIB.

I agree with most of it, but that's not fun to discuss, so let's start
with the differences ;-)

- The version string + timestamp + flag _are_ the generation counter.

  (Though I don't see the point in having a flag for whether it was
  generated or read from a file - where is that used?)

- The id field is a UUID.

- We need to agree on the timestamp granularity; second is good, ms
  timestamps would be even better.

  I'm wary of timestamps though; we need to make clear that we strongly
  suggest that the cluster is running NTP to sync the times to <1s.
  However, wherever possible, we should use generation counters instead,
  based on the membership serial no, the update counter to that
  particular field etc, and mostly use the timestamp as an informational
  field.

- The paragraph regarding the max_instances, op_timeout and priority
  fields should go into a specific section about resource data
  description. It doesn't belong in the overview section, and discusses
  the resource description and behaviour of the PE in too much detail;
  to the CIB, the resource data is opaque.

- Same for the node / constraints / dependency sections. The CIB doesn't
  mess with them.

- Note that the resource & constraint sections are 'static', i.e. admin
  controlled only; they are the configuration data, and thus are the
  ones to be tagged with a combined generation counter, so that the
  CIB can identify the latest one available in the current cluster.
  
  This is different from the 'status' section, which is dynamic and
  merges the status data from the nodes as of now.
  
- The question of what the status information is (and whether it
  contains node status information to a certain extent) is one I
  want to touch on too.

  What we need to keep in the CIB is the result of STONITH operations,
  i.e. if node A has successfully reset node B, and we haven't heard
  from node B since, we can assume it is still cleanly fenced. And we
  also need to remember if node A has been _unsuccessful_ in fencing B.
  So some node status needs to be kept in the CIB.

  However, this follows naturally if the STONITH operations are
  effectively carried out by the LRM as explained in the other mail;
  they'd still know, and the CIB status section would again become the
  simple merge of the current LRM data.
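
To make the generation counter point concrete, here is a minimal Python
sketch of picking the latest configuration copy. It assumes the counter
is an ordered pair of (membership serial, update counter); the class and
field names are hypothetical, not taken from cib-schema.txt. The
timestamp is carried along but deliberately kept out of the comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigCopy:
    node: str               # node this copy was retrieved from
    membership_serial: int  # hypothetical: membership serial at last update
    update_counter: int     # hypothetical: updates applied under that serial
    timestamp: float        # informational only; clocks may drift

    def generation(self):
        # Tuples compare lexicographically; the timestamp is left out.
        return (self.membership_serial, self.update_counter)


def latest_config(copies):
    """Return the copy with the highest generation."""
    return max(copies, key=lambda c: c.generation())


copies = [
    ConfigCopy("nodea", 7, 3, 1074250671.0),
    ConfigCopy("nodeb", 7, 4, 1074250600.0),  # newer generation, older clock
    ConfigCopy("nodec", 6, 9, 1074250700.0),
]
print(latest_config(copies).node)
```

Here nodeb wins even though its clock is behind, which is the reason
for not ordering on the timestamp.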

On to the DTD.

Again, I think the description of the actual _content_ of these elements
should not be part of the overall layout. They are opaque to the CIB.

Here is my sample for the CIB, as I'd imagine it:

<cib timestamp="1074250671.000">
<configuration generation_counter="12345" timestamp="10600000.000"
	retrieved_from="nodeb">
	<!-- List of node attributes, resources, dependencies. CIB
	    doesn't care. -->
</configuration>
<status timestamp="1074250671.000">
<node id="nodea" status="up" timestamp="...">
	<health>
	<!-- Potentially more information regarding node health or
		whatever. Again, the CIB shouldn't care. -->
	</health>
	<lrm node_id="nodea" timestamp="...">
	<!-- Once more, the CIB doesn't need to know what's inside here. 
	     I imagine it would contain the currently active resources
	     on the LRM on nodea, their status and a 'history' of which
	     resources have failed to start up on that node etc, as well
	     as the node status -->
	</lrm>
</node>
<node id="nodeb" status="down" timestamp="..." />
<node id="nodec" status="failed" timestamp="..." />
...
</status>
</cib>
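
As a rough illustration of how little of this the CIB itself has to
look at, the following Python sketch reads only the generation counter
and the per-node status from a trimmed-down copy of the sample, and
never descends into the configuration, health or lrm content. The
function name and the trimmed sample are assumptions made for the
example.

```python
import xml.etree.ElementTree as ET

# Trimmed-down version of the sample layout; inner content stays opaque.
SAMPLE = """\
<cib timestamp="1074250671.000">
  <configuration generation_counter="12345" timestamp="10600000.000"
                 retrieved_from="nodeb">
    <!-- opaque to the CIB -->
  </configuration>
  <status timestamp="1074250671.000">
    <node id="nodea" status="up"><lrm node_id="nodea"/></node>
    <node id="nodeb" status="down"/>
    <node id="nodec" status="failed"/>
  </status>
</cib>
"""


def summarize(cib_xml):
    """Return (generation counter, {node id: status}) from the outer layout."""
    root = ET.fromstring(cib_xml)
    config = root.find("configuration")
    generation = int(config.get("generation_counter"))
    node_status = {n.get("id"): n.get("status")
                   for n in root.findall("status/node")}
    return generation, node_status


generation, node_status = summarize(SAMPLE)
print(generation, node_status)
```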

The job of the cibd is to retrieve the most recent configuration part
(as identified by the generation counter thingy), retrieve the current
status from each node alive and potentially write an empty record for
each node not present; this is then passed into the Policy Engine and
also redistributed to all nodes (to spread the most current
configuration around for redundancy).
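
The assembly step described above could be sketched roughly as follows
in Python. All names here are hypothetical, the configuration is
treated as an opaque blob, and handing the result to the Policy Engine
and redistributing it to the nodes are left out.

```python
def assemble_cib(config_copies, live_status, known_nodes):
    """Combine the newest configuration with the current node status.

    config_copies: list of (generation_counter, opaque_config) pairs
    live_status:   {node id: status record} for nodes that answered
    known_nodes:   every node id the cluster knows about
    """
    generation, config = max(config_copies, key=lambda pair: pair[0])
    status = {}
    for node in known_nodes:
        # Nodes that are not present get an empty record.
        status[node] = live_status.get(node, {})
    return {"generation": generation,
            "configuration": config,
            "status": status}


cib = assemble_cib(
    config_copies=[(12344, "<old/>"), (12345, "<new/>")],
    live_status={"nodea": {"status": "up"}},
    known_nodes=["nodea", "nodeb"],
)
print(cib)
```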


Sincerely,
    Lars Marowsky-Brée <lmb at suse.de>

-- 
High Availability & Clustering	      \ ever tried. ever failed. no matter.
SUSE Labs			      | try again. fail again. fail better.
Research & Development, SUSE LINUX AG \ 	-- Samuel Beckett


