[Linux-ha-dev] Re: Difference between OCF_ERR_CONFIGURED and OCF_ERR_INSTALLED ?

Andrew Beekhof beekhof at gmail.com
Tue Jul 8 07:19:04 MDT 2008


On Tue, Jul 8, 2008 at 15:09, Andrew Beekhof <beekhof at gmail.com> wrote:
> On Tue, Jul 8, 2008 at 15:03, Andrew Beekhof <beekhof at gmail.com> wrote:
>> On Fri, Jul 4, 2008 at 16:52, Joe Bill <pica1dilly at yahoo.com> wrote:
>>>
>>>>--- On Fri, 7/4/08, Andrew Beekhof <beekhof at gmail.com> wrote:
>>>>> Exactly how does heartbeat handle OCF_ERR_CONFIGURED and
>>>>> OCF_ERR_INSTALLED differently ?
>>>>
>>>> From some badly formatted and not-quite finished documentation:
>>>>
>>>>soft  = stop and retry
>>>>hard  = stop and retry - current node is excluded
>>>>fatal = stop - all nodes are excluded
>>>
>>> Taking the opportunity then that the documentation is not yet finished, I would like to make the following suggestions:
>>>
>>> - "soft" be changed to "error, unexpected"
>>>
>>> - "hard" be changed to "fatal, local" or "critical, local", or "fatal, node" or "critical, node" because we have diagnosed that the resource at fault is local to the node on which it was detected
>>>
>>> - "fatal" be changed to "fatal, common" or "critical, common" or "fatal, cluster" or "critical, cluster" because we have diagnosed that the resource at fault is common to all nodes in the cluster.
>>>
>>>>5 The requested agent or tool required by the agent is
>>>> not installed. hard
>>>
>>> I believe "resource configuration" to be more appropriate here. HA shouldn't care at this point if it's a piece of software or local configuration file that is missing or screwed.
>>>
>>> add:
>>>
>>> - or the resource's local configuration,
>>> - or the node's specific configuration ... are invalid.
>>>
>>>>6 The resource's configuration is invalid. fatal
>>>
>>> I believe "instance configuration" to be more appropriate here,
>>>
>>> replace with:
>>>
>>> - the instance's configuration (common, shared, clusterwide resource configuration) is invalid,
>>> - or the resource agent has detected a severe internal (programming, code) error.
>>
>> makes sense
>>
>>>
>>>
>>> Regarding the mnemonics of the return codes...
>>>
>>> From your notes above, the status definitions seem to relate more to the restart and blocking effect the HA supervisor has on resources than to the situation the current mnemonics attempt to describe.
>>>
>>> I am not sure it is such a good idea to attempt to combine a condition with the condition's handling action in the process of defining states that are to be reported to the supervisor.
>>
>> Not sure I follow this...
>>
>>>
>>> From the description you provided, is it the supervisor's concern, and will the supervisor attempt anything to address the cause, or for that matter do anything different, if it receives any of the following statuses: OCF_ERR_UNIMPLEMENTED, OCF_ERR_PERM, OCF_ERR_INSTALLED?
>>>
>>> Same question for OCF_ERR_ARGS and OCF_ERR_CONFIGURED ?
>>>
>>> Now the problem starts when I want to describe a condition where a resource needs an internal file (fixed name, not specified as a resource parameter) but the file is missing on one host and not on the others. Which condition would you choose?
>>
>> OCF_ERR_ARGS i guess - since that would exclude the failed node but not the others.
>
> oops, args doesn't do this.
> probably OCF_ERR_INSTALLED then.  or maybe one of OCF_ERR_ARGS and
> OCF_ERR_CONFIGURED needs to be made fatal.

brain not working today... of course I meant "hard".  and having
looked at everything again, i think this is the right approach.
So from now on OCF_ERR_ARGS will be a "hard" error instead of a "fatal" one.
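
To make the outcome concrete, here is a rough sketch of the classification as
it stands after this change. It is not the actual heartbeat/CRM code, just an
illustration. The numeric values are the standard OCF exit codes; the names
Recovery, classify and candidate_nodes are made up for this example, and
treating every other failure code as "soft" is an assumption of the sketch,
not something decided in this thread.

```python
from enum import Enum

# Standard OCF resource agent exit codes (only the ones used here).
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_ERR_ARGS = 2
OCF_ERR_INSTALLED = 5
OCF_ERR_CONFIGURED = 6


class Recovery(Enum):
    SOFT = "soft"    # stop and retry
    HARD = "hard"    # stop and retry, current node is excluded
    FATAL = "fatal"  # stop, all nodes are excluded


# Classifications stated in this thread. OCF_ERR_ARGS is "hard" as of
# this message; it was previously "fatal".
RECOVERY_BY_CODE = {
    OCF_ERR_ARGS: Recovery.HARD,
    OCF_ERR_INSTALLED: Recovery.HARD,
    OCF_ERR_CONFIGURED: Recovery.FATAL,
}


def classify(rc):
    """Return the recovery class for a failed operation's exit code."""
    if rc == OCF_SUCCESS:
        raise ValueError("OCF_SUCCESS is not a failure")
    # ASSUMPTION: any code not listed above is treated as "soft".
    return RECOVERY_BY_CODE.get(rc, Recovery.SOFT)


def candidate_nodes(rc, failed_node, all_nodes):
    """Nodes on which the resource may still be tried after failing with rc."""
    recovery = classify(rc)
    if recovery is Recovery.FATAL:
        return []
    if recovery is Recovery.HARD:
        return [node for node in all_nodes if node != failed_node]
    return list(all_nodes)
```

For the missing-internal-file case discussed above, an agent returning
OCF_ERR_INSTALLED on node1 would leave the other nodes eligible:
candidate_nodes(OCF_ERR_INSTALLED, "node1", ["node1", "node2"]) gives
["node2"], whereas OCF_ERR_CONFIGURED would give an empty list.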

