[Linux-ha-dev] Re: Difference between OCF_ERR_CONFIGURED and
OCF_ERR_INSTALLED ?
Andrew Beekhof
beekhof at gmail.com
Tue Jul 8 07:03:34 MDT 2008
On Fri, Jul 4, 2008 at 16:52, Joe Bill <pica1dilly at yahoo.com> wrote:
>
>>--- On Fri, 7/4/08, Andrew Beekhof <beekhof at gmail.com> wrote:
>>> Exactly how does heartbeat handle OCF_ERR_CONFIGURED and
>>> OCF_ERR_INSTALLED differently ?
>>
>>From some badly formatted and not-quite-finished documentation:
>>
>>soft = stop and retry
>>hard = stop and retry - current node is excluded
>>fatal = stop - all nodes are excluded
>
> Taking the opportunity then that the documentation is not yet finished, I would like to make the following suggestions:
>
> - "soft" be changed to "error, unexpected"
>
> - "hard" be changed to "fatal, local" or "critical, local", or "fatal, node" or "critical, node" because we have diagnosed that the resource at fault is local to the node on which it was detected
>
> - "fatal" be changed to "fatal, common" or "critical, common" or "fatal, cluster" or "critical, cluster" because we have diagnosed that the resource at fault is common to all nodes in the cluster.
>
>>5 The requested agent or tool required by the agent is
>> not installed. hard
>
> I believe "resource configuration" to be more appropriate here. HA shouldn't care at this point if it's a piece of software or local configuration file that is missing or screwed.
>
> add:
>
> - or the resource's local configuration,
> - or the node's specific configuration ... are invalid.
>
>>6 The resource's configuration is invalid. fatal
>
> I believe "instance configuration" to be more appropriate here,
>
> replace with:
>
> - the instance's configuration (common, shared, clusterwide resource configuration) is invalid,
> - or the resource agent has detected a severe internal (programming, code) error.
makes sense
>
>
> Regarding the mnemonics of the return codes...
>
> From your notes above, it seems the status definitions are more related to the restart and blocking effect the HA supervisor has on resources than to the situation the current mnemonics attempt to describe.
>
> I am not sure it is such a good idea to attempt to combine a condition with the condition's handling action in the process of defining states that are to be reported to the supervisor.
Not sure I follow this...
>
> From what you provided as description: is it the supervisor's concern, and will the supervisor attempt anything to address the cause, or for that matter do anything different, if it receives any of the following statuses: OCF_ERR_UNIMPLEMENTED, OCF_ERR_PERM, OCF_ERR_INSTALLED ?
>
> Same question for OCF_ERR_ARGS and OCF_ERR_CONFIGURED ?
>
> Now the problem starts when I want to describe a condition where a resource needs an internal (fixed name, not specified as a resource parameter) file, but the file is missing on one host and not on the others. Which condition would you choose ?
OCF_ERR_ARGS, I guess, since that would exclude the failed node but
not the others.
If the file isn't available anywhere, then the resource will be tried
once on each node and then give up.
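To make the "tried once on each node" behaviour concrete, here is a rough sketch in Python. It is only a model of the hard/fatal rules quoted above, not actual Heartbeat/CRM code; the names place, run_on and CATEGORY are made up for the illustration, and soft errors are not modelled at all.

```python
# Illustrative model only; not Heartbeat/CRM code.
OCF_SUCCESS = 0
OCF_ERR_ARGS = 2
OCF_ERR_INSTALLED = 5
OCF_ERR_CONFIGURED = 6

# 5 and 6 are categorised as in the quoted documentation; the entry for
# OCF_ERR_ARGS follows the answer above (only the failed node is excluded).
CATEGORY = {
    OCF_ERR_ARGS: "hard",
    OCF_ERR_INSTALLED: "hard",
    OCF_ERR_CONFIGURED: "fatal",
}


def place(nodes, run_on):
    """Try each node in order.

    run_on(node) is a hypothetical callback returning the agent's exit code.
    Returns (node the resource started on or None, list of nodes tried).
    """
    tried = []
    for node in nodes:
        tried.append(node)
        rc = run_on(node)
        if rc == OCF_SUCCESS:
            return node, tried
        if CATEGORY[rc] == "fatal":
            # fatal: all nodes are excluded, so stop immediately
            return None, tried
        # hard: only this node is excluded, so carry on with the next one
    return None, tried
```

With a file missing only on node1, the resource ends up on node2; with the file missing everywhere, every node is tried once and the result is None.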
> Then the situation where a filename is specified as a resource parameter but that file does not exist on one host. Is it an OCF_ERR_INSTALLED error, or an OCF_ERR_CONFIGURED error, why not an OCF_ERR_ARGS ? Can I even diagnose an OCF_ERR_ARGS when running the resource agent on only one node if that file DOES exist on other nodes ? How is that resource agent going to check on the other nodes and see that the file does exist there ?
Why would you try to do this? Just let it fail once on each node.
OCF_ERR_CONFIGURED should only be used when the inputs are so bad that
the resource won't be able to run anywhere (i.e. "file" is mandatory but
no value was specified).
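As a rough sketch of that split (Python used purely for illustration; the validate function, the params dict and the exists hook are hypothetical, and a real agent would read its parameters from the environment instead):

```python
import os

OCF_SUCCESS = 0
OCF_ERR_ARGS = 2
OCF_ERR_CONFIGURED = 6


def validate(params, exists=os.path.exists):
    """Return an OCF exit code for a hypothetical agent with a mandatory "file" parameter.

    params is a plain dict standing in for the resource parameters (assumption);
    exists is injectable so the check can be exercised without touching the disk.
    """
    path = params.get("file", "")
    if not path:
        # Mandatory input missing: the resource cannot run on any node.
        return OCF_ERR_CONFIGURED
    if not exists(path):
        # Only this node is known to be broken; no attempt is made to
        # check other nodes, the cluster will simply try them in turn.
        return OCF_ERR_ARGS
    return OCF_SUCCESS
```

OCF_ERR_ARGS is used for the missing file only because it excludes just the failed node, as in the earlier answer; the agent never needs to know what the other nodes look like.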