[Linux-ha-dev] Problems when DC node is STONITH'ed.
taniguchis at intellilink.co.jp
Tue Oct 14 04:07:00 MDT 2008
I found two problems that occur when the DC node is STONITH'ed.
(1) The STONITH operation is executed twice.
(2) The timeout that stonithd on the DC node uses while waiting for
another node to report the result of a STONITH op is
always set to "stonith-timeout" from <cluster_property_set>.
Case (1):
(i) Stonithd on the DC sends a request to stonithd on another node.
(ii) The DC node is STONITH'ed!
(iii) The other node becomes DC.
(iv) Nobody can notify the new tengine that the STONITH succeeded.
(v) A transition timeout occurs on the new tengine.
(vi) The new tengine tries to STONITH again.
(vii) The rebooted node (the ex-DC node) is STONITH'ed again!
Is this the expected behavior?
(I suspect it is, because the target node must be STONITH'ed immediately
and cannot wait for the DC to change, but I would like to make sure.)
For reference, I have attached the logs.
The node named "stkdump2" is the ex-DC that was STONITH'ed.
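The sequence in case (1) can be sketched as a toy model. This is only an illustration of my reading of steps (i) to (vii), assuming the STONITH target is the DC itself; the class, the function, and the node name "other-node" are hypothetical and are not Heartbeat or stonithd APIs.

```python
# Toy model of case (1). Everything here is hypothetical illustration,
# not real Heartbeat/stonithd code.

class Node:
    def __init__(self, name):
        self.name = name
        self.is_dc = False
        self.stonith_count = 0  # how many times this node has been reset


def simulate_dc_stonith():
    old_dc = Node("stkdump2")    # node name taken from the attached logs
    other = Node("other-node")   # hypothetical name for the surviving node
    old_dc.is_dc = True
    events = []

    # (i) stonithd on the DC asks stonithd on the other node to do the reset.
    events.append("request sent by old DC")

    # (ii) The DC is reset. The requester is gone, so the result of the
    #      operation cannot be delivered back to it.
    old_dc.stonith_count += 1
    old_dc.is_dc = False
    result_delivered_to_new_dc = False
    events.append("old DC reset")

    # (iii) The other node takes over as DC.
    other.is_dc = True
    events.append("other node becomes DC")

    # (iv)/(v) The new tengine never hears that the STONITH succeeded,
    #          so its transition times out.
    if not result_delivered_to_new_dc:
        events.append("transition timeout on new DC")
        # (vi)/(vii) The new tengine fences the ex-DC a second time.
        old_dc.stonith_count += 1
        events.append("old DC reset again")

    return old_dc.stonith_count, events


count, events = simulate_dc_stonith()
for e in events:
    print(e)
print("resets of ex-DC:", count)
```

In this model the second reset happens only because the result of the first one is never delivered to the new DC.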
Case (2):
If this timeout expires in stonithd on the DC
while stonithd on a non-DC node is still trying to reset the DC,
the DC's stonithd sends the request to another node,
and two or more STONITH plugins are executed in parallel.
This is a troublesome problem.
I think the most suitable value for this timeout would be
the sum of the "stonith-timeout" values of the STONITH plugins on the node
that is going to receive the STONITH request from the DC node.
But the DC node has no way of knowing that...
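The arithmetic behind this proposal can be sketched as follows. This assumes the receiving node tries its STONITH plugins one after another, each bounded by its own timeout; the function names and the numbers (in seconds) are made up for illustration.

```python
# Hypothetical sketch of the timeout arithmetic in case (2).
# Assumption: the remote node runs its plugins sequentially, and each
# plugin may take up to its own "stonith-timeout".

def proposed_dc_wait(plugin_timeouts):
    """Proposed wait on the DC: the sum of the remote plugins' timeouts."""
    return sum(plugin_timeouts)


def dc_gives_up_early(dc_wait, plugin_timeouts):
    """True if the DC's wait can expire while the remote node is still
    working through its plugins."""
    return dc_wait < sum(plugin_timeouts)


# Example values only: a cluster-wide stonith-timeout of 60s and a remote
# node with two plugins that each have a 60s timeout.
cluster_stonith_timeout = 60
remote_plugin_timeouts = [60, 60]

print(dc_gives_up_early(cluster_stonith_timeout, remote_plugin_timeouts))
print(dc_gives_up_early(proposed_dc_wait(remote_plugin_timeouts),
                        remote_plugin_timeouts))
```

With these example values the first check is True (the DC would give up and ask another node while the first one is still busy) and the second is False.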
I would like to hear your opinions.
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 90290 bytes
Desc: not available
Url : http://lists.community.tummy.com/pipermail/linux-ha-dev/attachments/20081014/60a431e2/hb_report.tar-0001.bin