[Linux-ha-dev] sfex
Xinwei Hu
hxinwei at gmail.com
Thu Jun 19 07:26:13 MDT 2008
2008/6/19 Keisuke MORI <kskmori at intellilink.co.jp>:
> Hi,
>
> "Xinwei Hu" <hxinwei at gmail.com> writes:
>> I'm the one who opposed sfex in the previous discussion.
>>
>> My point was simply that:
>> """
>> check-and-reserve on disk is not an atomic CAS operation, and a lock
>> based on that may silently cause data corruption.
>> """
>
> sfex does not rely on the atomicity of "check-and-reserve".
> It always _overwrites_ the control data, and the detection of
> losing ownership is timeout based.
>
>
> Indeed it can happen that two nodes try to write the control
> data at the same time under a particular condition, but
>
> 1) Such a situation will not happen in the typical
> split-brain scenario with sfex. It can only happen under a
> particular condition such as a misoperation that tries to
> launch two nodes simultaneously _without_ fixing the
> split-brain condition.
>
> 2) Even if such a situation occurred, sfex resolves it as follows:
> - sfex always writes its control data as "one sector" of data
> (512 bytes in most cases) through direct I/O.
> That would be a single write request to the disk controller.
> - If two nodes tried to write the data at the same time,
> the requests will be serialized in the disk controller, so
> 'the latter one' will win.
> - sfex makes sure that the written data is "mine", and
> the "loser" will return an error to prevent it from launching resources.
>
>
>
> Does that explain it?
No.
Your basic assumption is that sfex can run in a deterministic
environment, right?
I think so because sfex relies entirely on predictable execution time.
But Linux (for example) is not such an environment, as a
process can be scheduled out at _any_ point for _any_ length of time.
This is an essential problem caused by the lack of a CAS operation for disks.
btw: dskcm is lockless because of the same problem.
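
To make the two interleavings concrete, here is a minimal sketch in Python.
It is not taken from the sfex code: SharedSector, acquire and run are
hypothetical names, the in-memory object only stands in for the one-sector
direct I/O write, and the timeout-based ownership check is not modeled. In
the second schedule I simply assume that node B decides to overwrite (for
example because it believes A's ownership has timed out) while A is paused
right after its own verification.

```python
class SharedSector:
    """Stand-in for the single control sector; the last write wins."""

    def __init__(self):
        self.owner = None

    def write(self, node):
        self.owner = node

    def read(self):
        return self.owner


def acquire(sector, node):
    """Write-then-verify, split into steps so a schedule can pause between them."""
    sector.write(node)            # step 1: overwrite the control data
    yield "written"
    ok = sector.read() == node    # step 2: read back and check it is "mine"
    yield "verified" if ok else "lost"


def run(schedule):
    """Advance the named node one step per entry; return each node's last state."""
    sector = SharedSector()
    procs = {}
    state = {}
    for node in schedule:
        if node not in procs:
            procs[node] = acquire(sector, node)
        state[node] = next(procs[node])
    return state


# Overlapping writes: the later writer wins and the earlier one notices.
overlapping = run(["A", "B", "A", "B"])

# A finishes its verification, is scheduled out, and only then B writes.
paused = run(["A", "A", "B", "B"])

print(overlapping)  # {'A': 'lost', 'B': 'verified'}
print(paused)       # {'A': 'verified', 'B': 'verified'}
```

The first schedule is the case the write-and-verify step handles. In the
second, both nodes pass verification, and nothing in this sketch stops A from
going on to start resources once it is scheduled back in.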
> Thanks,
>
>
>>
>> I haven't followed the evolution of sfex though, so things might have
>> changed.
>>
>> Just FYI.
>>
>> 2008/6/17 Dejan Muhamedagic <dejanmm at fastmail.fm>:
>>> Hello,
>>>
>>> Since last year NTT designed and implemented sfex, a suite of
>>> programs to improve shared disk usage (see linux-ha.org/sfex)
>>> which unfortunately didn't attract the attention it deserves. I
>>> reviewed the code and attached you'll find some comments and some
>>> simple changes. One general remark: all programs (sfex_*) are
>>> monolithic and, though they are not that big, it would be
>>> beneficial to code readers if they were split into more
>>> units/functions.
>>>
>>> A couple of suggestions on making sfex useful in other contexts
>>> were making a quorum plugin and a HBcomm plugin. Did you
>>> investigate these options further?
>>>
>>> Of course, if you agree, we could include sfex into the heartbeat
>>> repository.
>>>
>>> Cheers,
>>>
>>> Dejan
>>> _______________________________________________________
>>> Linux-HA-Dev: Linux-HA-Dev at lists.linux-ha.org
>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
>>> Home Page: http://linux-ha.org/
>>>
>
> --
> Keisuke MORI
> NTT DATA Intellilink Corporation
>