hxinwei at gmail.com
Thu Jun 19 07:26:13 MDT 2008
2008/6/19 Keisuke MORI <kskmori at intellilink.co.jp>:
> "Xinwei Hu" <hxinwei at gmail.com> writes:
>> I'm the one who opposed sfex in the previous discussion.
>> My point was simply that
>> check-and-reserve on disk is not an atomic CAS operation, and a lock
>> based on that may silently cause data corruption.
> sfex does not rely on the atomicity of "check-and-reserve".
> It always _overwrites_ the control data, and the loss of
> ownership is detected by a timeout.
> Indeed, two nodes can try to write the control
> data at the same time under particular conditions, but
> 1) Such a situation will not happen in the typical
>    split-brain scenario with sfex. It can only happen in a
>    particular condition such as a misoperation that tries to
>    launch two nodes simultaneously _without_ fixing the
>    split-brain condition.
> 2) Even if such a situation occurred, sfex resolves it as follows:
>    - sfex always writes its control data as "one sector" of data
>      (512 bytes in most cases) through direct I/O.
>      That is a single write request to the disk controller.
>    - If two nodes try to write the data at the same time,
>      the requests will be serialized in the disk controller, so
>      'the latter one' will win.
>    - sfex makes sure that the written data is "mine", and
>      the "loser" returns an error to prevent it from launching resources.
> Does that explain it?
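To check that I read the scheme correctly, here is a rough Python model of
the write-then-verify step described above. It is only a sketch under my own
assumptions, not sfex code: SharedSector, control_record and try_acquire are
made-up names, and a lock stands in for the serialization that the disk
controller is said to provide.

```python
import threading

SECTOR_SIZE = 512  # "one sector" in most cases, per the description above


class SharedSector:
    """Hypothetical stand-in for the disk: whole-sector writes are serialized."""

    def __init__(self):
        self._lock = threading.Lock()  # models serialization in the controller
        self._data = bytes(SECTOR_SIZE)

    def write(self, payload: bytes) -> None:
        assert len(payload) == SECTOR_SIZE
        with self._lock:
            self._data = payload

    def read(self) -> bytes:
        with self._lock:
            return self._data


def control_record(node: str) -> bytes:
    """Pad the owner name out to exactly one sector."""
    return node.encode().ljust(SECTOR_SIZE, b"\0")


def try_acquire(disk: SharedSector, node: str) -> bool:
    """Overwrite the control data, then read it back and check it is ours."""
    disk.write(control_record(node))
    return disk.read() == control_record(node)


# Two writes arrive back to back; the later one wins.
disk = SharedSector()
disk.write(control_record("nodeA"))
disk.write(control_record("nodeB"))
owner_is_a = disk.read() == control_record("nodeA")  # nodeA is the "loser"
owner_is_b = disk.read() == control_record("nodeB")  # nodeB sees its own data
```

In this model the read-back tells each node whether its write was the last
one at the moment of the read, and nothing more than that.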
Your basic assumption is that sfex runs in a deterministic
environment, right?
I think so because sfex relies entirely on predictable execution time.
But Linux (for example) is not such an environment, as a
process can be scheduled out at _any_ point for _any_ length of time.
This is an essential problem caused by the lack of a CAS operation for disks.
btw: dskcm is lockless because of the same problem.
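The interleaving I am worried about can be sketched with a logical clock.
This is a simplified model under my own assumptions, not a trace of sfex:
Disk, Node and TIMEOUT are hypothetical names, and the real programs may do
additional checks that are not modelled here.

```python
TIMEOUT = 10  # logical ticks after which a silent owner is presumed dead


class Disk:
    """Hypothetical control data: the last writer and when it wrote."""

    def __init__(self):
        self.owner = None
        self.stamp = None


class Node:
    def __init__(self, name, disk):
        self.name = name
        self.disk = disk
        self.believes_owner = False

    def acquire(self, now):
        """Take over if the record is free or stale, then verify by read-back."""
        d = self.disk
        if d.owner in (None, self.name) or now - d.stamp > TIMEOUT:
            d.owner, d.stamp = self.name, now
        self.believes_owner = (d.owner == self.name)
        return self.believes_owner


disk = Disk()
a, b = Node("A", disk), Node("B", disk)

a.acquire(now=0)            # A writes and verifies: it owns the lock
# A is scheduled out here, for longer than TIMEOUT, before starting resources.
b.acquire(now=TIMEOUT + 1)  # B sees a stale record and takes over
# A resumes with its earlier check result and would start resources too.
both_active = a.believes_owner and b.believes_owner
```

Here the disk says B is the owner, yet A still acts on a check it made before
it was descheduled. Without a CAS on the disk there is no point at which A's
check and A's action are guaranteed to be one step.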
>> I haven't followed the evolution of sfex though, so things might have
>> changed.
>> Just FYI.
>> 2008/6/17 Dejan Muhamedagic <dejanmm at fastmail.fm>:
>>> Since last year NTT designed and implemented sfex, a suite of
>>> programs to improve shared disk usage (see linux-ha.org/sfex)
>>> which unfortunately didn't attract the attention it deserves. I
>>> reviewed the code and attached you'll find some comments and some
>>> simple changes. One general remark: all programs (sfex_*) are
>>> monolithic and, though they are not that big, it would be
>>> beneficial to code readers if they were split into more
>>> A couple of suggestions on making sfex useful in other contexts
>>> were making a quorum plugin and a HBcomm plugin. Did you
>>> investigate these options further?
>>> Of course, if you agree, we could include sfex into the heartbeat
>>> Linux-HA-Dev: Linux-HA-Dev at lists.linux-ha.org
>>> Home Page: http://linux-ha.org/
> Keisuke MORI
> NTT DATA Intellilink Corporation