[ENBD] 2.4.15 (thread safe)
Peter T. Breuer
ptb@it.uc3m.es
Tue, 10 Oct 2000 16:29:37 +0200 (MET DST)
While I am replying to mail ...
Bug: It seems that 2.4.14 was not thread safe in the new hashmap caching
code. (The old bitmap code underneath was alright).
How evoked: You would have seen an error if you tried client-side
caching with the default options and more than one client
daemon/connection running to the server. A write to the device spread
across both daemons would have resulted in corruption, visible in the
next session using the same cache.
Fix: Use -jb or the new code. I've corrected the error in the 2.4.15
snapshot currently (secretly) available in my source directory at
http://www.it.uc3m.es/ptb/nbd/src/
Detail: I've been reworking the code to get even more modularity and saw
the error as soon as the code became transparent enough. The count of
hash entries was kept in (per-daemon) memory, not in the shared
mmap/on disk. Thus
as soon as different daemons had to invent a new space to write a
new hash entry in, they trod on each other. I should have tested
first to see if the space looked clean, but didn't. This was a metadata
problem, not (immediately) a data problem. It only arises when there is
metadata, so the bitmap cache underneath is OK (option -jb).
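
To illustrate the failure mode above, here is a minimal sketch (not the
actual 2.4.15 fix) of keeping the entry count inside the shared mapping,
so that every daemon reserves slots from the same counter. The names
hm_header, hm_map and hm_reserve_slot are hypothetical, an anonymous
shared mapping stands in for the real cache file, and the use of C11
atomics is an assumption of the sketch (they are only safe across
processes where the type is lock-free).

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* for MAP_ANONYMOUS under strict -std modes */
#endif
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical header placed at the start of the shared cache mapping. */
struct hm_header {
    atomic_uint n_entries;  /* lives in the mapping, seen by all daemons */
    unsigned    capacity;   /* maximum number of hash entries */
};

/* Map an anonymous shared region standing in for the cache file.
 * Returns NULL on failure. */
static struct hm_header *hm_map(unsigned capacity)
{
    void *p = mmap(NULL, sizeof(struct hm_header), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    struct hm_header *h = p;
    atomic_init(&h->n_entries, 0);
    h->capacity = capacity;
    return h;
}

/* Reserve the next free entry slot. Returns its index, or -1 if the
 * map is full. Two callers can never be handed the same index, because
 * the counter is only advanced by a successful compare-and-swap. */
static long hm_reserve_slot(struct hm_header *h)
{
    assert(h != NULL);
    unsigned cur = atomic_load(&h->n_entries);
    while (cur < h->capacity) {
        /* On failure, cur is refreshed with the current shared value. */
        if (atomic_compare_exchange_weak(&h->n_entries, &cur, cur + 1))
            return (long)cur;
    }
    return -1;
}
```

With a private, in-memory counter each daemon would start from its own
idea of "next free slot" and both could pick the same index, which is
the kind of collision described above.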
The cache code is now organized as follows:

  lines  file        description
    286  lowlevel.c  shared mmap or file access
    120  bm.c        raw bitmap. Metadata only. test, set, reset etc.
    268  bitmap.c    bitmap plus data area. Full caching interface.
    428  hm.c        raw hashmap. Metadata only. newentry, nextentry, etc.
    362  hash.c      hashmap plus bitmapped data. Full caching interface.
    279  cache.c     selects either hashmap or bitmap below it.
   1743  total

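
For readers unfamiliar with the raw bitmap layer, a minimal sketch of
the test/set/reset operations it is described as providing. This is not
the contents of bm.c; the names bm_test, bm_set and bm_reset and the
one-bit-per-block layout are assumptions made for illustration.

```c
#include <limits.h>
#include <stddef.h>

/* Number of bits in one bitmap word. */
#define BM_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Return 1 if the bit for `block` is set, 0 otherwise. */
static int bm_test(const unsigned long *bm, size_t block)
{
    return (int)((bm[block / BM_WORD_BITS] >> (block % BM_WORD_BITS)) & 1UL);
}

/* Mark `block` as present in the cache. */
static void bm_set(unsigned long *bm, size_t block)
{
    bm[block / BM_WORD_BITS] |= 1UL << (block % BM_WORD_BITS);
}

/* Mark `block` as absent from the cache. */
static void bm_reset(unsigned long *bm, size_t block)
{
    bm[block / BM_WORD_BITS] &= ~(1UL << (block % BM_WORD_BITS));
}
```

Each block owns a fixed bit, so there is no "next free entry" to
allocate and nothing for two daemons to race over when claiming space.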
The hashmap code has been as complex as all the rest together. That's
why it's taking me so long. I am still studying locking in the hashmap
(I don't think it's fully threadsafe yet). Locking in the bitmap cache
is just fine.
I'll be adding disconnected operation in a jiffy, just as soon as I
can.
When should I sync the cache to disk on the client? At present msync
is only called when the daemons die. That means that one loses
all of a session from the cache if the machine crashes before it
can msync. I don't see any natural sync points other than
individual transactions. (I can call msync async, but doesn't that
defeat the point?)
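
One possible compromise, sketched here purely as an assumption and not
as anything in the current code: schedule an asynchronous flush at every
transaction and force a blocking one every so many transactions. The
helper names cache_map_temp and cache_sync_point, the scratch file under
/tmp, and the "every N transactions" policy are all hypothetical; msync,
MS_ASYNC and MS_SYNC are the standard POSIX interface.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* for mkstemp/ftruncate under strict -std modes */
#endif
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map `len` bytes of a scratch file that stands in
 * for the client cache file. Returns NULL on failure. */
static void *cache_map_temp(size_t len)
{
    char path[] = "/tmp/cache-sketch-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return NULL;
    unlink(path);                 /* file disappears once unmapped */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                    /* the mapping stays valid after close */
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical sync policy, called once per completed transaction.
 * Every `every`-th transaction blocks until the data is written
 * (MS_SYNC); all others only schedule the write (MS_ASYNC).
 * Returns the msync result: 0 on success, -1 on error. */
static int cache_sync_point(void *addr, size_t len,
                            unsigned txn, unsigned every)
{
    int flags = (every != 0 && txn % every == 0) ? MS_SYNC : MS_ASYNC;
    return msync(addr, len, flags);
}
```

MS_ASYNC returns without waiting for the write, so a crash right after
it can still lose the most recent transactions; only the MS_SYNC calls
give a point that is known to be on disk, at the cost of blocking the
daemon while the write completes.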
Peter