[ENBD]

Peter T. Breuer ptb@oboe.it.uc3m.es
Tue, 9 Jan 2001 13:35:12 +0100 (MET)


"A month of sundays ago Wang Gang wrote:"
> Thank you! But I'm in China, so it's evening now :) Are you in Spain?
 
  Yes, indeed.

> As for my second question, perhaps I didn't describe it clearly.  What I
> mean is: when nbd-server writes data to a disk (not a file), it first
> delivers the data to the VFS.  The VFS recognizes that the destination
> is not a regular file but a block device, so it calls block_write (in
> fs/block_dev.c).  block_write allocates buffers for the data and calls
> ll_rw_block (in drivers/block/ll_rw_blk.c) to write the data to disk
> immediately only if the O_SYNC flag is set.  Therefore when nbd-client

If you are not going through the file system (and hence cannot use
chattr +S or the sync mount option), you will have to rely on the raw
device semantics.

Under kernel 2.4.0, writes to /dev/hda are unbuffered, but writes to
/dev/hda1 (and the other partitions) are buffered. I believe there are
separate patches for making partitions unbuffered.
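For a user-space server writing to a buffered partition, one way to get synchronous behaviour without kernel patches is to open the device with O_SYNC, so that each write returns only once the data has been handed to the device. A minimal sketch, assuming a POSIX system; `write_block_sync` is a hypothetical helper, and a scratch file stands in for /dev/hda1:

```python
import os
import tempfile


def write_block_sync(path: str, offset: int, data: bytes) -> None:
    """Write data at offset through a descriptor opened with O_SYNC.

    With O_SYNC, os.pwrite() returns only after the data has been
    transferred to the underlying storage, not merely copied into the
    buffer cache. (The drive's own write cache is out of our hands.)
    """
    fd = os.open(path, os.O_WRONLY | os.O_SYNC)
    try:
        view = memoryview(data)
        while view:  # loop in case of a short write
            written = os.pwrite(fd, view, offset)
            if written == 0:
                raise OSError("pwrite made no progress")
            view = view[written:]
            offset += written
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Scratch file standing in for a partition such as /dev/hda1.
    fd, path = tempfile.mkstemp()
    os.write(fd, bytes(4096))
    os.close(fd)
    write_block_sync(path, 1024, b"A" * 512)
    with open(path, "rb") as f:
        f.seek(1024)
        print(f.read(512) == b"A" * 512)
    os.unlink(path)
```

The price is that every write waits for the device, which is the throughput the buffer cache normally buys.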

Can anyone do the remaining forward port of the nbd driver to 2.4.0?
I believe that all one has to do is disable request merging in the
block device generic layer.

> gets the reply, the data may not yet have been written to disk.  If the
> server goes down before Linux flushes the data to disk, the data will be
> lost.  Compared with

This is a server-side issue. If the server does not write synchronously
to disk, then indeed you can lose data on the server side.
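On the server side, the simplest policy that closes this window is to flush before replying: write the data, call fdatasync() (or fsync()), and only then send a success code back. A sketch under those assumptions; `handle_write_request` and its 0-or-errno return convention are hypothetical, not the actual ENBD protocol:

```python
import errno
import os
import tempfile

# fdatasync() skips non-essential metadata; fall back to fsync() where
# the platform lacks it.
_flush = getattr(os, "fdatasync", os.fsync)


def handle_write_request(fd: int, offset: int, data: bytes) -> int:
    """Return the status code for the reply: 0 only after the data has
    been written *and* flushed, otherwise an errno value."""
    try:
        if os.pwrite(fd, data, offset) != len(data):
            return errno.EIO  # treat a short write as a failure
        _flush(fd)  # do not ack until the kernel has pushed the data out
    except OSError as exc:
        return exc.errno or errno.EIO
    return 0


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    try:
        print("reply status:", handle_write_request(fd, 0, b"payload"))
    finally:
        os.close(fd)
        os.unlink(path)
```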

> a local disk, data written to an nbd device is buffered one more time.
> I think we should ensure that the data is really on the device when
> nbd-client receives the reply.  What do you think?

That is impossible to guarantee in many circumstances.  I don't think
putting in a server-side cache (which I haven't done) would alter the
picture.  The cache buffers can be dirty but not uptodate, and that is
precisely the situation the cache is intended to cope with.  It releases
dirty but not-uptodate buffers to the resource later, when it can, and
then marks itself as dirty but uptodate.  It acks the incoming write when
it has itself been successfully dirtied, not when it is uptodate (I
think!).  So you can get a server crash while the data is cached but not
yet on disk.  That's OK: being in the cache is enough.
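The ack-on-dirty behaviour can be modelled in a few lines. This is a toy, not the ENBD cache: it collapses the dirty/uptodate states described above into a single dirty set, and a dict stands in for the disk. `write()` acks as soon as the block is in the cache, `flush()` writes dirty blocks through later, and a crash between the two loses them.

```python
from __future__ import annotations


class WriteBackCache:
    """Toy write-back block cache (hypothetical; not ENBD's cache code)."""

    def __init__(self, backing: dict[int, bytes]) -> None:
        self.backing = backing            # stands in for the disk
        self.blocks: dict[int, bytes] = {}
        self.dirty: set[int] = set()      # cached but not yet on disk

    def write(self, block: int, data: bytes) -> bool:
        """Store the block and ack at once, before it reaches the disk."""
        self.blocks[block] = data
        self.dirty.add(block)
        return True

    def read(self, block: int) -> bytes | None:
        """Serve from the cache first, so unflushed writes are visible."""
        if block in self.blocks:
            return self.blocks[block]
        return self.backing.get(block)

    def flush(self) -> int:
        """Write every dirty block through; return how many were written."""
        for block in sorted(self.dirty):
            self.backing[block] = self.blocks[block]
        count = len(self.dirty)
        self.dirty.clear()
        return count


if __name__ == "__main__":
    disk: dict[int, bytes] = {}
    cache = WriteBackCache(disk)
    cache.write(7, b"data")      # acked, but the disk is still empty
    print(7 in disk)             # False: a crash now would lose block 7
    cache.flush()
    print(disk[7])               # b'data'
```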

What I _have_ done is put in a client-side cache. It has the same
properties as those I described above: it holds data that it could not
write through to the server in the normal way, until it can write it
through.  I don't see that as a bad thing.  If you don't want to cache
on the client side, you don't have to, but you may need to: the server
can put itself into read-only mode to hold off writes for a while, and
the cache will then save the writes until the server is writable again.
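The client-side behaviour, holding writes while the server is read-only and replaying them in order once it is writable again, can be sketched as a FIFO of pending requests. The `send` callable and the `ServerReadOnly` exception are hypothetical stand-ins for the real transport:

```python
from __future__ import annotations

from collections import deque
from typing import Callable


class ServerReadOnly(Exception):
    """Hypothetical: raised by the transport when the server refuses writes."""


class ClientWriteQueue:
    """Hold writes the server cannot take yet; replay them in order later."""

    def __init__(self, send: Callable[[int, bytes], None]) -> None:
        self.send = send                  # hypothetical transport call
        self.pending: deque[tuple[int, bytes]] = deque()

    def write(self, offset: int, data: bytes) -> bool:
        """Queue the write, then try to push everything queued so far.
        Returns True if the queue was fully drained to the server."""
        self.pending.append((offset, data))
        return self.drain()

    def drain(self) -> bool:
        """Send queued writes oldest first; stop at the first refusal so
        that later writes never overtake earlier ones."""
        while self.pending:
            offset, data = self.pending[0]
            try:
                self.send(offset, data)
            except ServerReadOnly:
                return False
            self.pending.popleft()
        return True


if __name__ == "__main__":
    readonly = True
    sent = []

    def send(offset: int, data: bytes) -> None:
        if readonly:
            raise ServerReadOnly()
        sent.append((offset, data))

    q = ClientWriteQueue(send)
    q.write(0, b"first")
    q.write(512, b"second")   # both held: the server is read-only
    readonly = False          # the server becomes writable again
    q.drain()
    print(sent)
```

Keeping the queue strictly ordered matters for a block device: replaying a later write before an earlier one to the same block would leave stale data on disk.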

> >This is a very hard trick to do, but it can be done in several ways.
> >The easiest thing to do (which nbd does) is to generate a buffer
> >in user space and pass its address down into the kernel, with address
> >conversions. Look for the code in the driver that "registers a
> >buffer".

> I notice that you use copy_to_user.  Can kernel code avoid it and access
> the memory directly instead?

When I last tried it, this trick required the mmap machinery, and I was
unable to see any speed improvement. Maybe I was not maxed out on
bandwidth anyway. It may be worth repeating the attempt (I think it was
done as an experiment in something like enbd 2.4.3). What I didn't like
was that it was very difficult to know when to free the kernel buffer,
and one needed a separate buffer area for each client. I think I can do
it better now.
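One conventional answer to "when do I free the buffer" is reference counting: the registration holds one reference, each in-flight request takes another, and the buffer is released only when the count reaches zero. The user-space model below only illustrates that rule; `ClientBuffer` and `BufferRegistry` are hypothetical names, and a real driver would do this in kernel C with atomic counters.

```python
from __future__ import annotations


class ClientBuffer:
    """Hypothetical per-client shared buffer with a reference count."""

    def __init__(self, size: int) -> None:
        self.data: bytearray | None = bytearray(size)
        self.refs = 1          # the registration itself holds one reference
        self.freed = False

    def get(self) -> ClientBuffer:
        """Pin the buffer for an in-flight request."""
        if self.freed:
            raise RuntimeError("buffer already released")
        self.refs += 1
        return self

    def put(self) -> None:
        """Drop a reference; release the buffer when none remain."""
        self.refs -= 1
        if self.refs == 0:
            self.data = None   # stand-in for freeing the kernel-side area
            self.freed = True


class BufferRegistry:
    """One buffer area per client, as in the mmap experiment described above."""

    def __init__(self) -> None:
        self.buffers: dict[str, ClientBuffer] = {}

    def register(self, client_id: str, size: int) -> ClientBuffer:
        if client_id in self.buffers:
            raise ValueError(f"client {client_id!r} already registered")
        buf = ClientBuffer(size)
        self.buffers[client_id] = buf
        return buf

    def unregister(self, client_id: str) -> None:
        # Drop only the registration's reference; requests still in
        # flight keep the buffer alive until they call put().
        self.buffers.pop(client_id).put()
```

With this rule a client can disconnect while a request is still using its buffer; the buffer simply outlives the registration until that request completes.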


Peter