[ENBD] some questions

Peter T. Breuer ptb@oboe.it.uc3m.es
Tue, 9 Jan 2001 09:20:52 +0100 (MET)


"A month of sundays ago Wang Gang wrote:"
> Hi, Peter. I have some questions to ask you.

Hi ... happy new millennium and good morning to you.

> 1. I read the source code of 2.4.15 recently. I found the main
> difference between 2.4.15 and 
> 2.2.29 is "journal". I think "journal" is virtually a "disk cache".
> Could you explain it in detail? :)

I am up to 2.4.18 here. I really wanted to release many of the
intermediates publicly, but I haven't quite achieved my main objective
yet.  Yes, the difference is a cache.  At the moment it sits
client-side and absorbs writes, allowing a readonly
server to become readwrite from the point of view of the client.
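
As a rough illustration of a write-absorbing cache in front of a
readonly resource, here is a minimal sketch in C. It is not ENBD code:
the names (struct overlay, overlay_read, overlay_write) and the tiny
block sizes are assumptions made up for the example, and the "server"
is just a constant array in memory.

```c
#include <assert.h>
#include <string.h>

#define BLOCK_SIZE 16   /* tiny, for illustration only */
#define NBLOCKS    8

/* Hypothetical structure; stands in for a client-side cache. */
struct overlay {
    const unsigned char *backing;             /* readonly "server" data */
    unsigned char data[NBLOCKS][BLOCK_SIZE];  /* absorbed writes */
    unsigned char dirty[NBLOCKS];             /* 1 if the block is cached */
};

void overlay_init(struct overlay *o, const unsigned char *backing)
{
    memset(o, 0, sizeof *o);
    o->backing = backing;
}

/* Writes never touch the backing store; they land in the cache. */
int overlay_write(struct overlay *o, unsigned blk, const unsigned char *buf)
{
    if (blk >= NBLOCKS)
        return -1;
    memcpy(o->data[blk], buf, BLOCK_SIZE);
    o->dirty[blk] = 1;
    return 0;
}

/* Reads prefer the cached copy and fall back to the backing store. */
int overlay_read(const struct overlay *o, unsigned blk, unsigned char *buf)
{
    const unsigned char *src;

    if (blk >= NBLOCKS)
        return -1;
    src = o->dirty[blk] ? o->data[blk]
                        : o->backing + (size_t)blk * BLOCK_SIZE;
    memcpy(buf, src, BLOCK_SIZE);
    return 0;
}
```

From the caller's side the resource looks readwrite, while the backing
array is never modified.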

There are various other caching modes, and I am working to make
the cache into a separate utility that will allow it to sit either
client-side or server-side. When client-side, it will look like a
proxy for the server.

> 2. nbd-server was a user-space program, it must pass through file
> system to write disk. So 
> when nbd-server returned successfully, the data virtually hasn't been
> written to disk, it was still in 
> the buffer of server machine. If server crash at this time, the data
> will lose.  I think this 

Servers don't crash! But you can make the server file synchronous to
disk if you like (that's an ext2 option) or serve from a raw
partition.

> additional buffer is not good. How do you think? Do you know some

The additional buffer is presently client-side. Its projected use
serverside is to absorb writes for the real target while we make a
snapshot copy of the server resource. Then after the copy is completed
we turn the cache back to writethrough mode and let it flush out the
accumulated changes through to the real resource.
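
The absorb-then-flush sequence can be sketched as follows. This is a
toy model under stated assumptions, not the planned implementation:
the names (struct wcache, wcache_write, wcache_set_writethrough,
demo_snapshot) are hypothetical, the "real resource" is an array in
memory, and reads during absorb mode are left out.

```c
#include <assert.h>
#include <string.h>

#define BS 8    /* block size, illustration only */
#define NB 4    /* number of blocks */

enum cache_mode { WRITETHROUGH, ABSORB };

struct wcache {
    enum cache_mode mode;
    unsigned char *target;        /* the real resource */
    unsigned char held[NB][BS];   /* writes held back while absorbing */
    unsigned char dirty[NB];
};

int wcache_write(struct wcache *c, unsigned blk, const unsigned char *buf)
{
    if (blk >= NB)
        return -1;
    if (c->mode == ABSORB) {
        memcpy(c->held[blk], buf, BS);   /* target stays frozen */
        c->dirty[blk] = 1;
    } else {
        memcpy(c->target + (size_t)blk * BS, buf, BS);
    }
    return 0;
}

/* Leave absorb mode: push held blocks to the target.
 * Returns the number of blocks flushed. */
int wcache_set_writethrough(struct wcache *c)
{
    int flushed = 0;
    unsigned b;

    for (b = 0; b < NB; b++) {
        if (c->dirty[b]) {
            memcpy(c->target + (size_t)b * BS, c->held[b], BS);
            c->dirty[b] = 0;
            flushed++;
        }
    }
    c->mode = WRITETHROUGH;
    return flushed;
}

/* Snapshot while a write arrives mid-copy. Returns 0 if the snapshot
 * holds only old data and the target ends up with the new data. */
int demo_snapshot(void)
{
    unsigned char target[NB * BS], snap[NB * BS], newblk[BS];
    struct wcache c;
    size_t half = sizeof target / 2;

    memset(&c, 0, sizeof c);
    memset(target, 'a', sizeof target);
    memset(newblk, 'z', sizeof newblk);
    c.target = target;
    c.mode = ABSORB;

    memcpy(snap, target, half);                 /* copy, first half  */
    wcache_write(&c, 0, newblk);                /* write during copy */
    memcpy(snap + half, target + half, half);   /* copy, second half */

    if (snap[0] != 'a' || target[0] != 'a')
        return -1;
    if (wcache_set_writethrough(&c) != 1)
        return -1;
    return target[0] == 'z' ? 0 : -1;
}
```

The point of the design is that the copy sees a frozen target, so the
snapshot is consistent even though clients keep writing.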

> method that can write 
> disk bypassing the file system(or simply add a sync)? In addition, can

I believe that "man chattr" and "man mount" ("sync" option) are the
usual directions given here. Also writing to /dev/hda (not /dev/hda1)
should be effective under kernel 2.4.0.
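
If you would rather "simply add a sync" in the program itself, the
usual user-space route is to open the file with O_SYNC (or to call
fsync() after writing). A minimal sketch, assuming a POSIX system; the
helper name write_block_sync is made up for the example and is not
taken from nbd-server:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Write len bytes at offset off in path (a file or a block device).
 * With O_SYNC, pwrite() returns only after the kernel reports that the
 * data has been handed to the underlying storage.
 * Returns 0 on success, -1 on any error or short write. */
int write_block_sync(const char *path, const void *buf, size_t len, off_t off)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_SYNC, 0600);
    ssize_t n;
    int rc;

    if (fd < 0)
        return -1;
    n = pwrite(fd, buf, len, off);
    rc = (n == (ssize_t)len) ? 0 : -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Synchronous writes cost throughput, since every write waits for the
storage, so this is a trade against the buffering being discussed.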

> kernel and a user-space program share memory?

This is a very hard trick to do, but it can be done in several ways.
The easiest thing to do (which nbd does) is to generate a buffer
in user space and pass its address down into the kernel, with address
conversions. Look for the code in the driver that "registers a buffer".
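
The user-space half of that idea can be sketched like this. Everything
named here is an assumption for illustration: the request code
HYPOTHETICAL_REGISTER_BUF, struct buf_desc and both helper functions
are invented, and the real driver's interface may look quite
different. The kernel side (the address conversion) is not shown.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical ioctl request code; not a real driver constant. */
#define HYPOTHETICAL_REGISTER_BUF 0x4242UL

/* Hypothetical descriptor handed to the driver. */
struct buf_desc {
    void  *addr;   /* user-space address of the buffer */
    size_t len;    /* its length in bytes */
};

/* Allocate a page-aligned buffer, which is the convenient shape for
 * the kernel to translate page by page. Returns NULL on failure. */
void *alloc_page_aligned(size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    void *p = NULL;

    if (page <= 0 || posix_memalign(&p, (size_t)page, len) != 0)
        return NULL;
    return p;
}

/* Pass the buffer's address and length down through an ioctl.
 * Returns whatever ioctl() returns (-1 on error). */
int register_buffer(int dev_fd, void *addr, size_t len)
{
    struct buf_desc d;

    d.addr = addr;
    d.len = len;
    return ioctl(dev_fd, HYPOTHETICAL_REGISTER_BUF, &d);
}
```

Because the buffer belongs to the process, it goes away with the
process, which is part of why this is the easier route.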

Alternatively, you can generate the buffer in the kernel. The standard
trick is to implement the mmap call on the device. Do it so that mmap
causes the kernel to make a buffer and pass the address back through
the call. The problem here is that you can lose track of the buffer in
the kernel and cause a memory leak.
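
Seen from user space, the mmap route looks roughly like this. It is
only a sketch of the caller's side: the helper names map_shared and
unmap_shared are made up, the driver's mmap handler is not shown, and
any descriptor that supports mmap (a regular file, say) can stand in
for the device when trying it out.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map len bytes of fd, shared and read-write, starting at offset 0.
 * On a device, this is the call that would reach the driver's mmap
 * handler. Returns NULL on failure. */
void *map_shared(int fd, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    return p == MAP_FAILED ? NULL : p;
}

/* Drop the mapping again. Returns 0 on success, -1 on error. */
int unmap_shared(void *p, size_t len)
{
    return munmap(p, len);
}
```

Pairing every successful map_shared() with an unmap_shared() keeps the
user side tidy; freeing the kernel's buffer remains the driver's job.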

> BTW, why the list was so quiet? :)

Christmas. And my laptop keyboard stopped working properly around new
year so I have been unable to do work for the last week. I think
I need to find a replacement in a hurry. I can't afford the "down"
time. Sigh.

Peter