[ENBD] enbd 2.4.27 going out the door ...

Peter T. Breuer enbd@lists.community.tummy.com
Sun, 3 Mar 2002 01:17:24 +0100 (MET)


I'll try and make the public release tomorrow. I've put the cvs
snapshot

     nbd-2.4.27.tgz

out on the ftp site

    ftp://oboe.it.uc3m.es/pub/Programs/nbd-2.4.27.tgz
  
and tomorrow I'll fix any problems with the packaging and announce it.

Over the last week, I

  a) restored working kernel write caching for setups with a remote
    read-only root server and a local writable "copy".  At least, I
    think it's working.  It's not strongly tested, because I had to
    understand the post-2.4.10 VM system first, and my head still hurts.

  b) found that the post-2.4.10 VM system CAN deadlock ENBD when
    memory is low, unless one tunes the VM system via
    /proc/sys/vm/bdflush.  The new VM starts _synchronously_ trying to
    flush buffers once 60% of buffers are dirty, by default.  You must
    raise that trigger to at least 80% if you "only" have 128MB memory,
    because buffers will dirty to 60% before any of them can be flushed,
    and then the VM will insist on blocking new buffer requests until
    some old ones have gone.  This is suicide, since the flush goes over
    tcp, which itself needs buffers.  Raise the limit to 100%!  What a
    silly default.

    I made the nbd-client adjust the VM automatically.  Experts (SuSE
    people?  anyone with access to Andrea?), please tell me whether the
    dirty buffers sync trigger should go all the way to 100%.  I also
    think the VM async flush trigger point should be dropped to 25%
    from the default 40%, so that buffers start to trickle out to the
    net early, instead of building up in silence.  This is what the vm
    settings should end up looking like:

nbd:/usr/oboe/ptb% cat /proc/sys/vm/bdflush 
25      0       0       0       200     3000    80      0       0
(^was 40%)                                       (^was 60%)

    If you don't understand that, you're not alone.  The VM in
    post-2.4.10 kernels is horribly "bursty", and that is bad for nbd,
    which wants to start pushing data out across the net as soon as it
    can, and not wait.  If somebody knows how the other (nonzero)
    numbers there or elsewhere should be tuned to achieve that, tell
    me.  The 200 (jiffies) is a buffer age until flush.  I probably
    lowered that too, but by hand, instead of building it into the
    code.  0% would not be too low.

  c) I didn't get time to fix any more of the fd or cdrom ioctls, so
    they'll have to go out as they are, working or not working, and
    people can tell me about it.  There's probably 70% coverage.

  d) I checked compilation under the new 2.4.18 kernel, but haven't
    run it yet.
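
For anyone who wants to try the bdflush adjustment from (b) by hand,
here is a minimal sketch in Python.  It is not the code in nbd-client,
just an illustration.  It assumes the field positions are as annotated
above (first number = async flush trigger, seventh = sync flush
trigger); the function names are made up for the example, and the
"before" line in the demo is only reconstructed from the "was 40%" and
"was 60%" notes.  /proc/sys/vm/bdflush only exists on 2.4-era kernels,
and writing to it needs root.

```python
BDFLUSH_PATH = "/proc/sys/vm/bdflush"  # only present on 2.4-era kernels

# Field positions are an assumption taken from the annotated output in
# the post: first number = async trigger, seventh = sync trigger.
ASYNC_FIELD = 0
SYNC_FIELD = 6


def tune_bdflush(line, async_pct=25, sync_pct=80):
    """Return a bdflush line with the two trigger percentages replaced.

    All other fields are passed through untouched.
    """
    fields = line.split()
    if len(fields) <= SYNC_FIELD:
        raise ValueError("expected at least 7 fields, got %d" % len(fields))
    for pct in (async_pct, sync_pct):
        if not 0 <= pct <= 100:
            raise ValueError("percentage out of range: %r" % (pct,))
    fields[ASYNC_FIELD] = str(async_pct)
    fields[SYNC_FIELD] = str(sync_pct)
    return " ".join(fields)


def apply_tuning(path=BDFLUSH_PATH, async_pct=25, sync_pct=80):
    """Read the current settings and write back the tuned ones (root only)."""
    with open(path) as f:
        current = f.read()
    with open(path, "w") as f:
        f.write(tune_bdflush(current, async_pct, sync_pct) + "\n")


if __name__ == "__main__":
    # Demonstrate on a hypothetical "before" line instead of touching /proc.
    print(tune_bdflush("40 0 0 0 200 3000 60 0 0"))
```

Run as is, it only prints the rewritten line and leaves the system
alone.  Calling apply_tuning() is what would actually change the
settings, and passing sync_pct=100 would give the more aggressive
setting argued for above.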

Peter