[ENBD] stability issues

Peter T. Breuer enbd@lists.community.tummy.com
Mon, 25 Mar 2002 14:15:00 +0100 (MET)


"A month of Sundays ago Arne Wiebalck wrote:"
> > Strange, the test you did below suggests that VM is implicated. What
> > numbers did you use? And is this SMP or UP?
> >
> 
> I tried with 5% and 95%.
> The machines used are all SMP.

OK.


I would really like you to try it with a UP kernel, but that's just to
help me. I appreciate that you have other priorities. A negative
result from a UP kernel would be a big red neon sign as to what kind of
problem it is.


> > > if the kernel stores only one request, why is memory usage increasing
> > > over time? I think I saw this on my machines.
> >
> > It shouldn't. But memory usage is very hard to measure. There are stats
> > that you can look at if you press something like alt-shift-numlock
> > (I forget exactly), but they require a lot of interpretation.
> >
> > Some kind of _real_ memory leak would kill a machine over time. But as
> > I said, it's hard to imagine how it can come about from nbd, which does
> > the same thing again and again. What could leak? nbd doesn't do
> > kmallocs. Maybe it could lose buffers irretrievably? A fault in a
> > rarely used error path may do that.
> >
> > If there is a leak, then you should be able to detect it statistically.
> > That is to say, the memory "lost" should be proportional to the use of
> > the device. Reading or writing? I saw VM lose buffer heads on _read_.
> > That is to say, when the nbd driver tried to write incoming data to a
> > read request, it found the request didn't contain enough buffers
> > for the data.
> 
> I simply used top to look at memory usage. It increases when
> writing (0.5 M/s) and reading (almost 3 M/s). I used nbd-test with the -y
> option.

Top is not meaningful at face value. Obviously cached memory should
increase, for example. To make sense of the memory figures one really
needs the output from that secret shift-numlock sequence that I can
never remember ...
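
Short of that key sequence, a similar breakdown can be had from
/proc/meminfo. What follows is a minimal sketch, not part of nbd or
nbd-test: it assumes the usual "Key:   value kB" line format, and the
names struct meminfo, parse_meminfo, read_meminfo and used_without_cache
are hypothetical. It discounts Buffers and Cached from the "used" figure,
since those are the parts one expects to grow during heavy I/O.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative sketch; all names here are hypothetical.
 * Selected /proc/meminfo fields, in kB. */
struct meminfo {
    unsigned long total, free, buffers, cached;
};

/* Parse text in /proc/meminfo format ("Key:   value kB" per line).
 * Lines that are not "Key: number" are ignored, so the extra summary
 * table that some older kernels print first is skipped. */
void parse_meminfo(const char *text, struct meminfo *m)
{
    assert(text != NULL && m != NULL);
    memset(m, 0, sizeof *m);
    for (const char *line = text; line && *line; ) {
        char key[64];
        unsigned long val;
        if (sscanf(line, "%63[^:\n]: %lu", key, &val) == 2) {
            if (strcmp(key, "MemTotal") == 0)      m->total = val;
            else if (strcmp(key, "MemFree") == 0)  m->free = val;
            else if (strcmp(key, "Buffers") == 0)  m->buffers = val;
            else if (strcmp(key, "Cached") == 0)   m->cached = val;
        }
        line = strchr(line, '\n');
        if (line)
            line++;
    }
}

/* Read and parse the live /proc/meminfo; returns 0 on success. */
int read_meminfo(struct meminfo *m)
{
    char buf[8192];
    FILE *f = fopen("/proc/meminfo", "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    parse_meminfo(buf, m);
    return 0;
}

/* Memory in use once buffers and page cache are discounted: unlike the
 * "used" column in top, this excludes cache the kernel is free to grow. */
unsigned long used_without_cache(const struct meminfo *m)
{
    return m->total - m->free - m->buffers - m->cached;
}
```

Sampling used_without_cache() before and after a long nbd-test run, and
again around the same run against the raw device, would give a fairer
comparison than the raw top figures.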

> But memory usage does not change by a single byte when reading from or
> writing to the raw device, so it seems to be the VM system that allocates
> the memory. For what, you ask? I don't know...

It's its job, I think.
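
If there is a real leak on top of that, it should still show up
statistically, as memory lost in proportion to use of the device. One way
to check, sketched here under the assumption that you record pairs of
(megabytes moved through the device, memory in use with buffers and cache
discounted) during a test run, is a least-squares slope. leak_slope is a
hypothetical helper, not anything in the nbd tree.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sketch; leak_slope is a hypothetical helper.
 * Least-squares slope of y against x over n samples, e.g.
 *   x[i] = megabytes transferred through the device so far,
 *   y[i] = memory in use (kB) with buffers and cache discounted.
 * The result is then in kB per MB transferred.
 * Returns 0.0 for fewer than two samples or if all x are equal. */
double leak_slope(const double *x, const double *y, size_t n)
{
    assert(x != NULL && y != NULL);
    if (n < 2)
        return 0.0;

    double sx = 0.0, sy = 0.0;
    for (size_t i = 0; i < n; i++) {
        sx += x[i];
        sy += y[i];
    }
    double mx = sx / (double)n, my = sy / (double)n;

    /* Covariance over variance of x (the common 1/n factors cancel). */
    double sxy = 0.0, sxx = 0.0;
    for (size_t i = 0; i < n; i++) {
        sxy += (x[i] - mx) * (y[i] - my);
        sxx += (x[i] - mx) * (x[i] - mx);
    }
    return sxx == 0.0 ? 0.0 : sxy / sxx;
}
```

Cache growth ought to level off once memory is full, so a slope that
stays clearly positive over a long run is the suspicious sign; a slope
near zero points back at the VM simply doing its job.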

> > Yes, I'll do it. According to you it just needs aligned buffers,
> > weird as it sounds!
> 
> At least the simple test program you provided succeeded when I applied
> alignment corrections, so let's hope for the best :)

It works. Indeed, malloc alignment was not enough. 512B alignment
works.
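
For anyone else hitting this, here is a minimal sketch of what the
alignment fix amounts to, assuming a POSIX system with posix_memalign()
and pread(). It is not the code in nbd-test.c; RAW_ALIGN,
alloc_raw_buffer and raw_read are hypothetical names. As an added
assumption, the sketch also keeps lengths and offsets to 512-byte
multiples, since raw I/O goes to the device in whole sectors.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Illustrative sketch; names are hypothetical, not from nbd-test.c.
 * Alignment that worked for the raw device: one 512-byte sector. */
#define RAW_ALIGN 512

/* Allocate a zeroed buffer whose address is a multiple of RAW_ALIGN.
 * The length is rounded up to whole sectors and reported via *rounded.
 * Returns NULL on failure; release with free(). */
void *alloc_raw_buffer(size_t len, size_t *rounded)
{
    void *p = NULL;
    size_t n = (len + RAW_ALIGN - 1) / RAW_ALIGN * RAW_ALIGN;
    if (n == 0)
        n = RAW_ALIGN;
    if (posix_memalign(&p, RAW_ALIGN, n) != 0)
        return NULL;
    assert((uintptr_t)p % RAW_ALIGN == 0);
    memset(p, 0, n);
    if (rounded)
        *rounded = n;
    return p;
}

/* Read len bytes at byte offset off from an already opened raw device.
 * Returns -1 up front unless the buffer address, length and offset are
 * all sector multiples; otherwise returns whatever pread() returns. */
ssize_t raw_read(int fd, void *buf, size_t len, off_t off)
{
    if ((uintptr_t)buf % RAW_ALIGN != 0 || len % RAW_ALIGN != 0 ||
        off % RAW_ALIGN != 0)
        return -1;
    return pread(fd, buf, len, off);
}
```

Plain malloc() typically guarantees only 8- or 16-byte alignment, which
fits with it not being enough here.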

> Please let me know if you have adapted nbd-test for raw devices.

I have ... one needs to pass the size with -s for the moment, but that's
all. I added the code to CVS and it should be in the nbd-test.c in

   ftp://oboe.it.uc3m.es/pub/programs/nbd-2.4.28.tgz 

now. But it was just your kind of change.

What made you try 512B alignment?

  


Peter