[ENBD] Compiling under AMD64 fails
Peter T. Breuer
ptb at inv.it.uc3m.es
Thu Sep 20 05:33:04 MDT 2007
"Also sprach Dr. Volker Jaenisch:"
> 2.4.34pre compiles flawlessly under AMD64.
Actually, now I'm aware of the situation, I'll carefully back port the
minor changes required for just compilation on amd64 to the stable
2.4.33. For cautious people.
> Maybe you should remove the "pre" suffix to encourage precautious people
> to use it. :-)
> We will share our experiences with enbd.
Be aware that I had to work on other things (hard input for static
analysis of kernel C) for a couple of years and am only now getting
back into the swing of things with enbd and raid. I talked to Neil Brown
(the kernel soft raid maintainer) and I believe we are agreed that a
special "slow mode" is needed for raid where one component is a network
device like enbd, and I will work in the coming weeks or months to
provide it.
If I ever get a millisecond to actually do _anything_.
The problem to be solved is what I explained to you in another mail ...
... under write pressure to a network device the linux kernel will
fill all buffers with dirty stuff to go to the net, but then won't
have any memory to spare to form tcp buffers with which it can
actually send it out!
Under very recent kernels (as I tested a few weeks ago) it doesn't seem
to be a complete deadlock. I was favourably impressed by the improved
memory handling of recent kernels.
One problem with older kernels is that the thread that scavenges memory
when memory is scarce chooses devices to pressure to flush their dirty
buffers to disk without any science behind it, and it might have chosen
enbd, which would not be able to flush because it needs tcp buffers,
which aren't available because all buffers are tied up with dirty data
to go to enbd .. loop.
Maybe one can now pre-dedicate kernel memory to certain network sockets?
That would cure that. For a long time there was a kernel discussion
about that. The problem is that one can need infinite memory (well, a
lot) for tcp buffers on 64K sockets, and nobody wanted to tie that up.
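To make the loop concrete, here is a toy model in Python. It is not kernel code: the page counts, the rates and the names `simulate` and `reserve` are invented for illustration. Writers dirty pages faster than the net can drain them, and sending a dirty page needs at least one free page to serve as a tcp buffer. With no reserve, memory fills with dirty pages and sending stops for good; with a small reserve that writers may not touch, the socket can always make progress.

```python
def simulate(total_pages, reserve, steps, write_rate=4, net_rate=1):
    """Toy model: return how many pages reach the net in `steps` ticks.

    total_pages: all memory, in pages (made-up unit)
    reserve:     pages set aside that writers may never dirty
    write_rate:  pages the writers try to dirty per tick
    net_rate:    pages the network can carry per tick
    """
    dirty = 0      # pages full of data waiting to go to the net
    flushed = 0
    for _ in range(steps):
        # Writers can only dirty memory outside the reserve.
        room = total_pages - reserve - dirty
        dirty += min(write_rate, max(room, 0))
        # Sending needs free memory for tcp buffers.
        free = total_pages - dirty
        sent = min(dirty, net_rate, free)
        dirty -= sent
        flushed += sent
    return flushed

stuck = simulate(total_pages=16, reserve=0, steps=100)
moving = simulate(total_pages=16, reserve=2, steps=100)
```

In this model `stuck` stops growing once the dirty pages have eaten all of memory, while `moving` keeps increasing every tick. The cost is exactly the one mentioned above: the reserved pages are tied up whether or not they are needed.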
Also one could use udp instead of tcp. I think drbd does that .. if I am
wrong, then drbd throttles itself when it spots memory is getting tight,
to avoid going into deadlock (not that I really believe it can do that
calculation successfully always - I don't see the kernel hooks to make
it really possible).
Failing packets internally and then retrying would be an option. It
would amount to self-throttling. I believe I went for something like
that in 2.4.34 .. I allowed packets on the net to time out, but didn't
fail them until three retries had been lost. I suspect I will have to
raise the retry number to nearer infinite as a best strategy.
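As a rough sketch of that strategy, in Python rather than the driver's C and with invented names (`send_with_retries`, `send_once` and `max_retries` are not enbd identifiers): a request that times out is retried, and it is only errored once the retry budget is spent. Passing `max_retries=None` stands in for the "nearer infinite" setting.

```python
import itertools

class RequestFailed(Exception):
    """Raised when a request is finally errored back to the caller."""

def send_with_retries(send_once, request, max_retries=3):
    """Call send_once(request), retrying whenever it times out.

    send_once:   hypothetical callable; returns a reply or raises TimeoutError
    max_retries: retries allowed after the first attempt; None retries forever
    Returns (reply, number_of_attempts_used).
    """
    for attempt in itertools.count(1):
        try:
            return send_once(request), attempt
        except TimeoutError:
            # Give up only when the first attempt and all retries are lost.
            if max_retries is not None and attempt > max_retries:
                raise RequestFailed(
                    f"{request!r} lost after {attempt} attempts")
```

With the default of three retries a request is attempted at most four times before it is failed. With `None` it is never failed by this layer, which is self-throttling in effect: the caller simply waits until the net recovers.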
Anyway, in raid, the problem is that if one doesn't fail network packets
when they exceed their time to complete under write stress, then the whole
raid device can look blocked. Raid (1) will go async, but only up to a
certain number of blocks. That helps. However, it's not an infinite
number of blocks that can be written behind in raid. And you MUST
set write-behind on the raid device for the enbd component, which is
not easy to find out how to do!
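For reference, mdadm exposes this through `--write-behind`, which needs a write-intent bitmap and only applies to components flagged `--write-mostly`. The sketch below only builds such a command line and does not run it. The device names (`/dev/md0`, `/dev/sda1`, `/dev/nda`) and the helper `mdadm_create_args` are placeholders, and the limit of 1024 outstanding writes is an arbitrary example value, not a recommendation. Check the option names against the mdadm man page for your version.

```python
def mdadm_create_args(md_dev, local_dev, net_dev, max_behind=1024):
    """Build (not run) an mdadm command for a two-way raid1 mirror whose
    network component is write-mostly with write-behind enabled."""
    return [
        "mdadm", "--create", md_dev,
        "--level=1", "--raid-devices=2",
        "--bitmap=internal",             # write-behind requires a bitmap
        f"--write-behind={max_behind}",  # cap on outstanding behind-writes
        local_dev,
        "--write-mostly", net_dev,       # flag applies to devices listed after it
    ]

# Placeholder device names; substitute your own before using the output.
print(" ".join(mdadm_create_args("/dev/md0", "/dev/sda1", "/dev/nda")))
```

The ordering matters: `--write-mostly` marks the devices that follow it, so the local disk is listed before the flag and the network device after it.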
Even then, raid is too sensitive to errors that are purely temporary and
can be recovered. It should run a recovery thread for requests that
have been expeditiously errored by the network device under pressure,
and it doesn't, preferring to eject the device from raid instead. One
pretty well has to run a "reinsert daemon" on raid to put the network
device back in again.
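A reinsert daemon of that kind could look roughly like the sketch below. It assumes the usual /proc/mdstat layout, in which a failed component is marked `(F)`, and that `mdadm <array> --remove` followed by `--re-add` is an acceptable way to put the component back. The function names, the 30-second poll interval and the `md0` default are illustrative only.

```python
import subprocess
import time

def faulty_components(mdstat_text, md_name):
    """Return the components of array `md_name` that mdstat marks (F)."""
    for line in mdstat_text.splitlines():
        if line.startswith(md_name + " :"):
            # Components look like "sda1[0]" or "nda[1](W)(F)".
            return [tok.split("[")[0]
                    for tok in line.split() if tok.endswith("(F)")]
    return []

def reinsert(md_name, component):
    """Remove a failed component and add it back (runs mdadm; needs root)."""
    md_dev, dev = "/dev/" + md_name, "/dev/" + component
    subprocess.run(["mdadm", md_dev, "--remove", dev], check=False)
    subprocess.run(["mdadm", md_dev, "--re-add", dev], check=False)

def watch(md_name="md0", interval=30):
    """Poll /proc/mdstat forever and reinsert anything marked faulty."""
    while True:
        with open("/proc/mdstat") as f:
            for component in faulty_components(f.read(), md_name):
                reinsert(md_name, component)
        time.sleep(interval)
```

Nothing here checks that the network device has actually recovered before it is reinserted; a real daemon would want to test the enbd connection first, otherwise raid will just eject the device again.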
Those are the problems I'll tackle in the near future. They aren't
difficult problems to solve, but I haven't had time yet and they would
make things look as though "it doesn't work" if you don't understand
what is going on. If you like I can work with you to set the appropriate
params on enbd and raid. And I can maybe do that infinite retries thing
in code now that I have expressed it. There's a number 3 written down
somewhere that needs to be changed.
Peter