[ENBD] Re: [DRBD-dev] A comparison of ENBD and DRBD, updated
Peter T. Breuer
ptb@it.uc3m.es
Mon, 23 Oct 2000 01:54:33 +0200 (MET DST)
"A month of sundays ago Marcelo Tosatti wrote:"
> You must apply two patches to fix this problem. One of them is against the
> kernel (kernel-2.2.17-tcpalloc.patch) and the other one is against drbd
> (kernel-2.2.17-drbd-deadlock.patch).
> You can find them both at
> http://bazar.conectiva.com.br/~marcelo/ha/patches/
The kernel patch is interesting (a one liner that alters the priority
with which the network stack grabs memory). Does it do anything? I
would have thought it just hides the symptoms a bit.
On a fast machine and a slow net (the situation everyone is in)
write pressure on a network file system will fill kernel
buffers faster than they can be unloaded to the net. Eventually
all RAM will be full of buffers and then the kernel will deadlock
when it tries to get a little more temporary memory in order to do the
send that will release a buffer or two. The priority doesn't matter,
does it?
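
To make the argument concrete, here is a toy model of it in Python. It is only a sketch of the reasoning above, not of the real kernel allocator: the function name, the parameters and the page counts are all made up for illustration. Each tick the writer dirties some pages, and the sender needs a few free scratch pages before it can push any buffers out.

```python
def simulate(total_pages, write_rate, drain_rate, send_cost, max_ticks=10_000):
    """Toy model of buffer pressure (hypothetical names and numbers).

    Each tick the writer dirties up to write_rate pages. The network send
    then needs send_cost free pages of scratch memory; if it gets them it
    releases up to drain_rate dirty buffers. Returns ("deadlock", tick) or
    ("ok", max_ticks).
    """
    dirty = 0
    for tick in range(max_ticks):
        # The writer takes whatever free memory it can, up to its rate.
        free = total_pages - dirty
        dirty += min(write_rate, free)

        # The send cannot start without scratch memory, and nothing else
        # in this model ever frees a page: that is the deadlock.
        free = total_pages - dirty
        if free < send_cost:
            return ("deadlock", tick)

        # The send succeeds and releases some buffers.
        dirty -= min(drain_rate, dirty)
    return ("ok", max_ticks)


if __name__ == "__main__":
    print(simulate(1000, write_rate=10, drain_rate=4, send_cost=2))
    print(simulate(1000, write_rate=4, drain_rate=10, send_cost=2))
```

Note that the model has no priority knob at all: once free memory drops below what the send needs, there is nothing left to hand out at any priority. Whether the real allocator behaves like that is exactly the question I'm asking.
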
I don't think there's any foolproof way out of that situation at the
moment on Linux, apart from Don't Do That Then. You can try syncing
every second in the background, or running on a synchronous FS. But
if your machine is infinitely fast it will still tend to build up
enormous buffer pressure if you stream to a buffered network FS and
then the VFS will crumble.
The problem is that the VFS doesn't know who's causing the buffer
pressure, so it calls on (say) NBD to free some buffers when it runs
out of space. But NBD needs some RAM to send via the network, and so
causes more pressure, which ...
It seems to me that even a little randomness in the way the block r/w
system frees buffers would avoid deadlock, but there doesn't seem to
be any indeterminacy. What one has to do is free memory by asking
_other_ file systems to flush buffers. But what does one do when
we're the only system holding buffers? Deadlock still. OK, so
the network should hold some skbuffs in reserve for when somebody
needs a loan of some more just to free up other memory resources.
But what happens if they're loaned out and the net seizes? Deadlock.
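
The reserve idea and its failure mode can be sketched the same way. Again this is a toy in Python under my own assumptions: ReservePool and run are hypothetical names, main memory is assumed to be already full so that every send has to borrow from the reserve, and a loan only comes back when the net actually completes the transmission.

```python
class ReservePool:
    """A fixed number of emergency buffers (hypothetical stand-in for skbuffs)."""

    def __init__(self, size):
        self.available = size

    def borrow(self):
        # Lend one buffer if any are left; report failure otherwise.
        if self.available == 0:
            return False
        self.available -= 1
        return True

    def give_back(self):
        self.available += 1


def run(reserve_size, sends, net_alive):
    """Attempt `sends` transmissions, each needing one loan from the reserve.

    If net_alive is False the net has seized: loans go out but never return.
    Returns ("ok", completed) or ("deadlock", completed).
    """
    pool = ReservePool(reserve_size)
    completed = 0
    for _ in range(sends):
        if not pool.borrow():
            # Reserve exhausted and nothing will ever return a loan.
            return ("deadlock", completed)
        if net_alive:
            # Transmission finished, so the loan is repaid.
            pool.give_back()
            completed += 1
    return ("ok", completed)


if __name__ == "__main__":
    print(run(4, 100, net_alive=True))
    print(run(4, 100, net_alive=False))
```

With a live net the reserve is enough however small it is, since each loan is repaid before the next one is needed. With a seized net the size of the reserve only decides how many sends go out before everything stops.
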
Another workaround, of course, is to run on a P100 with 100BaseT
Ethernet and DMA.
I think Pavel Machek first discovered this kind of problem when he
tried to swap over the net. Yes, that's right, something had to be
swapped out in order to let the network swapper run. It probably
goes better nowadays, but I wouldn't bet on it.
Peter