[ENBD] Bonnie on NBD w/ memory pressure deadlocks (problem in wait_for_tcp_memory?)
(fwd)
Peter T. Breuer
ptb@oboe.it.uc3m.es
Thu, 18 Jan 2001 14:09:54 +0100 (MET)
"A month of sundays ago Jeff Raubitschek wrote:"
> On Tue, 16 Jan 2001, Peter T. Breuer wrote:
> > I believe we understand the problem, thanks to some members of this
> > list. It is a VFS/VFS collision to localhost. There can be no fix
> > in nbd .. it happens outside of nbd in the VFS layers.
>
> Can you explain this? does this mean it will only occur when the server
> and client are on the same machine? I have been able to reproduce the
OK, I'll "expand" a bit, which is what I think you meant!
When the VFS is full of dirty buffers and the kernel exerts pressure on
NBD to free them, they go out across TCP to the server and, on ack, are
freed. But ... if the server is on the same machine, then it must
write the blocks to disk before acking. Uh, uh, that is a write to VFS,
which cannot proceed, because VFS is full. Deadlock. Nobody can do much
about that until they rewrite memory handling in the kernel.
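
To make the cycle concrete, here is a toy model of it. It is only a sketch: BufferPool, flush_one_block and fill are made-up names, and "VFS" is reduced to a fixed pool of buffer slots, which is nothing like the real kernel code. It only shows why a shared full pool cannot drain while two separate pools can.

```python
# Toy model only: "VFS" is a fixed pool of buffer slots (an assumption
# made for illustration, not how the kernel actually accounts memory).

class BufferPool:
    """A fixed number of buffer slots standing in for VFS memory."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0

    def try_alloc(self):
        """Take one slot if any is free; return False when the pool is full."""
        if self.used >= self.capacity:
            return False
        self.used += 1
        return True

    def free(self):
        """Release one slot."""
        self.used -= 1


def flush_one_block(client_pool, server_pool):
    """Try to push one dirty block from the client to the server.

    The server needs a buffer of its own to write the block before it
    can ack; only the ack lets the client free its buffer.
    """
    if not server_pool.try_alloc():
        return False      # server cannot write, so no ack ever arrives
    server_pool.free()    # server's write reached disk
    client_pool.free()    # ack received: client buffer released
    return True


def fill(pool):
    """Allocate until the pool is full."""
    while pool.try_alloc():
        pass


# Server on another machine: separate pools, so the client can drain.
client, remote = BufferPool(4), BufferPool(4)
fill(client)
remote_ok = flush_one_block(client, remote)

# Server on localhost: client and server share one pool, already full.
shared = BufferPool(4)
fill(shared)
local_ok = flush_one_block(shared, shared)

print("remote server makes progress:", remote_ok)
print("localhost server makes progress:", local_ok)
```

In the localhost case nothing ever frees a slot, because the only thing that would free one is the ack that itself needs a slot.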
I believed that VFS/TCP could also collide, but that seems to have been
fixed in 2.2.18. Before that there was a prioritization bug for memory
allocation in TCP.
I still believe that there is a possible deadlock of VFS/swap. If the
client is swapped out when VFS is filled, perhaps precisely because VFS
is full, then it must swap back in to service the send_request that
will release resources from VFS. That means VFS must send blocks to
swap to allow the client to swap in. It's very delicate.
To combat this sort of thing, current ENBDs (for values of current
between about 2.4.15 and 2.4.19) sync the block device every second or
every 10000 requests, or something like that, unless you turn it off.
You can do the same sort of thing by running sync() in the background.
I don't much like it .. but I haven't managed to find out how to tell
when VFS is nearly full, so that I can do a sync before it gets too full.
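
A userland sketch of the "every second or every N requests" idea might look like the following. This is not the ENBD code, which works inside the driver; SyncPolicy and handle_request are hypothetical names, the thresholds are just the rough figures quoted above, and os.sync() (Unix, Python 3.3+) flushes everything on the machine, not just one block device.

```python
import os
import time


class SyncPolicy:
    """Decide when a sync is due: after max_requests requests or
    max_seconds seconds, whichever comes first (hypothetical helper)."""

    def __init__(self, max_requests=10000, max_seconds=1.0,
                 clock=time.monotonic):
        self.max_requests = max_requests
        self.max_seconds = max_seconds
        self.clock = clock
        self.count = 0
        self.last_sync = clock()

    def note_request(self):
        """Record one request; return True if a sync is now due."""
        self.count += 1
        elapsed = self.clock() - self.last_sync
        return self.count >= self.max_requests or elapsed >= self.max_seconds

    def mark_synced(self):
        """Reset the request counter and the timer after a sync."""
        self.count = 0
        self.last_sync = self.clock()


def handle_request(policy, do_sync=os.sync):
    """Call once per write request; syncs when the policy says so.

    Returns True if a sync was issued for this request.
    """
    if policy.note_request():
        do_sync()
        policy.mark_synced()
        return True
    return False
```

Note that the timer is only checked when a request arrives, so an idle device is never synced by this sketch. The clock and do_sync parameters are there only so the policy can be exercised without really calling sync.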
You should also mount the filesystems on nbd with -o sync if you plan on
doing large writes. That *should* prevent VFS from ever filling up, but
as far as I can tell, there is a VFS bug that means that if you write
from two different processes to the same file, then only one of the
writes is synchronous to disk: the other goes to VFS at max velocity,
where it waits and then empties into the file only after the first
process has finished writing.
Needless to say, you can do a lot of badness with that trick.
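
If remounting with -o sync is not an option, a per-file approximation is to open the file with O_SYNC, so that each write returns only after the data has been pushed to the device. This is a different technique from the mount option, shown only as a sketch; write_sync is a hypothetical helper, and no claim is made here about how it interacts with the two-writer behaviour described above.

```python
import os


def write_sync(path, data):
    """Write bytes to path with O_SYNC so each write() is synchronous.

    Returns the number of bytes written. Unix only (os.O_SYNC).
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC
    fd = os.open(path, flags, 0o644)
    try:
        written = 0
        # os.write may write fewer bytes than asked, so loop until done.
        while written < len(data):
            written += os.write(fd, data[written:])
    finally:
        os.close(fd)
    return written
```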
> problem on 2.2.18 between two machines connected with gigabit ethernet.
>
> what is the needed fix in the VFS layers? it seems to me that NBD is
That is the $64000 question.
Reads are fine, of course.
Peter