[ENBD] 20G dd hang up!
Peter T. Breuer
enbd@lists.community.tummy.com
Fri, 11 Jan 2002 21:54:35 +0100 (MET)
A bit more ...
"ptb wrote:"
> "Kuniyasu SUZAKI wrote:"
> OK. 2.4.26 or 2.4.26a (actually, the difference is only in the kernel
> driver)?
Yes.
> Personally, I think you're stuck with some kind of unknowable memory
> deadlock under 2.2.18 (between tcp and other resources). I am most
Or some i/o deadlock.
> surprised ... tell me, can you slow down the processor on your TP?
>
> This is my theory...
>
> I rather suspect that what happens is that the processor throws stuff
> at the network faster than it can handle, causing VM buffers to back up
> and fill available memory, until they collide with expanding tcp
> stacks from the network bottleneck at 10MB/s. At that point, to relieve
> pressure the VM has to push those buffers out through nbd to the
> network, which backs up tcp buffers. Bang!!!!!
The deadlock I described isn't quite right, because people who see this
deadlock seem to report that buffers are not occupying all of the
memory. Nevertheless, there is _some_ resource that is contended for
between VM buffers and tcp stacks in 2.2.18.
We know that because it is networking (the client, some user space
stuff, ..) that dies, not the kernel driver. The driver just sits there
wondering why the kernel isn't giving it anything to do and why the
client daemon isn't responding like it ought to.
What could the deadlocking resource be? The i/o spinlock? How? The
evidence is that no new requests ever arrive at nbd to be treated.
What can the kernel be doing? I suspect it's running up and down lists
of buffers crazily looking for one it can get rid of because tcp
stacks want more memory. Why doesn't it push some buffer to swap? Swap
must be nearly empty in these cases, surely? Anyway, tcp wants
stack memory, and the kernel won't release buffers, I think. Why? It
works in 2.4!
In 2.2.17 a tcp/buffer contention was found, and the kernel was changed
so that tcp always wins. In theory this is the correct thing to do, but
it doesn't help when tcp is the second guest to the party. Then the
resource is already taken.
If this theory (vague contention between tcp and VM buffers) is
correct, then there are several things that one could do which should
in theory release the pressure.
1) instead of rolling back requests that have gone untreated for 5s,
the driver can error them. After all, some requests treated are
better than none. Erroring is a valid tactic. The sender can retry
if he sees a media error. To force this behaviour, I think all you
have to do is set the "show_errs" flag in the module:
    // PTB error too old reqs if show_errs is set, else roll them back
    if (!(atomic_read(&lo->flags) & NBD_SHOW_ERRS) || lo->aslot > 0) {
        NBD_DEBUG (1, "rollback old on device nd%s\n", lo->devnam);
        nbd_rollback_old (lo);
    } else {
        NBD_DEBUG (1, "error old on device nd%s\n", lo->devnam);
        nbd_error_old(lo);
    }
and you may want to remove that "|| lo->aslot > 0" !! In fact I may.
It prevents requests being errored if any of the daemons are still
alive because they may still do the work for us in a moment, but in
this case they are alive but useless. Take it away (in nbd.c).
You can set the show_errs flag by
    echo show_errs=1 > /proc/nbdinfo
(amongst other things). Tell me if this avoids the deadlock you
experience. Then we can take it from there.
2) you may want to avoid using VM buffers altogether. This is easy in
the 2.4 kernels, I believe, because all you have to do is
    /sbin/raw /dev/raw1 /dev/nda
(for example). At least I THINK that is all you have to do. If
somebody can confirm, that would be good.
In the 2.2 kernels, you may have to apply a patch to get raw i/o.
Then you will have to figure out how to make it work! At least the
oracle people know how to do that. I think the patch exists, and it
is probably worth searching for it.
No buffer use implies no memory contention.
3) you may want to change the stream.c code in the nbd directory to
use UDP instead of TCP, which would avoid TCP buffer growth.
That's all I can think of for the moment. I would of course be grateful
if you could identify the cause of your lockup more precisely than I
have been able to indicate above.
Personally, I'd try a 2.4 kernel and see if that fixes it. It would
confirm the hypothesis that the 2.2 kernel is the problem, because the
driver code would be the same still.
Peter