[ENBD] 2.4.32
Peter T. Breuer
ptb at it.uc3m.es
Thu Mar 11 08:45:07 MST 2004
"Also sprach Anders Blomdell:"
> > "Also sprach Anders Blomdell:"
> >> Client 1:
> >>
> >> 11:45:41 _llseek(3, 6041313280, [6041313280], SEEK_SET) = 0
> >> 11:45:41 write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
> >> 0"
> >> ..., 32768) = 32768
> >> 11:46:23 _llseek(3, 6041346048, [6041346048], SEEK_SET) = 0
> >> 11:46:23 write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
> >> 0"
> >> ..., 32768) = 32768
> >
> > Uh, if it is doing an lseek, then it's surely a server, not a client?
> > The clients talk to the nbd devices via ioctls.
> Sorry, I'm being unclear, it's the "mkfs.ext3 /dev/nda".
> >> Server 3 (top):
> >> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> >> 2172 root 10 0 2060 2060 1612 S 45.0 0.2 12:05.35 enbd-server
> >> 2163 root 9 0 2060 2060 1612 S 41.2 0.2 11:46.30 enbd-server
> >> 2162 root 11 0 2060 2060 1612 S 28.3 0.2 10:09.98 enbd-server
> >> 2173 root 9 0 2060 2060 1612 S 21.9 0.2 10:49.02 enbd-server
> >>
> >> All these servers are doing a select (that eventually times out), like
> >> this:
> >
> > Yes, I have seen this too. It seems to be a new 2.6 thing. I'll try and
> > figure out what it is.
> 2.4.25
Then I haven't seen it, but if 2.4.25 has borrowed the 2.6 scheduler,
perhaps it is the same thing. I haven't seen 2.4.25 at all. Increase
merge_requests and see if it goes away.
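As a point of comparison for the select behaviour described above, here is a
minimal sketch (Python, not enbd code) that blocks in select() on a socket
which never receives data and measures wall-clock time against CPU time.
The function name idle_select and the use of a socketpair as a stand-in for
the server's network socket are assumptions made for illustration only. On a
well-behaved kernel and libc the CPU time should stay close to zero for the
whole timeout.

```python
import select
import socket
import time


def idle_select(timeout_s):
    """Block in select() on a socket that never becomes readable.

    Returns (ready_list, wall_seconds, cpu_seconds).
    idle_select is a hypothetical helper, not part of enbd.
    """
    # A connected socket pair stands in for the server's network socket.
    a, b = socket.socketpair()
    try:
        wall0 = time.monotonic()
        cpu0 = time.process_time()
        # Nothing is ever written to b, so this waits for the full timeout.
        ready, _, _ = select.select([a], [], [], timeout_s)
        cpu = time.process_time() - cpu0
        wall = time.monotonic() - wall0
        return ready, wall, cpu
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    # 5 seconds matches the timeout mentioned in the thread.
    ready, wall, cpu = idle_select(5.0)
    print("ready=%r wall=%.2fs cpu=%.3fs" % (ready, wall, cpu))
```

If the reported CPU time is a large fraction of the wall time, the waiting
itself is burning cycles, which would point at the kernel or libc rather than
at the server code.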
> > If as you say those servers are just sitting there for 5s waiting on a
> > byte to appear in a network socket, then it is the kernel's/glibc's fault
> > if that is now implemented in such a way as to take cpu cycles.
> Setting TASK_INTERRUPTIBLE before schedule_timeout moved the problem (now
> the fsync'ing server process(es) report most of the load).
It can't change anything, since the routine it was in is not called!
It was in the routine clr_kernel_queue, which is only called after a
device has broken down completely. And anyway, it would only scan the
kernel queue erroring requests and finish in one go, instead of doing it
in 1000 jiffies, one entry per jiffy. Think again about it. You don't
see that function in action.
What fsync?
> > No. Nothing like that. For a start you are looking at a server, not a
> > client!
> NO.
??
> >> Is /dev/hdaX a raw device, or is only /dev/hda? Obviously /dev/hdaX
> >> deadlocks.
> >
> > I am not sure what you mean ... oh, the -n flag on the server? It's
> > labelled as experimental, and last I heard Arne was trying to figure
> > out under what conditions one can write data under O_DIRECT ordinarily.
> > You will have to set "-b 4096" also on the server, in order to
> > force writes to be aligned to a page, which we are fairly confident is
> > OK under O_DIRECT. With -n the server will do its opens with O_DIRECT.
> > With -n and -b 4096, its writes will be to a fd opened O_DIRECT and
> > will be written from buffers aligned to 4096.
> >
> > Check the code in file.c if in doubt.
> >
> > Neither /dev/hda nor /dev/hdaX is a raw device. You used to be
> > able to bind them to raw devices, but I think that is now deprecated
> > in 2.6 in favour of the O_DIRECT trick. I did have a trick in the
> > kernel code which caused all opens on the enbd device to become
> > O_DIRECT ... yes, you can set direct=1 as a module parameter, or
> > do it device by device via the /proc/sys/dev/enbd interface. I presume
> > one can also write something like "direct[a]=1" to /proc/nbdinfo.
> >
> > Be careful to distinguish client from server. I was talking about
> > serverside options. But yes, you can also remove buffering clientside.
> >
> > There also used to be a raw interface for enbd .. the enbd_raw.o
> > module? Is that only in 2.4 code?
> I'll have to read the above a few times before I understand all the
> implications...
If you are talking about 2.4 kernels, then I do not know how fully
O_DIRECT (on the serverside) is supported. Ditto clientside. But you
can always bind hda[X] to a raw device (but that's at the serverside,
no?).
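To make the alignment requirement behind "-n" with "-b 4096" concrete, here
is a minimal sketch of an O_DIRECT write from a page-aligned buffer. It is
Python, not the code in file.c, and the names BLOCK, is_block_aligned,
aligned_buffer and direct_write are hypothetical. It assumes Linux, a target
that accepts O_DIRECT, and a system page size that is a multiple of 4096.

```python
import mmap
import os

BLOCK = 4096  # assumed to correspond to the "-b 4096" server setting


def is_block_aligned(offset, length, block=BLOCK):
    """True if both the file offset and the length are multiples of block."""
    return offset % block == 0 and length % block == 0


def aligned_buffer(data, block=BLOCK):
    """Copy data into anonymous mmap memory, zero-padded to whole blocks.

    Anonymous mappings start on a page boundary, so the buffer address is
    aligned to the page size (assumed here to be a multiple of block).
    """
    nblocks = max(1, -(-len(data) // block))  # ceiling division, at least 1
    buf = mmap.mmap(-1, nblocks * block)
    buf[:len(data)] = data
    return buf


def direct_write(path, offset, data, block=BLOCK):
    """Write data at offset through a file descriptor opened with O_DIRECT."""
    if offset % block != 0:
        raise ValueError("offset must be a multiple of the block size")
    buf = aligned_buffer(data, block)
    # os.O_DIRECT is only available on platforms that define it (e.g. Linux).
    fd = os.open(path, os.O_WRONLY | os.O_DIRECT)
    try:
        # The whole padded buffer is written, so the length is aligned too.
        return os.pwrite(fd, buf, offset)
    finally:
        os.close(fd)
        buf.close()
```

The writes in the strace output at the top of this thread would pass the
alignment check: is_block_aligned(6041313280, 32768) is true for a 4096-byte
block. Whether a given 2.4 kernel honours O_DIRECT on the device is a
separate question that this sketch does not answer.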
Peter