[ENBD] 2.4.32 conclusions
Peter T. Breuer
ptb at it.uc3m.es
Thu Mar 11 10:28:12 MST 2004
"Also sprach Anders Blomdell:"
>
> > If you really do have three machines in a loop, as I think you do, then
> > there is nothing to try. You can ONLY run async (client -a) and without
> > cache (server -n). Otherwise you will almost certainly deadlock through
> > VMS.
> > You can turn off VMS on the clientside too with direct=1 aimed at the
> > device through /proc/sys/dev/enbd or /proc/nbdinfo.
> >
> > Otherwise you might try running "while sleep 1; do sync; done &"
> > on all machines, in order to keep buffers empty! But you really want to
> > sync only the disk resources.
> Conclusion: servers in a loop do not work (for me at least)!
You must run without VMS (at least on the serverside) if you are to
avoid VMS deadlock, simply because you write to a machine which tries to
write to disk, but which has to find buffers for that, so it flushes
stuff to the net to find them, but that causes the other machine to want
to write to disk, for which it needs buffers, which it gets by flushing
existing buffers to the net, which ...
Etc. One wants to be able to tell the algorithm that searches for buffers not
to try to evict buffers which, when flushed, will cause a demand for more buffers
via a second machine, but one can't. The only way out is not to use buffers, i.e.
open the resource O_DIRECT or use a raw device as the resource.
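
The cycle described above can be pictured as a wait-for graph. What follows
is only a toy model in Python, not anything ENBD itself does: the names
`flush_edges` and `has_cycle` are made up for illustration, and the model
assumes that a flush from one machine to a peer creates a fresh demand for
buffers only when the peer's server writes through the buffer cache.

```python
# Toy model only. An edge a -> b means: when machine a flushes buffers to the
# net, machine b must itself find buffers in order to write the data to disk.

def flush_edges(mirrors, buffered):
    """Build the wait-for graph.

    mirrors:  dict mapping each machine to the peers it mirrors writes to
    buffered: set of machines whose server side writes via the buffer cache
              (i.e. NOT opened O_DIRECT and NOT using a raw device)
    """
    edges = {}
    for machine, peers in mirrors.items():
        # A peer that writes unbuffered needs no buffers, so the chain stops.
        edges[machine] = {p for p in peers if p in buffered}
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle; a cycle means deadlock is possible."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {m: WHITE for m in edges}

    def visit(m):
        colour[m] = GREY
        for peer in edges.get(m, ()):
            state = colour.get(peer, WHITE)
            if state == GREY:
                return True          # back edge: we are waiting on ourselves
            if state == WHITE and visit(peer):
                return True
        colour[m] = BLACK
        return False

    return any(colour[m] == WHITE and visit(m) for m in list(edges))


# Three machines, each mirroring to the other two.
loop = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
print(has_cycle(flush_edges(loop, {"a", "b", "c"})))  # all servers buffered
print(has_cycle(flush_edges(loop, set())))            # all servers unbuffered
```

In this model the fully buffered loop contains a cycle and the fully
unbuffered one does not, which is the whole argument in miniature.
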
Maybe you don't understand "raw device"? That's one of the /dev/rawX.
You bind those devices to existing block devices using the raw device utilities,
then access the raw devices instead of the block device.
I would try using direct=1 too on the clientside. But then your applications have
to be prepared to open the nbd devices as O_DIRECT. They will have to do
aligned reads and writes to it.
It is to be hoped that alignment will go away in 2.6.
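
For what "aligned reads and writes" means to an application, here is a
minimal sketch in Python (Linux only, since it relies on `os.O_DIRECT`).
The 4096-byte block size is an assumption; the real requirement depends on
the kernel and the device. The helper names are hypothetical, and `path`
stands for whatever device node or file you are opening.

```python
import mmap
import os

BLOCK = 4096  # ASSUMED alignment; check what your kernel and device require


def round_up(n, block=BLOCK):
    """Round n up to the next multiple of the block size."""
    return (n + block - 1) // block * block


def aligned_buffer(payload, block=BLOCK):
    """Copy payload into a page-aligned buffer padded to a whole block.

    An anonymous mmap is page-aligned and zero-filled, which satisfies the
    memory-alignment side of O_DIRECT as long as block divides the page size.
    """
    size = round_up(max(len(payload), 1), block)
    buf = mmap.mmap(-1, size)
    buf[:len(payload)] = payload
    return buf


def write_direct(path, payload, offset=0, block=BLOCK):
    """Write payload at offset, bypassing the buffer cache.

    Offset, length and memory address must all be aligned, so the offset is
    checked and the payload is padded with zeros up to a whole block.
    """
    if offset % block:
        raise ValueError("offset must be a multiple of the block size")
    buf = aligned_buffer(payload, block)
    fd = os.open(path, os.O_WRONLY | os.O_DIRECT)
    try:
        return os.pwrite(fd, buf, offset)
    finally:
        os.close(fd)
        buf.close()
```

Note that the zero padding is written to the device along with the payload,
so a real application has to track record lengths itself or work in whole
blocks throughout. Some filesystems refuse O_DIRECT opens altogether.
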
> > Nope - you need to get out the measuring instruments and scientifically
> > measure what is happening and where. I suspect that removing VMS as
> > directed and upping merge_requests and doing an async protocol is as
> > much as one can hope to do blind. Any more requires real looking!
> Which is far above my head.
I can't figure out what is happening without looking, but I would guess it
is VMS deadlock. The setup - three servers each mirroring to the two others
and all writing at once - sounds designed for it.
You could avoid it btw if you arranged that only one server writes at a time.
I can provide locking for that .. but I am still looking at various locking
problems in different libc and kernel combos. Today is kinda hectic.
> I'll have to go for a 1 server, 2 standbys solution...
It is normal to have two servers each writing a mirror to the other server.
This does not hurt when both services are under light write pressure, but under
heavy pressure one must use O_DIRECT or raw resources in order to avoid
VMS cross deadlock. If VMS is not acting, it cannot deadlock.
Peter