[ENBD] 2.4.32 more weirdness
Peter T. Breuer
ptb at it.uc3m.es
Thu Mar 11 10:44:46 MST 2004
"Also sprach Anders Blomdell:"
>
> >> echo merge_requests=64 > /proc/nbdinfo
> >
> > OK.
> >
> >> echo sync=10 > /proc/nbdinfo
> >
> > I don't think that does anything nowadays (I may be mistaken).
> I got the impression that this made data leave VM and hit the intended
> recipient,
It appears to be completely vestigial in 2.4.32, as far as I can see.
It is there as a module parameter but there is no code other than
display code that looks at it.
> but they were not very far apart in time (a few minutes), so I could be
> mistaken.
>
> >> which increased speed 10-100 times for a short while (10 minutes), but the
> >
> > That argues that it did what it was supposed to. But I am afraid I
> > cannot say more at the moment without details of the setup, and output from
> > /proc/nbdinfo and the logs each side.
> Which is hard after a deadlock...
You cannot deadlock the rest of the machine through enbd. Honest - if you have problems
like that then it is a VMS deadlock or some such, i.e. you are out of memory. The
enbd module never takes any spinlock for anything more than a straight stretch of code.
So once it has a spinlock it must release it. Even if you run a preemptive
kernel, preemption is disabled while a spinlock is held, so you are still doomed to
release every lock you take.
I would leave the machine showing "watch cat /proc/nbdinfo" in a serial console, and
logging it.
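
A minimal sketch of that logging, in case it helps. The function name
snapshot_loop and its arguments are made up for illustration; only
/proc/nbdinfo comes from the driver. It prints a timestamped copy of the
file at each interval and appends the same text to a log, so the last
snapshot before a hang is still on whatever is capturing the serial line:

```shell
# snapshot_loop FILE LOG [INTERVAL] [COUNT]   (hypothetical helper)
# Print timestamped copies of FILE and append them to LOG.
# INTERVAL defaults to 1 second; COUNT=0 (the default) loops until interrupted.
snapshot_loop() {
    src=$1
    log=$2
    interval=${3:-1}
    count=${4:-0}
    i=0
    while :; do
        {
            # Header line so successive snapshots can be told apart.
            printf '=== %s ===\n' "$(date '+%Y-%m-%d %H:%M:%S')"
            cat "$src" 2>&1
        } | tee -a "$log"
        i=$((i + 1))
        if [ "$count" -gt 0 ] && [ "$i" -ge "$count" ]; then
            break
        fi
        sleep "$interval"
    done
}
```

You would run it as, say, "snapshot_loop /proc/nbdinfo /var/log/nbdinfo.log 1"
(the log path is only an example).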
> >> machine with HT still enabled
> >> (I turned it off in BIOS on the other two) behaved very sluggish. And then
> >> it deadlocked,
> >> (buffers again?) bringing one of the other machines with it.
> >
> > Is this the setup with three servers in a loop? And what is the kernel?
> Yes, 2.4.25
I really get the impression that perhaps 2.4.25 has borrowed the 2.6 scheduler. Can
you check the release notes on that, or get any other clues?
>
> >> The sluggishness of the HT machine gives a hint that there is a resource
> >> contention problem
> >
> > No - it simply says that nobody has looked at the logs :).
> >
> > If the HT machine is slow through being HT, then there is competition
> > for computing resources. You want to look at "top".
> That is true, but if there is a resource problem, processes tend to step on
> each other's
> toes. Come to think of it, it might be the "__set_current_state()" calls
There is only one use of that, just before the schedule_timeout call in
wait_for_completion. And the latter only is there to support remote
ioctls, which you are not using, so there is no call to it.
> (why not use
> "set_current_state()"?).
There is no good reason. I'll switch to it.
> But on the other hand schedule() ought to imply a
> barrier().
> Still curious of the choice of using "__set_current_state()" (and friends)
> ?
It doesn't matter - one can set the current task state in many ways. But if you
can find out what some process or thread is actually doing, then we would have
some chance of progressing :). The usual technique is to uncomment
printk "I am in " __FUNCTION__ calls at the front of every function, then
sit back with a tail -f on the kernel log.
You can also turn on kernel profiling.
>
> >> somewhere (race conditions are generally worse on SMP machines).
> >
> > Not necessarily - I run with preempt on my portable and contention is
> > then worse with one processor than two, since there are fewer computing
> > resources.
> Contention is worse, but on SMP's global variables, etc are not always
> visible to the
> other processor (don't know about x86 HT, but have been badly bitten by
> Sparcs).
HT is just the same as SMP. And in the kernel, yes, globals will be visible
to the other processor.
> > If you really do have three machines in a loop, as I think you do, then
> > there is nothing to try. You can ONLY run async (client -a) and without
> > cache (server -n). Otherwise you will almost certainly deadlock through
> > VMS.
> > You can turn off VMS on the clientside too with direct=1 aimed at the
> > device through /proc/sys/dev/enbd or /proc/nbdinfo.
> >
> >
> > Otherwise you might try running "while sleep 1; do sync; done &"
> > on all machines, in order to keep buffers empty! But you really want to
> > sync only the disk resources.
> >
> > Nope - you need to get out the measuring instruments and scientifically
> > measure what is happening and where. I suspect that removing VMS as
> > directed and upping merge_requests and doing an async protocol is as
> > much as one can hope to do blind. Any more requires real looking!
> direct=1 gave this result:
>
> mke2fs 1.35-WIP (07-Dec-2003)
> Warning: could not erase sector 2: Attempt to write block from filesystem
mke2fs is not able to do aligned writes. Format it beforehand. I suspected that
that would be the case. You might also bug the mke2fs author about that.
But it's good to know that direct=1 works! If you are using applications
that do not do aligned writes, then using a raw device instead (/dev/rawX)
would be better (for mental health) than forcing O_DIRECT.
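
To make "aligned" concrete: direct I/O generally wants both the byte offset
and the length of each write to be whole multiples of the block size. A rough
sketch of the check follows. is_aligned is a hypothetical helper, and the
4096-byte block size used in the example afterwards is an assumption, not
something read from your device.

```shell
# is_aligned OFFSET LENGTH BLOCKSIZE   (hypothetical helper)
# Exit status 0 when both OFFSET and LENGTH (in bytes) are whole
# multiples of BLOCKSIZE, non-zero otherwise.
is_aligned() {
    offset=$1
    length=$2
    bs=$3
    [ $((offset % bs)) -eq 0 ] && [ $((length % bs)) -eq 0 ]
}
```

For instance "is_aligned 8192 4096 4096" succeeds, while
"is_aligned 1024 512 4096" (one 512-byte sector at byte offset 1024) fails,
which is the sort of write I would expect to be refused under direct=1.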
Peter