[ENBD] Re: [DRBD-dev] A comparison of ENBD and DRBD, updated
Peter T. Breuer
ptb@it.uc3m.es
Sat, 21 Oct 2000 12:25:17 +0200 (MET DST)
"A month of sundays ago David Gould wrote:"
> On Sat, Oct 21, 2000 at 10:47:39AM +0900, Richard Sharpe wrote:
> > My setup is a Dual-celeron 533 as the source and an AMD K6-2/400 as the
> > sink, with a 100Mb/s ethernet between the machines (crossover cable).
>
> I would avoid using drbd with SMP in production, at least on the secondary,
> as I have experienced some hangs on the secondary. There are some new patches
> that may fix this, but until they are tested, your customer should use
> the nonSMP kernel. I would hope that we have better news in a few weeks.
We may also both be getting hit by a TCP bug that is being tracked down on
the rsync lists (kernel 2.2.17). But I would suspect that use of a 2.4.0test
kernel is the problem?
> > Proto Recv-Q Send-Q Local Address           Foreign Address         State
> > tcp    35230  40686 expbuild.research.:8664 dynamic.ih.lucent:36352 ESTABLISHED
> >
> > and the Solaris client says
> >
> > Local Address               Remote Address                       Swind Send-Q Rwind Recv-Q State
> > dynamic.ih.lucent.com.36352 expbuild.research.bell-labs.com.8664     0   1459  8760      0 ESTABLISHED
>
> ok, if the above condition is not temporary (i.e. not just packet loss)
> and the cable between the two boxes is OK then this is _definitely_ an
> OS bug. The job of TCP is to get data from the sendq on one side to
> the recvq on the other. The only reason that data would not be sent on
> an ESTABLISHED connection is if the window was zero, and you don't get
> that with a zero sized recvq.
>
> It is quite impossible for rsync to cause the above condition. The
> rsync server has written some data to a socket in the expectation that
> it will get to the other end (that's what reliable transports are all
> about), but the data hasn't got there.
>
> The next thing you have to do is run a sniffer to determine whether it
> is a Solaris or Linux bug. My bet is this will be the same Linux bug
> we have observed here. You'll see the Linux box sending data outside
> the window that the Solaris box is offering, the Solaris box will
> reject that data by sending an ack with the current window and the
> Linux box will ignore the hint.
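
For anyone going through a sniffer trace by hand, the test described above
(is the sender putting data outside the window the receiver offered?) can be
sketched roughly as follows. This is only an illustration: the function and
parameter names are made up, the numbers in the usage lines are not taken
from the trace above, and it assumes you have already pulled the ack number
and window out of the receiver's last ACK and the sequence number and payload
length out of the sender's segment.

```python
MOD = 1 << 32  # TCP sequence numbers are 32 bits wide and wrap around


def within_window(ack: int, window: int, seq: int, length: int) -> bool:
    """Return True if the data segment [seq, seq+length) lies entirely
    inside the window [ack, ack+window) last advertised by the receiver.

    Hypothetical helper, not part of any tool. Retransmissions of data
    that has already been acked (seq before ack) are reported as outside
    the window, and a legitimate one-byte zero-window probe will also be
    flagged, so treat a False as "look at this packet", not as proof.
    """
    if length == 0:
        return True  # a bare ACK carries no data to check
    # Distance from the left edge of the window, allowing for wraparound.
    offset = (seq - ack) % MOD
    return offset + length <= window


# Receiver acked up to 1000 and offers 8760 bytes.
print(within_window(ack=1000, window=8760, seq=1000, length=1460))  # inside
print(within_window(ack=1000, window=8760, seq=9000, length=1460))  # overruns
```

A run of segments for which this returns False, each answered by an ACK that
repeats the same ack number and window, would match the behaviour described
in the quoted message.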
Peter