[ENBD] Re: [DRBD-dev] A comparison of ENBD and DRBD, updated

Peter T. Breuer ptb@it.uc3m.es
Sat, 21 Oct 2000 12:25:17 +0200 (MET DST)


"A month of sundays ago David Gould wrote:"
> On Sat, Oct 21, 2000 at 10:47:39AM +0900, Richard Sharpe wrote:
> > My setup is a Dual-celeron 533 as the source and an AMD K6-2/400 as the
> > sink, with a 100Mb/s ethernet between the machines (crossover cable).
> 
> I would avoid using drbd with SMP in production, at least on the secondary,
> as I have experienced some hangs on the secondary. There are some new patches
> that may fix this, but until they are tested, your customer should use
> the non-SMP kernel. I would hope that we have better news in a few weeks.

We may also both be hitting a TCP bug that is being uncovered on the
rsync lists (kernel 2.2.17). But could the use of a 2.4.0test kernel
be the problem?

 > >     Proto Recv-Q Send-Q Local Address           Foreign Address State
 > >     tcp    35230  40686 expbuild.research.:8664 dynamic.ih.lucent:36352 ESTABLISHED
 > >
 > > and the solaris client says
 > >
 > >    Local Address               Remote Address Swind Send-Q Rwind Recv-Q   State
 > >    dynamic.ih.lucent.com.36352 expbuild.research.bell-labs.com.8664     0 1459  8760      0 ESTABLISHED
 >
 > ok, if the above condition is not temporary (i.e. not just packet loss)
 > and the cable between the two boxes is OK then this is _definitely_ an
 > OS bug. The job of TCP is to get data from the sendq on one side to
 > the recvq on the other. The only reason that data would not be sent on
 > an ESTABLISHED connection is if the window was zero, and you don't get
 > that with a zero-sized recvq.
 >
 > It is quite impossible for rsync to cause the above condition. The
 > rsync server has written some data to a socket in the expectation that
 > it will get to the other end (that's what reliable transports are all
 > about), but the data hasn't got there.
 >
 > The next thing you have to do is run a sniffer to determine whether it
 > is a Solaris or Linux bug. My bet is this will be the same Linux bug
 > we have observed here. You'll see the Linux box sending data outside
 > the window that the Solaris box is offering, the Solaris box will
 > reject that data by sending an ack with the current window and the
 > Linux box will ignore the hint.
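The window reasoning quoted above can be made concrete with a small model of the RFC 793 rules: the sender may only transmit new data inside its usable window (SND.UNA + SND.WND - SND.NXT), and the receiver accepts a segment only if it overlaps RCV.NXT .. RCV.NXT + RCV.WND. This is a minimal sketch for illustration, not Linux or Solaris code. The helper names (`seq_le`, `in_window`, `usable_window`, `segment_acceptable`) are hypothetical, and the window size 8760 is simply borrowed from the Rwind column in the quoted Solaris output.

```python
# Hypothetical model of TCP window checks, following RFC 793 section 3.3/3.9.
# Not taken from any kernel; helper names are made up for illustration.

SEQ_MOD = 2 ** 32          # sequence numbers wrap modulo 2^32
HALF = SEQ_MOD // 2


def seq_le(a: int, b: int) -> bool:
    """a <= b in 32-bit sequence space (serial-number arithmetic)."""
    return (b - a) % SEQ_MOD < HALF


def seq_lt(a: int, b: int) -> bool:
    """a < b in 32-bit sequence space."""
    return a != b and seq_le(a, b)


def in_window(x: int, start: int, size: int) -> bool:
    """start <= x < start + size, wrap-aware (caller ensures size > 0)."""
    return seq_le(start, x) and seq_lt(x, (start + size) % SEQ_MOD)


def usable_window(snd_una: int, snd_wnd: int, snd_nxt: int) -> int:
    """Bytes of new data the sender may still transmit.

    Zero if the window is closed, or if SND.NXT already lies past the
    right edge the receiver is offering (sending there is out of window).
    """
    right_edge = (snd_una + snd_wnd) % SEQ_MOD
    if seq_le(snd_nxt, right_edge):
        return (right_edge - snd_nxt) % SEQ_MOD
    return 0


def segment_acceptable(seg_seq: int, seg_len: int,
                       rcv_nxt: int, rcv_wnd: int) -> bool:
    """RFC 793 acceptability test applied by the receiver.

    An unacceptable segment is answered with an ACK carrying the
    receiver's current RCV.NXT and window.
    """
    if rcv_wnd == 0:
        # Closed window: only an empty segment exactly at RCV.NXT is OK.
        return seg_len == 0 and seg_seq == rcv_nxt
    if seg_len == 0:
        return in_window(seg_seq, rcv_nxt, rcv_wnd)
    last = (seg_seq + seg_len - 1) % SEQ_MOD
    # Acceptable if either the first or the last byte falls in the window.
    return in_window(seg_seq, rcv_nxt, rcv_wnd) or in_window(last, rcv_nxt, rcv_wnd)


if __name__ == "__main__":
    # Illustrative numbers only: an open 8760-byte window at RCV.NXT = 1000.
    print(segment_acceptable(1000, 1460, 1000, 8760))         # inside the window
    print(segment_acceptable(1000 + 8760, 1460, 1000, 8760))  # starts past the right edge
```

With a non-zero window and an empty receive queue, in-window data is always acceptable, which is why the quoted analysis points at the sender: a segment that starts beyond the offered right edge fails this test, draws an ACK restating the window, and will keep failing if the sender ignores that ACK.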



Peter