[ENBD] Re: NBD Observations
Peter T. Breuer
ptb@it.uc3m.es
Sat, 9 Sep 2000 18:15:47 +0200 (MET DST)
"A month of sundays ago Paul Flinders wrote:"
> "Peter T. Breuer" wrote:
> > > command lines were
> > > server: nbd-server 4017 /dev/hda3 -b 2048 -i nbd
> > > client: nbd-client -i nbd rat-1 4017 rat-1 /dev/nda
> > And kernel state? Was the nbd.o module freshly inserted?
> Hmmm, no - I think that it probably wasn't
> > > I guess that the signature might be something to do with it - have I
> > > got the command lines correct?
> > I suspect that the kernel had a module in with a signature already on
> > it, possibly different. Sending the same signature should be OK.
> I suspect that your guess is correct. This does lead to another "it's all
> too easy to get it wrong" scenario though.
In this case the result is intended. You should not be able to connect
up a different resource to the same client device without making a
considerable effort! It's the same as hotswapping one disk for another
(different geometry maybe, different contents) without telling anybody.
I'll have to do the same thing in reverse pretty soon, too. The server
should not accept reconnects that don't present the device signature.
Right now it probably accepts any client that turns up on the right port.
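A minimal sketch of what that server-side check could look like, assuming the
server keeps one signature per exported device, binds it on first contact, and
refuses any later reconnect that presents a different (or no) signature.
SIG_LEN, struct device_binding and accept_client() are hypothetical names for
illustration only; they are not taken from the ENBD source.

```c
#include <assert.h>
#include <string.h>

#define SIG_LEN 32  /* hypothetical signature size, not the real ENBD value */

/* Hypothetical per-device record kept by the server. */
struct device_binding {
    unsigned char sig[SIG_LEN];  /* signature the device was first bound to */
    int bound;                   /* 0 until the first client has attached */
};

/* Returns 1 if the connecting client may attach, 0 if it must be refused.
 * The first client binds the device; every later (re)connect must present
 * exactly the same signature. */
int accept_client(struct device_binding *dev, const unsigned char *sig,
                  size_t len)
{
    assert(dev != NULL);
    if (sig == NULL || len != SIG_LEN)
        return 0;                    /* no signature, or a malformed one */
    if (!dev->bound) {
        memcpy(dev->sig, sig, SIG_LEN);
        dev->bound = 1;
        return 1;
    }
    return memcmp(dev->sig, sig, SIG_LEN) == 0;
}
```

This mirrors the client-side rule: swapping a different resource in behind an
already-bound device is refused unless someone deliberately resets the binding.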
I've been following up on the client-first, server-second launch
ordering. Though my pair connected up fine, some funny business was
clearly going on: all requests errored out as out of range, and the sizes
shown in the /proc listing were the default maxposint values.
This is probably where your suspicion about the sig comes from: watching
the negotiation, it was clear that the client got a SIZE where it was
expecting a SIGN. For backwards compatibility the client accepts pushed
data that has no preceding command, so it accepted the SIZE. Making the
code stricter simply made the introductory negotiation error out every time.
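To make that failure mode concrete, here is a hypothetical sketch of the
acceptance test a client might apply to each message during the introduction.
The message kinds and intro_accepts() are illustrative names only and do not
reproduce the ENBD wire format. In lenient mode untagged data is taken on trust
as whatever step the client is waiting for, which is how untagged SIZE data can
be swallowed in place of a SIGN; strict mode refuses it.

```c
#include <assert.h>

/* Hypothetical message kinds seen during the introduction. */
enum intro_kind {
    INTRO_UNTAGGED,  /* data pushed with no preceding command */
    INTRO_SIGN,      /* signature step */
    INTRO_SIZE       /* device size step */
};

/* Returns 1 if a message of kind `got` may be taken as the step the client
 * is waiting for (`expected`), 0 otherwise. */
int intro_accepts(enum intro_kind expected, enum intro_kind got, int strict)
{
    assert(expected != INTRO_UNTAGGED);
    if (got == expected)
        return 1;
    if (got == INTRO_UNTAGGED)
        return !strict;  /* backwards-compatible path: trust untagged data */
    return 0;            /* an explicitly different command is an error */
}
```

Note that strictness only turns the silent mix-up into a visible error; as the
next paragraph explains, the real fault was elsewhere.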
It wasn't the server getting it wrong. I'd forgotten to make the client
retry the introduction when it failed. The introduction to the server
failed, and the client went on to launch slave clients as though nothing
had happened, because I'd also forgotten to check the error return.
So, in your 2.4.X nbd-client.c main() code, replace the plain call to
introduction(self) with
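    while (1) {
        static int count;   /* failed attempts so far; zero-initialised */

        err = introduction(self);
        if (err >= 0)
            break;          /* negotiation with the server succeeded */

        /* Each failure is followed by a 10 s pause, so 10 * count is the
         * time waited so far. Give up once it exceeds twice the
         * negotiation timeout (in seconds). */
        if (10 * count++ > 2 * negotiate_timeout) {
            PERR("client (%d) manager gave up on intro after %d tries and %ds\n",
                 self->i, count, count * 10);
            exit(1);
        }
        microsleep(10000000);   /* 10 s before the next attempt */
    }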
and you will have a lot more success starting out of order.
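The give-up rule in that loop can be pulled out and exercised on its own. This
is a sketch, assuming the attempt and the pause are passed in as function
pointers standing in for introduction(self) and microsleep(); retry_intro(),
fail_first_n() and fake_sleep() are hypothetical names, and the fake sleep only
adds up seconds so the logic can be checked without actually waiting.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins: an attempt_fn plays the role of introduction(self)
 * and returns >= 0 on success; a sleep_fn plays the role of microsleep(). */
typedef int (*attempt_fn)(void *ctx);
typedef void (*sleep_fn)(unsigned seconds);

/* Call attempt() until it succeeds, sleeping interval_s seconds between tries.
 * Give up once the time already spent waiting exceeds 2 * negotiate_timeout,
 * the same rule as "10 * count++ > 2 * negotiate_timeout".
 * Returns the number of attempts on success, -1 on giving up. */
int retry_intro(attempt_fn attempt, void *ctx, sleep_fn snooze,
                unsigned interval_s, unsigned negotiate_timeout)
{
    unsigned waited = 0;  /* seconds spent in snooze() so far */
    int tries = 0;

    assert(attempt != NULL && snooze != NULL);
    for (;;) {
        tries++;
        if (attempt(ctx) >= 0)
            return tries;
        if (waited > 2 * negotiate_timeout) {
            fprintf(stderr, "gave up on intro after %d tries and %us\n",
                    tries, waited);
            return -1;
        }
        snooze(interval_s);
        waited += interval_s;
    }
}

/* Example stub: fail while *ctx > 0 (counting it down), then succeed. */
static int fail_first_n(void *ctx)
{
    int *left = ctx;
    if (*left > 0) {
        (*left)--;
        return -1;
    }
    return 0;
}

static unsigned slept_total;  /* seconds "slept" by fake_sleep() */

/* Example stub: record the requested pause instead of sleeping. */
static void fake_sleep(unsigned seconds)
{
    slept_total += seconds;
}
```

With an interval of 10 and negotiate_timeout of 20, a client that never gets
through makes six attempts and waits 50 s in total before giving up, which is
the same number of attempts the loop above makes.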
Sorry, it was a bug, but it is the development code. I'd simply
forgotten to implement that functionality.
I've changed it in my 2.4.12 code. In case I haven't told you, you can
see the whole codebase at http://www.it.uc3m.es/~ptb/nbd/src/ without
waiting for a tarfile of anything in particular.
Peter