[ENBD] nbd with an SMP kernel?
Peter T. Breuer
ptb@it.uc3m.es
Tue, 17 Oct 2000 21:22:44 +0200 (MET DST)
"A month of sundays ago leonid@hmdc-admin.fas.harvard.edu wrote:"
> > so there is no hope of a NEWLY started server sharing in the existing
> > session secret.
> >
> > They make one up for the session if you don't supply it explicitly ..
> > you wouldn't like any old thing connecting to your resources, would
> > you?
>
> I actually used to run the server w/ -i "NBDabcdefNBD" (I stole this from
The error you showed was "no server". If there is a problem,
it isn't the client, because it noticed the error and acted on it.
I strongly suspect that it is fine later too, but I need to see its
debugging output to get an idea.
> your Makefile), but I was getting the same result. I then tried to run the
> server w/out the signature specified, to see if that would change
> things. Bad idea, apparently.
> > But the error you showed was "no server".
>
> Yes, and that's what's puzzling me. The nbd-server IS actually running --
> I restarted it after I killed the old one. It just won't reconnect.
Then the error is not the same error you are seeing reported. Unless you
compile with DEBUG=1, messages are not generated for internal changes
of state (such as going from waiting for a connect to disconnecting
and trying again) except in exceptional circumstances. But in any case
the error reported is definitive. At the moment that message was
generated the server socket was not in the listen state (or simply not
there). What happened afterwards will be told by the log, when you
compile with DEBUG=1.
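
For what it is worth, a rough way to check from the outside whether anything is in the listen state on the server port is simply to attempt a TCP connect. This is only a sketch: the function name is made up for illustration, it is not part of the nbd tools, and the probe opens and closes a real connection, which the server may well log as a client that went away.

```python
import socket

def server_is_listening(host, port, timeout=2.0):
    """Return True if a TCP connect to (host, port) succeeds.

    Hypothetical helper, not part of ENBD. A refused or timed-out
    connect is taken to mean that no socket is listening there.
    """
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError)
        # when nothing accepts the connection within the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A socket that is bound but has not yet called listen() is refused just like a port with nothing on it, which matches the "not in the listen state (or simply not there)" case.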
The server and client _can_ get stuck, at least for a long while. In
particular there is a low-level timeout of about 360s in each
send/recv from the net. And sometimes the same error state will be
revisited 3 times before being treated as catastrophic, leading
to a restart, which in this case you want. If you _know_ that you want
a restart, just short-circuit the procedure and kill the server and
client slaves. You can do this automatically with heartbeat. Then
they'll restart. Left to themselves they'll have to guess at what is
going on, and do it fairly conservatively.
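
The "revisit the same error state 3 times, then give up and restart" idea can be sketched roughly as below. This is an illustration only, not the daemon's actual logic: the class name, the use of strings for states, and the assumption that the visits must be consecutive are all mine.

```python
class ErrorTracker:
    """Count consecutive visits to the same error state (hypothetical)."""

    def __init__(self, limit=3):
        self.limit = limit   # visits tolerated before it counts as catastrophic
        self.last = None     # most recent error state seen
        self.count = 0       # consecutive visits to that state

    def record(self, state):
        """Note an error state; return True once it has been seen
        `limit` times in a row, i.e. when a restart is called for."""
        if state == self.last:
            self.count += 1
        else:
            # A different error starts the count again.
            self.last = state
            self.count = 1
        return self.count >= self.limit

    def reset(self):
        """Forget the history, e.g. after a successful transfer."""
        self.last = None
        self.count = 0
```

With a timeout of about 360s on each attempt, three visits to the same state can add up to a long wait, which is why killing the slaves by hand is quicker when you already know a restart is wanted.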
That said, I'm not happy with the signalling behaviour under Linux and
Unix in general. The only way to get out of jail is to read the
signals, but the signal handlers can't do anything. Even a malloc in a
signalled interrupt is dangerous. In particular, send/recv can hang
halfway through a transmission if you time it just right. I can't do
anything about it. They don't listen to any signal except -9. And
I can't impose more than one timeout, because it's horribly complicated
(stacked clocks). So the timeout has to be generous.
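
As an aside, one way to bound a single recv without leaning on signals at all is a per-socket timeout. The sketch below shows the idea in Python, where settimeout() does the waiting internally; it is not what the nbd code does, the function name is invented, and the 360s default is only there to mirror the figure mentioned earlier.

```python
import socket

def recv_with_timeout(sock, nbytes, timeout_s=360.0):
    """Receive up to nbytes from sock, giving up after timeout_s seconds.

    Hypothetical helper. Returns the bytes received (b"" if the peer
    closed the connection) or None if nothing arrived in time.
    """
    sock.settimeout(timeout_s)   # applies to this socket only
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        # No data within the limit; the caller decides whether to
        # retry or to treat it as an error state.
        return None
```

Because the timeout belongs to the socket rather than to a process-wide clock, each connection can have its own limit. Whether that would fit the existing code is another matter.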
Peter