[ENBD] 2.4.32

Peter T. Breuer ptb at it.uc3m.es
Fri Mar 5 13:07:03 MST 2004


"Thus spoke Anders Blomdell:"
> >> I'll give explicit blocksize, remove '-e' and give '-p 60'
> >
> > Try that. I will also stop -e having an effect unless ALL daemons are
> > dead.
> 
> Now two of the servers are mostly hung (can't logon, but answers to ping

They aren't hung.  Whatever happens inside enbd does not affect the rest
of the kernel.  When requests are blocked, I/O _to that device_
stops, but that is all.  If you find that you can't log on, then your
home directory is on the remote devices!  And you are seeing precisely
what you ought to see.

You can clear the blocked requests by logging on as root (whose home
will be local) and echoing 0 to /proc/nbdinfo. Or you can set up a
special login whose shell does exactly that!

But they will time out after 60s if you set -p 60.
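If you want to script that recovery rather than typing it, here is a minimal C sketch of what `echo 0 > /proc/nbdinfo` does: open the proc entry and write the string "0". The function name `enbd_clear_blocked` is hypothetical. The path is a parameter so the helper can be tried on an ordinary file. That /proc/nbdinfo accepts exactly this write is taken from the description above, not checked against the driver source.

```c
#include <stdio.h>

/* Hypothetical helper: the C equivalent of `echo 0 > /proc/nbdinfo`.
 * Per the description above, writing 0 to the enbd proc entry clears
 * requests blocked on the device. Run it as root (whose home is local).
 * Returns 0 on success, -1 if the entry could not be opened or written. */
int enbd_clear_blocked(const char *proc_path)
{
    FILE *f = fopen(proc_path, "w");
    if (f == NULL)
        return -1;                 /* e.g. not root, or enbd not loaded */

    /* Write the same bytes that `echo 0` would. */
    int rc = (fputs("0\n", f) == EOF) ? -1 : 0;

    /* stdio buffers the data, so the real write happens at fclose();
     * errors from the proc handler may only show up here. */
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}
```

Called as `enbd_clear_blocked("/proc/nbdinfo")` from a root shell or a dedicated login, it returns -1 if the entry cannot be opened, for instance when the enbd module is not loaded.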

> and echoes linefeeds at console). This is what the remaining server says:
> 
> enbd-client  2071: client (-1) manager launched daemon 2 (2171) for server-01:30051
> enbd-client  2171: client (2) opens device /dev/ndc3

OK. Is this ndc? It looks like it.

> enbd-client  2071: client (-1) childminder launched pid 2171 (2)
> enbd-client  2170: client (1) opened socket 5 to server-01:30051
> enbd-client  2171: client (2) opened device /dev/ndc3 ok
> enbd-client  2071: client (-1) manager launched daemon 3 (2172) for server-01:30051

All fine, I suppose.

> enbd-client  2172: client (3) opens device /dev/ndc4
> enbd-client  2071: client (-1) childminder launched pid 2172 (3)
> enbd-client  2172: client (3) opened device /dev/ndc4 ok

OK.

> enbd-client  2167: client (3) opened socket 5 to server-02:30021
> enbd-client  2171: client (2) opened socket 5 to server-01:30051
> enbd-client  2172: client (3) opened socket 5 to server-01:30051
> enbd-client  2165: client (2) opened socket 5 to server-02:30021

> enbd-client  2067: sighandler relaunches child from manager
> enbd-client  2067: client (-1) reaped dead child 2157

Well, here we have a different client altogether (pid 2067)! It is not
one of the ones mentioned above.


> enbd-client  2067: client (-1) manager launched daemon 1 (2174) for server-01:30021
> enbd-client  2067: client (-1) childminder launched pid 2174 (1)
> enbd-client  2174: client (1) opens device /dev/nda2

Aha. It's for nda, not ndc.


> enbd-client  2174: client (1) opened device /dev/nda2 ok
> enbd-client  2174: client (1) opened socket 5 to server-01:30021
> enbd-client  2067: sighandler relaunches child from manager
> enbd-client  2067: client (-1) reaped dead child 2158
> enbd-client  2067: client (-1) manager launched daemon 2 (2175) for server-01:30021
> enbd-client  2067: client (-1) childminder launched pid 2175 (2)
> enbd-client  2175: client (2) opens device /dev/nda3
> enbd-client  2175: client (2) opened device /dev/nda3 ok
> enbd-client  2175: client (2) opened socket 5 to server-01:30021
> enbd-client  2073: <# 298> managersighandler received signal 17

All looks OK, but I really can't tell which is which! Signal 17 is
SIGCHLD (on x86 Linux), so a child died.
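To make the SIGCHLD point concrete, here is a minimal POSIX sketch of the reaping pattern the log suggests. It is not the enbd-client source. The handler only records that the signal arrived, and the manager then reaps children with `waitpid(-1, ..., WNOHANG)` in a loop. The loop matters because pending SIGCHLDs are merged, so one signal can stand for several dead children. In the log that follows, 2073 receives a single signal 17 and then reaps both 2160 and 2159. The names `install_sigchld_handler`, `reap_dead_children` and `got_sigchld` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Set by the handler; the manager's main loop does the actual reaping. */
static volatile sig_atomic_t got_sigchld = 0;

/* SIGCHLD (signal 17 on x86 Linux): some child daemon has exited. */
static void on_sigchld(int signo)
{
    (void)signo;
    got_sigchld = 1;   /* keep the handler async-signal-safe */
}

/* Install the handler. Returns 0 on success, -1 on failure. */
int install_sigchld_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Reap every child that has already exited, without blocking.
 * Stores up to `max` reaped PIDs in `reaped` and returns the count.
 * A manager would then relaunch one daemon per reaped PID. */
int reap_dead_children(pid_t *reaped, int max)
{
    int n = 0;
    int status;
    pid_t pid;

    while (n < max && (pid = waitpid(-1, &status, WNOHANG)) > 0)
        reaped[n++] = pid;
    return n;   /* 0 if no child has exited (or there are no children) */
}
```

Doing the waitpid() work outside the handler keeps the handler trivial. The WNOHANG loop guarantees that no dead child is left unreaped, however many SIGCHLDs were coalesced into one delivery.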


> enbd-client  2073: sighandler relaunches child from manager
> enbd-client  2073: client (-1) reaped dead child 2160

And surely we have not seen that one before? How many of these things
are there?


> enbd-client  2073: client (-1) reaped dead child 2159
> enbd-client  2073: client (-1) manager launched daemon 1 (2176) for server-02:30051
> enbd-client  2073: client (-1) childminder launched pid 2176 (1)
> enbd-client  2073: client (-1) manager launched daemon 3 (2177) for server-02:30051
> enbd-client  2073: client (-1) childminder launched pid 2177 (3)
> enbd-client  2176: client (1) opens device /dev/ndd2
> enbd-client  2177: client (3) opens device /dev/ndd4
> enbd-client  2176: client (1) opened device /dev/ndd2 ok
> enbd-client  2177: client (3) opened device /dev/ndd4 ok
> enbd-client  2176: client (1) opened socket 5 to server-02:30051
> enbd-client  2177: client (3) opened socket 5 to server-02:30051

This all looks fine.

> Would tcpdumps be of any help?

No. The above all looks fine, apart from the fact that some daemons
keep dying, which strongly suggests that the other end is not
responding. What exactly did you do? What does the other end say?

I am also puzzled about this whole setup. Would you mind
describing it? I have a feeling that there is a loop
here.



Peter


