[ENBD] New to ENBD, having trouble
Christopher Eveland
enbd@lists.community.tummy.com
Tue, 22 Jan 2002 17:41:23 -0500
Hi all-
Sorry for the long post. I'm trying to get ENBD set up and running on a
couple of machines, and I can't quite figure out what's going wrong. I can
build it and run the basic "make test", but I seem to hang the machine (or
at least hose it) when I try anything more demanding, like mkfs.
For background, I have two dual PIII 800 machines trying to talk to each
other. The only thing I can think of that might be "odd" about them is that
they talk to each other over bonded dual 100Mb Ethernet. Other than that,
it's Debian:
beowulf1:~/nbd-2.4.27> uname -a
Linux beowulf1 2.4.16 #4 SMP Fri Dec 21 16:18:42 EST 2001 i686 unknown
I'm running ENBD 2.4.27 (I also tried 2.4.26), and I set the SMP flag in
the Makefile to match the kernel.
Anyway, after building, I run "make test" and go through the checklist.
The module is loaded, I can see the server and client processes on the
machines, and when I cat /proc/nbdinfo, I get:
beowulf1:~> cat /proc/nbdinfo
Device a-e: Closed
Device f: Open
[f] State: verify, rw, enabled, plug, last error 0, lives 0
[f] Queued: +0R/0W curr (check 0R/0W) +0R/0W max
[f] Buffersize: 262144 (sectors=512, blocks=256)
[f] Blocksize: 1024 (log=10)
[f] Size: 8192KB
[f] Blocks: 8192
[f] Sockets: 4 (+) (+) (+) (*)
[f] Requested: 0 (0) (0) (0) (0) 0R/0W max 0
[f] Despatched: 0 (0) (0) (0) (0) 0R/0W md5 0W (0 eq, 0 ne, 0 dn)
[f] Errored: 0 (0) (0) (0) (0) 0+0
[f] Pending: 0 (0) (0) (0) (0) 0R/0W+0R/0W
[f] B/s now: 0 (0R+0W)
[f] B/s ave: 0 (0R+0W)
[f] B/s max: 0 (0R+0W)
[f] Spectrum:
[f] Kthreads: 0 (0 waiting/0 running/0 max)
[f] Cthreads: 4 (+) (+) (+) (+)
[f] Cpids: 4 (1177) (1178) (1179) (1180)
Device g-p: Closed
beowulf1:~>
This seems to be about right as far as I can tell.
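Reading the thread counts by eye across dumps like this is error-prone. A minimal C sketch that pulls the Cthreads count for one device out of /proc/nbdinfo is shown here; it assumes the line format is exactly what appears in this post ("[f] Cthreads: 4 (+) ..."), which is taken from this output and not from ENBD documentation, and the function names are hypothetical.

```c
#include <stdio.h>

/* Return the number after "Cthreads:" if `line` is the Cthreads line for
   device `dev` (e.g. 'f'), or -1 otherwise. The format is assumed from
   the /proc/nbdinfo dumps in this post. */
static int cthreads_in_line(const char *line, char dev)
{
    char fmt[32];
    int count;

    /* Builds e.g. "[f] Cthreads: %d"; '[' is literal outside a conversion. */
    snprintf(fmt, sizeof fmt, "[%c] Cthreads: %%d", dev);
    if (sscanf(line, fmt, &count) == 1)
        return count;
    return -1;
}

/* Scan an open nbdinfo stream line by line; -1 if no matching line. */
static int cthreads_for_device(FILE *fp, char dev)
{
    char line[256];

    while (fgets(line, sizeof line, fp) != NULL) {
        int n = cthreads_in_line(line, dev);
        if (n >= 0)
            return n;
    }
    return -1;
}
```

Opened with fopen("/proc/nbdinfo", "r"), this would return 4 for the dump above and 0 for a dump showing "[f] Cthreads: 0 (-) (-) (-) (-)".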
I can even do some small things with the device. (I'm using /dev/ndf in
this case because I had set up the others to auto-mount on boot, obviously
getting ahead of myself; anyway, a-e are all turned off.) For example, I can
use dd to copy 512 bytes between /dev/ndf and another device (such as
/dev/hda3) and then compare them with the original 512 bytes: they match.
But if I try to do something "big", like mkfs, it seems to hang.
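The small test that works amounts to writing one 512-byte block and reading it back. A rough C equivalent is sketched below, assuming the device (or any file) is already open for update; the function name is hypothetical. Like the dd test, the read-back may be served from the kernel's buffer cache rather than from the remote server, so a match does not by itself prove the data reached the other machine.

```c
#include <stdio.h>
#include <string.h>

#define BLOCK 512  /* size of the test block, as in the dd experiment */

/* Write n bytes at offset 0 of an open stream, flush, read them back and
   compare. Returns 1 on match, 0 on mismatch, -1 on an I/O error or if
   n exceeds BLOCK. */
static int write_and_verify(FILE *dev, const unsigned char *buf, size_t n)
{
    unsigned char back[BLOCK];

    if (n > sizeof back)
        return -1;
    if (fseek(dev, 0L, SEEK_SET) != 0 || fwrite(buf, 1, n, dev) != n)
        return -1;
    /* Flush and reposition before switching from writing to reading. */
    if (fflush(dev) != 0 || fseek(dev, 0L, SEEK_SET) != 0)
        return -1;
    if (fread(back, 1, n, dev) != n)
        return -1;
    return memcmp(buf, back, n) == 0;
}
```

Against the real device this would be called on a stream from fopen("/dev/ndf", "r+b"); note that it overwrites the first block.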
For instance, after running "make test", I try "mke2fs /dev/ndf", as the
online instructions suggest. As soon as I run mke2fs, I get the following
on the console:
nbd-server 1316: slavesighandler server (0) activates slave sighandler for signal 11
nbd-server 1318: slavesighandler server (2) activates slave sighandler for signal 11
nbd-server 1318: server (2) sighandler terminates slave 1318 safely
nbd-server 1316: server (0) sighandler terminates slave 1316 safely
nbd-server 1315: server (-1) relaunches child after SIGCHLD
nbd-server 1315: server (-1) slave pid 1318 is down, launching new
nbd-server 1324: server (2) set default signal handlers for slave server 1324
nbd-server 1315: server (-1) launched slave pid 1324
nbd-server 1315: server (-1) slave pid 1316 is down, launching new
nbd-server 1325: server (0) set default signal handlers for slave server 1325
nbd-server 1315: server (-1) launched slave pid 1325
At that point, /proc/nbdinfo shows:
beowulf1:~/nbd-2.4.27> cat /proc/nbdinfo
Device a-e: Closed
Device f: Open
[f] State: verify, rw, enabled, plug, last error 0, lives 0
[f] Queued: +0R/0W curr (check 0R/58W) +1R/63W max
[f] Buffersize: 262144 (sectors=512, blocks=256)
[f] Blocksize: 1024 (log=10)
[f] Size: 8192KB
[f] Blocks: 8192
[f] Sockets: 4 (+) (+) (*) (+)
[f] Requested: 65 (3) (1) (2) (1) 1R/64W max 1
[f] Despatched: 7 (3) (1) (2) (1) 1R/6W md5 0W (0 eq, 0 ne, 0 dn)
[f] Errored: 0 (0) (0) (0) (0) 0+0
[f] Pending: 0 (0) (0) (0) (0) 0R/0W+0R/58W
[f] B/s now: 0 (0R+0W)
[f] B/s ave: 4.00K (0R+4.00KW)
[f] B/s max: 39.0K (0R+38.0KW)
[f] Spectrum: 100%1
[f] Kthreads: 0 (0 waiting/0 running/1 max)
[f] Cthreads: 0 (-) (-) (-) (-)
[f] Cpids: 0 (1177) (1178) (1179) (1180)
Device g-p: Closed
beowulf1:~/nbd-2.4.27>
This seems bad: I read the last few lines (Cthreads: 0) as meaning I have
no threads left. Then, after a few seconds, I get this on the console:
nbd-netserver 1178: recv_reply client (1) net_recv_reply reports bad magic 0x0 instead of 0x67446698 with error 0x0 handle 0xbffff1ec flags 134544497 cmd 1 len 0 sector 4294967295
nbd-netserver 1180: recv_reply client (3) net_recv_reply reports bad magic 0x0 instead of 0x67446698 with error 0x0 handle 0xbffff1ec flags 134544497 cmd 1 len 0 sector 4294967295
nbd-netserver 1178: recv_reply client (1) net_recv_reply reports addresses magic 0xbffff31c error 0xbffff320 handle 0xbffff324 flags 0xbffff328
nbd-netserver 1180: recv_reply client (3) net_recv_reply reports addresses magic 0xbffff31c error 0xbffff320 handle 0xbffff324 flags 0xbffff328
nbd-server 1317: newproto Not enough magic in packet. Breaking off.
nbd-server 1319: newproto Not enough magic in packet. Breaking off.
nbd-server 1317: slavesighandler server (1) activates slave sighandler for signal 15
nbd-server 1317: server (1) sighandler terminates slave 1317 safely
nbd-server 1319: slavesighandler server (3) activates slave sighandler for signal 15
nbd-server 1319: server (3) sighandler terminates slave 1319 safely
nbd-server 1315: server (-1) relaunches child after SIGCHLD
nbd-server 1315: server (-1) slave pid 1317 is down, launching new
nbd-server 1326: server (1) set default signal handlers for slave server 1326
nbd-server 1315: server (-1) launched slave pid 1326
nbd-server 1315: server (-1) slave pid 1319 is down, launching new
nbd-server 1327: server (3) set default signal handlers for slave server 1327
nbd-server 1315: server (-1) launched slave pid 1327
nbd-server 1315: server (-1) relaunches child after SIGCHLD
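The "bad magic 0x0 instead of 0x67446698" lines mean the client expected each reply to begin with the 32-bit value 0x67446698 (the same reply magic the standard NBD protocol uses) but read 0, so what arrived was not a valid reply header. The addresses printed for magic, error, handle and flags are 4 bytes apart, which suggests four 32-bit fields. The sketch below assumes that layout and network byte order (neither is confirmed by the log, and the struct is hypothetical, not ENBD's own) to show the kind of check that produces the message; an all-zero header reproduces "bad magic 0x0".

```c
#include <stddef.h>
#include <stdint.h>

#define REPLY_MAGIC 0x67446698u  /* value the client expected, per the log */

/* Hypothetical reply header: four 32-bit fields, matching the 4-byte
   spacing of the addresses in the log. Not ENBD's actual struct. */
struct reply_hdr {
    uint32_t magic;
    uint32_t error;
    uint32_t handle;
    uint32_t flags;
};

/* Read a big-endian (network order) 32-bit value from raw bytes. */
static uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode a 16-byte header, assuming every field is big-endian, and check
   the magic. Returns 0 if it matches, -1 if it does not or len is short. */
static int parse_reply(const unsigned char *wire, size_t len,
                       struct reply_hdr *out)
{
    if (len < 16)
        return -1;
    out->magic  = get_be32(wire);
    out->error  = get_be32(wire + 4);
    out->handle = get_be32(wire + 8);
    out->flags  = get_be32(wire + 12);
    return out->magic == REPLY_MAGIC ? 0 : -1;
}
```

The server-side "Not enough magic in packet" lines suggest the server is also seeing data it cannot frame, but the log alone does not say which side got out of step first.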
At this point the load average shoots up to about 3 or 4. I can still
interact with the machine somewhat, but I can't shut it down cleanly; to get
the load back down, I pretty much have to reset the machine.
So I'm having trouble interpreting all this. If anyone has suggestions, or
can point me to something to look at, I'd appreciate it. Thanks,
-Chris