[ENBD] Stress testing ENBD

Christopher Eveland enbd@lists.community.tummy.com
Thu, 24 Jan 2002 16:47:14 -0500


Hi again-

I've now reached the point of stress testing ENBD.  I can get everything
talking between multiple servers and clients just fine, but I'm having
trouble when I put a connection through its paces.

At the moment I'm using the "new" 2.4.27pre1.  If I do a "make test" which
sets up an 8MB set of files on the server, and hook that up to /dev/nda on
the client, then:

# mke2fs /dev/nda
# mount /dev/nda /mnt
# touch /mnt/foo
# dd if=/dev/zero of=/mnt/foo bs=1024 count=7000
# sync

things seem to work just fine, and I can watch the network traffic spike up
when I do the sync.  The problem comes when I try to get more ambitious:

# for ((i=1;i<=10;i++)); do dd if=/dev/zero of=/mnt/foo bs=1024 count=7000; sync; done

will crash the system (or at any rate appear to stop the transfers)
after a few iterations.  I see the net traffic spike up, then the following
string of error messages appears on the console (client and server messages
are interleaved), and the net traffic drops off to only 4kb/sec, which I
assume is the attempts to re-establish connections:

nbd-server  4473: slavesighandler server (2) activates slave sighandler for
signal 11
nbd-server  4473: server (2) sighandler terminates slave 4473 safely
nbd-netserver  1543: net_sum recv_reply CKSUM failed (-22)
nbd-netserver  1543: net_sum sum exits FAIL (-22)
nbd-client-server  1543: sumgen sumgen exits FAIL (-22)
nbd-server  4470: server (-1) relaunches child after SIGCHLD
nbd-netserver  1543: net_write net_write exits FAIL -22
nbd-client-server  1543: writegen write first server fails sector 4152 len
34816nbd-client-server  1543: writegen exits FAIL
nbd-client  1543: handle_server_cmd_err Read_stat (3) failed Timer expired
port 3034 so clr sock
nbd-server  4470: server (-1) slave pid 4473 is down, launching new
nbd-server  4475: server (2) set default signal handlers for slave server
4475
nbd-server  4470: server (-1) launched slave pid 4475
nbd-server  4474: slavesighandler server (3) activates slave sighandler for
signal 11
nbd-server  4474: server (3) sighandler terminates slave 4474 safely
nbd-netserver  1540: net_sum recv_reply CKSUM failed (-22)
nbd-netserver  1540: net_sum sum exits FAIL (-22)
nbd-client-server  1540: sumgen sumgen exits FAIL (-22)
nbd-netserver  1540: net_write net_write exits FAIL -22
nbd-client-server  1540: writegen write first server fails sector 4220 len
34816nbd-client-server  1540: writegen exits FAIL
nbd-client  1540: handle_server_cmd_err Read_stat (0) failed Timer expired
port 3034 so clr sock
nbd-server  4470: server (-1) relaunches child after SIGCHLD
nbd-server  4470: server (-1) slave pid 4474 is down, launching new
nbd-server  4476: server (3) set default signal handlers for slave server
4476
nbd-server  4471: slavesighandler server (0) activates slave sighandler for
signal 11
nbd-server  4471: server (0) sighandler terminates slave 4471 safely
nbd-netserver  1541: net_sum recv_reply CKSUM failed (-22)
nbd-netserver  1541: net_sum sum exits FAIL (-22)
nbd-client-server  1541: sumgen sumgen exits FAIL (-22)
nbd-server  4470: server (-1) launched slave pid 4476
nbd-server  4470: server (-1) slave pid 4471 is down, launching new
nbd-server  4472: slavesighandler server (1) activates slave sighandler for
signal 11
nbd-server  4472: server (1) sighandler terminates slave 4472 safely
nbd-server  4477: server (0) set default signal handlers for slave server
4477
nbd-server  4470: server (-1) launched slave pid 4477
nbd-server  4470: server (-1) slave pid 4472 is down, launching new
nbd-server  4478: server (1) set default signal handlers for slave server
4478
nbd-server  4470: server (-1) launched slave pid 4478
nbd-server  4470: server (-1) relaunches child after SIGCHLD
nbd-netserver  1542: net_write net_write exits FAIL -22
nbd-client-server  1542: writegen write first server fails sector 2 len 1024
nbd-client-server  1542: writegen exits FAIL
nbd-client  1542: handle_server_cmd_err Read_stat (2) failed Connection
reset by peer port 3034 so clr sock
nbd-client  1543: client (3) last error Timer expired
nbd-client  1540: client (0) last error Timer expired
nbd-client  1536: sighandler relaunches child from manager
nbd-client  1536: client (-1) childminder verifies pid 1543 (3) is down
nbd-client  1536: client (-1) childminder launched pid 1593 (3)
nbd-client  1593: ok
nbd-client  1536: client (-1) childminder verifies pid 1540 (0) is down
nbd-client  1594: ok
nbd-client  1536: client (-1) childminder launched pid 1594 (0)
nbd-client  1593: client (3) opened socket 5 to port 3034
nbd-server  4477:nbd-client  1594: client (0) opened socket 5 to port 3034
nbd-server  4475: server (2) opened port 3034 (socket 6) for client
192.168.4.1
nbd-client  1594: client (0) read passwd ok from port 3034
nbd-client  1594: client (0) got cliserv magic ok from port 3034
nbd-client  1594: client (0) got a signature ok from port 3034
nbd-client  1594: client (0) begins main loop
nbd-server  4475: server (2) set new signal handlers for slave server 4475
 server (0) opened port 3034 (socket 6) for client 192.168.4.1
nbd-client  1593: client (3) read passwd ok from port 3034
nbd-client  1593: client (3) got cliserv magic ok from port 3034
nbd-client  1593: client (3) got a signature ok from port 3034
nbd-client  1593: client (3) begins main loop
nbd-server  4477: server (0) set new signal handlers for slave server 4477
nbd-server  4477: slavesighandler server (0) activates slave sighandler for
signal 11
nbd-server  4477: server (0) sighandler terminates slave 4477 safely
nbd-server  4470: server (-1) relaunches child after SIGCHLD
nbd-server  4470: server (-1) slave pid 4477 is down, launching new
nbd-server  4479: server (0) set default signal handlers for slave server
4479
nbd-server  4470: server (-1) launched slave pid 4479
nbd-client  1542: client (2) last error Connection reset by peer
nbd-client  1536: sighandler relaunches child from manager
nbd-client  1536: client (-1) childminder verifies pid 1542 (2) is down
nbd-client  1536: client (-1) childminder launched pid 1595 (2)
nbd-client  1595: ok
nbd-client  1595: client (2) opened socket 5 to port 3034
nbd-server  4479: server (0) opened port 3034 (socket 6) for client
192.168.4.1
nbd-client  1595: client (2) read passwd ok from port 3034
nbd-client  1595: client (2) got cliserv magic ok from port 3034
nbd-client  1595: client (2) got a signature ok from port 3034
nbd-client  1595: client (2) begins main loop
nbd-server  4479: server (0) set new signal handlers for slave server 4479
nbd-netserver  1593: net_write net_write exits FAIL -22
nbd-client-server  1593: writegen write first server fails sector 2 len 1024
nbd-client-server  1593: writegen exits FAIL
nbd-client  1593: handle_server_cmd_err Read_stat (3) failed Connection
reset by peer port 3034 so clr sock
nbd-server  4475: slavesighandler server (2) activates slave sighandler for
signal 11
nbd-server  4475: server (2) sighandler terminates slave 4475 safely
nbd-server  4470: server (-1) relaunches child after SIGCHLD
nbd-server  4470: server (-1) slave pid 4475 is down, launching new
nbd-server  4480: server (2) set default signal handlers for slave server
4480
nbd-server  4470: server (-1) launched slave pid 4480
nbd-netserver  1594: net_write net_write exits FAIL -22
nbd-client-server  1594: writegen write first server fails sector 2 len 1024
nbd-client-server  1594: writegen exits FAIL
nbd-client  1594: handle_server_cmd_err Read_stat (0) failed Connection
reset by peer port 3034 so clr sock
nbd-server  4479: slavesighandler server (0) activates slave sighandler for
signal 11
nbd-server  4479: server (0) sighandler terminates slave 4479 safely
nbd-server  4470: server (-1) relaunches child after SIGCHLD
nbd-server  4470: server (-1) slave pid 4479 is down, launching new
nbd-server  4481: server (0) set default signal handlers for slave server
4481
nbd-server  4470: server (-1) launched slave pid 4481
nbd-netserver  1595: net_write net_write exits FAIL -22
nbd-client-server  1595: writegen write first server fails sector 2 len 1024
nbd-client-server  1595: writegen exits FAIL
nbd-client  1595: handle_server_cmd_err Read_stat (2) failed Connection
reset by peer port 3034 so clr sock
nbd-client  1593: client (3) last error Connection reset by peer
nbd-client  1536: sighandler relaunches child from manager
nbd-client  1536: client (-1) childminder verifies pid 1593 (3) is down
nbd-client  1596: ok
nbd-client  1536: client (-1) childminder launched pid 1596 (3)
nbd-client  1596: client (3) opened socket 5 to port 3034
nbd-server  4476: server (3) opened port 3034 (socket 6) for client
192.168.4.1
nbd-client  1596: client (3) read passwd ok from port 3034
nbd-client  1596: client (3) got cliserv magic ok from port 3034
nbd-client  1596: client (3) got a signature ok from port 3034
nbd-client  1596: client (3) begins main loop
nbd-server  4476: server (3) set new signal handlers for slave server 4476
nbd-server  4476: slavesighandler server (3) activates slave sighandler for
signal 11
nbd-server  4476: server (3) sighandler terminates slave 4476 safely
nbd-server  4470: server (-1) relaunches child after SIGCHLD
nbd-server  4470: server (-1) slave pid 4476 is down, launching new
nbd-client  1594: client (0) last error Connection reset by peer
nbd-server  4482: server (3) set default signal handlers for slave server
4482
nbd-server  4470: server (-1) launched slave pid 4482
nbd-client  1536: sighandler relaunches child from manager
nbd-netserver  1596: net_write net_write exits FAIL -22
nbd-client  1536: client (-1) childminder verifies pid 1594 (0) is down
nbd-client-server  1596: writegen write first server fails sector 2 len 1024
nbd-client-server  1596: writegen exits FAIL
nbd-client  1597:nbd-client  1596: handle_server_cmd_err Read_stat (3)
failed Connection reset by peer port 3034 so clr sock
 ok
nbd-client  1536: client (-1) childminder launched pid 1597 (0)
nbd-client  1597: client (0) opened socket 5 to port 3034
nbd-server  4480: server (2) opened port 3034 (socket 6) for client
192.168.4.1
nbd-client  1597: client (0) read passwd ok from port 3034
nbd-client  1597: client (0) got cliserv magic ok from port 3034
nbd-client  1597: client (0) got a signature ok from port 3034
nbd-client  1597: client (0) begins main loop
nbd-server  4480: server (2) set new signal handlers for slave server 4480
nbd-server  4480: slavesighandler server (2) activates slave sighandler for
signal 11
nbd-server  4480: server (2) sighandler terminates slave 4480 safely
nbd-client  1595: client (2) last error Connection reset by peer
nbd-server  4470: server (-1) relaunches child after SIGCHLD
nbd-server  4470: server (-1) slave pid 4480 is down, launching new
nbd-server  4483: server (2) set default signal handlers for slave server
4483
nbd-server  4470: server (-1) launched slave pid 4483
nbd-client  1536: sighandler relaunches child from manager
nbd-client  1536: client (-1) childminder verifies pid 1595 (2) is down
nbd-netserver  1597: net_write net_write exits FAIL -22
nbd-client-server  1597: writegen write first server fails sector 12 len
1024
nbd-client-server  1597: writegen exits FAIL
nbd-client  1598: ok
nbd-client  1597: handle_server_cmd_err Read_stat (0) failed Connection
reset by peer port 3034 so clr sock
nbd-client  1536: client (-1) childminder launched pid 1598 (2)
nbd-client  1598: client (2) opened socket 5 to port 3034
nbd-server  4483: server (2) opened port 3034 (socket 6) for client
192.168.4.1
nbd-client  1598: client (2) read passwd ok from port 3034
nbd-client  1598: client (2) got cliserv magic ok from port 3034
nbd-client  1598: client (2) got a signature ok from port 3034
nbd-client  1598: client (2) begins main loop
nbd-server  4483: server (2) set new signal handlers for slave server 4483
nbd-server  4483: slavesighandler server (2) activates slave sighandler for
signal 11


And it goes on.  Anyway, if anyone has any suggestions for what to check
into, I'd appreciate it (again), and if there are any diagnostics that might
help, let me know.
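
One diagnostic that may help narrow this down is a version of the loop above
that reports which pass first goes wrong.  The following is only a sketch:
stress_target is a hypothetical helper name, not part of ENBD, and it only
catches the case where dd returns an error (if dd simply hangs on the device,
the last "ok" line still shows how far it got).

```shell
#!/bin/sh
# Sketch: repeat the dd + sync cycle and stop at the first dd failure.
# stress_target is a hypothetical helper, not something shipped with ENBD.
stress_target() {
    target=$1   # file to overwrite on each pass
    count=$2    # number of 1024-byte blocks written per pass
    passes=$3   # maximum number of passes to attempt
    i=1
    while [ "$i" -le "$passes" ]; do
        # Same dd invocation as in the loop above; dd's own statistics
        # on stderr are discarded so only the per-pass status is printed.
        if ! dd if=/dev/zero of="$target" bs=1024 count="$count" 2>/dev/null; then
            echo "pass $i: dd failed" >&2
            return 1
        fi
        sync
        echo "pass $i: ok"
        i=$((i + 1))
    done
}
```

To mirror the failing case it would be called as
"stress_target /mnt/foo 7000 10" on the client.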

Thanks,

-Chris

PS Here are the options I have set in the makefile (mostly as it came):

###################### run options #########################
#
# server machine (runs nbd-server)
SERVER       = beowulf2
# client machine (runs nbd-client and nbd.o)
CLIENT       = beowulf1
# client whole device name
DEVICE       = /dev/nda
# server builds these resources for export
EXPORT       = /tmp/core0 /tmp/core1
# each of this many blocks
SRVSIZ       = 4096
# blocksize on client and server 1024, 2048, 4096
BLKSIZ       = 1024
# control port
PORT         = 3033
# how many parallel channels (otherwise use server addr repeats)
OPTSOCK      = -n 4
#OPTSOCK     := $(SERVER) $(SERVER) $(SERVER) $(SERVER)
# session signature                        - default = random
#OPTSIG       = -i "barbarbarbar"
# blocksize                                - default = 1024B or highest offer
OPTBLK       = -b $(BLKSIZ)
# use raid linear or raid 0 or raid 1 on server - default = linear
OPTRAID      = -0
# timeout for first negotiate              - default = 60s
OPTTIME      = -t 120
# readonly on server side
OPTRO        = #-r
# server maintains write order until this many ms delay - default = 0ms
OPTORDER     = -w 10000
# client uses checksumming protocol at startup
OPTMD5SUM    = #-m
# journalling
OPTJRNL      = #-j /tmp/nbd.log
# module readahead in blocks
RAHEAD       = 20
# module-based request aggregation (default 0)
MERGE_REQS   = 32
# module timeout seconds on daemon service
OPT_REQ_TIMEO= -p 5
# module force sync every n seconds
SYNC_INTVL   = 0
# apply throttle at n KB per second (default 0 = don't throttle)
SPEED_LIM    = 0
# make client asynchronous (default, no). client acks before net replies.
#OPTASYNC     = -a
#
############################################################
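
As a sanity check on the sizes involved, the arithmetic below compares what
the makefile settings export with what each dd pass writes.  It is a sketch
that assumes the device size is simply the sum of the two EXPORT files;
NEXPORT is a name introduced here for the number of entries in EXPORT.

```shell
#!/bin/sh
# Values copied from the makefile options above.
BLKSIZ=1024    # bytes per block
SRVSIZ=4096    # blocks per exported file
NEXPORT=2      # /tmp/core0 and /tmp/core1 (NEXPORT is a name made up here)

# Assumption: total device size = sum of the exported files.
export_bytes=$((BLKSIZ * SRVSIZ * NEXPORT))

# Each pass of the loop runs dd with bs=1024 count=7000.
dd_bytes=$((1024 * 7000))

echo "exported: $export_bytes bytes"          # 8388608 (8MB)
echo "written per pass: $dd_bytes bytes"      # 7168000
```

Under that assumption the file fills roughly 85% of the raw device on each
pass, before any filesystem overhead is counted.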