[ENBD] 20G dd hang up!

Kuniyasu SUZAKI enbd@lists.community.tummy.com
Wed, 09 Jan 2002 23:41:31 +0900


Dear Peter,

 >>From: "Peter T. Breuer" <ptb@it.uc3m.es>
 >>Subject: Re: [ENBD] 20G dd hang up!
 >>
 >>After maketest succeeds in localhost to localhost configuration, you
 >>must then get it to succeed in local to remote configuration. I am 99%
 >>certain that you have it working in that configuration and that you are
 >>reporting long-term, not short-term, errors, but I must be sure. So
 >>please let me know.

I see. I attach the log of "make test", which was run against the remote
machine. I used ENBD 2.4.26 and the test succeeded.

  Server machine: Dell Precision 220
                  Pentium III 933MHz
                  Memory 512M
                  100M Ether (NIC 3Com 3c905C)
                  RedHat 6.2J
                  Kernel 2.2.18

  Client machine: IBM ThinkPad X20
                  Pentium III 600MHz
                  Memory 320M
                  100M Ether
                  RedHat 6.2J
                  Kernel 2.2.18
                  DISK 20G

 >>"hung up" is also not a precisely defined term! Do you mean the network
 >>became sluggish and tcp began to time out? Or do you mean that the
 >>client machine locked solid - i.e. deadlocked or became stuck in some
 >>internal kernel loop?

"dd" worked about 10-20 seconds. At the time the hard disk of the
server machine was also active. After that the console of the client
machine froze: I could not use the keyboard or the console, and a "ping"
from another machine got no answer.

The details of the situation were as follows.

On the nbd-server
   # /usr/local/sbin/nbd-server 5058 /dev/sda1 -t 120 -b 1024 -0

On the nbd-client
   # ./nbd-client machineA:5058 -n 4 -b 1024 -t 120 /dev/nda
   # dd if=/dev/zero of=/dev/nda bs=1048576 count=19539
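
For reference, the size of this dd run can be worked out from bs and count.
The following is only a minimal sketch of that arithmetic; the variable names
are illustrative, and reading "-b 1024" as the NBD block size in bytes is an
assumption based on the "blksize" messages in the "make test" log.

```python
# Size of the transfer requested by:
#   dd if=/dev/zero of=/dev/nda bs=1048576 count=19539
bs = 1048576        # dd block size in bytes (1 MiB)
count = 19539       # number of dd blocks
nbd_blksize = 1024  # ASSUMPTION: "-b 1024" is the NBD block size in bytes

total_bytes = bs * count                   # bytes dd will try to write
nbd_blocks = total_bytes // nbd_blksize    # same amount in 1024-byte blocks
exceeds_32_bits = total_bytes >= 2**32     # True if a byte offset needs > 32 bits

print("total bytes      :", total_bytes)
print("total GiB        :", round(total_bytes / 2**30, 2))
print("1024-byte blocks :", nbd_blocks)
print("needs > 32 bits  :", exceeds_32_bits)
```

The run asks for about 19 GiB, so byte offsets near the end of the transfer do
not fit in 32 bits. Whether that matters for this problem is not established
here.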

The nbd-client and nbd-server printed the following messages when the
client machine froze.

  On the nbd-client
NBD #968[0]: nbd_rollback rollback req c0272260 from slot 0!
NBD #968[1]: nbd_rollback rollback req c02727d8 from slot 1!
NBD #968[2]: nbd_rollback rollback req c0272378 from slot 2!
NBD #968[3]: nbd_rollback rollback req c0272308 from slot 3!

  On the nbd-server
nbd-server: mainloop [RANGE! (+14858796828755968)]
nbd-server: mainloop [RANGE! (+17110596642441216)]
nbd-server: mainloop [RANGE! (+15984696735598592)]
nbd-server: mainloop [RANGE! (+40799591842744320)]
nbd-server: server (-1) relaunches child after SIGCHLD 
nbd-server: server (-1) main childminder checking pid 1123
nbd-server: server (-1) main childminder checking pid 1121
nbd-server: server (1) set default signal handlers for slave server 1128
nbd-server: server (-1) main childminder checking pid 1122
nbd-server: server (-1) main childminder checking pid 1120
nbd-server: server (-1) main childminder checking pid 1123
nbd-server: server (-1) main childminder checking pid 1128
nbd-server: server (-1) main childminder checking pid 1122
nbd-server: server (2) set default signal handlers for slave server 1129
nbd-server: server (-1) main childminder checking pid 1120
nbd-server: server (-1) main childminder checking pid 1123
nbd-server: server (0) set default signal handlers for slave server 1130
nbd-server: server (-1) main childminder checking pid 1128
nbd-server: server (-1) main childminder checking pid 1129
nbd-server: server (-1) main childminder checking pid 1120
nbd-server: server (-1) main childminder checking pid 1130
nbd-server: server (-1) main childminder checking pid 1128
nbd-server: server (-1) main childminder checking pid 1129
nbd-server: server (-1) main childminder checking pid 1120
nbd-server: server (3) set default signal handlers for slave server 1131
nbd-server: server (-1) relaunches child after SIGCHLD 
nbd-server: server (-1) main childminder checking pid 1130
nbd-server: server (-1) main childminder checking pid 1128
nbd-server: server (-1) main childminder checking pid 1129
nbd-server: server (-1) main childminder checking pid 1131
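
The messages above can be summarised with a short script. This is only a
sketch: it assumes the number printed after "RANGE!" is a byte offset (the
messages themselves do not say so), and the regular expressions and variable
names are illustrative, not part of ENBD.

```python
import re

# Messages copied from the report above.
client_log = """\
NBD #968[0]: nbd_rollback rollback req c0272260 from slot 0!
NBD #968[1]: nbd_rollback rollback req c02727d8 from slot 1!
NBD #968[2]: nbd_rollback rollback req c0272378 from slot 2!
NBD #968[3]: nbd_rollback rollback req c0272308 from slot 3!
"""
server_log = """\
nbd-server: mainloop [RANGE! (+14858796828755968)]
nbd-server: mainloop [RANGE! (+17110596642441216)]
nbd-server: mainloop [RANGE! (+15984696735598592)]
nbd-server: mainloop [RANGE! (+40799591842744320)]
"""

channels = 4                      # from "-n 4" on the nbd-client command line
dd_total_bytes = 1048576 * 19539  # bs * count of the dd run

# Slots that reported a rollback on the client.
rolled_back_slots = sorted(
    int(slot)
    for slot in re.findall(
        r"nbd_rollback rollback req [0-9a-f]+ from slot (\d+)!", client_log
    )
)
all_slots_rolled_back = rolled_back_slots == list(range(channels))

# Values the server flagged with RANGE!
# ASSUMPTION: these are byte offsets of the rejected requests.
range_values = [int(v) for v in re.findall(r"RANGE! \(\+(\d+)\)", server_log)]

print("slots rolled back:", rolled_back_slots, "all:", all_slots_rolled_back)
for value in range_values:
    print(f"{value:#018x} is about {value // dd_total_bytes} times the dd size")
```

All four client slots report a rollback, and if the assumption about the
RANGE! values holds, each flagged value is far larger than the whole dd
transfer.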

What should I do?

Is there anybody who has been able to transfer gigabytes of data with the
"dd" command via ENBD?

  Kuniyasu SUZAKI, National Institute of Advanced Industrial Science and Technology,
  Tsukuba Central 2, Umezono 1-1-1, Tsukuba, Ibaraki 305-8568, JAPAN
  Project NTC (Network Transferable Computer) http://www.etl.go.jp/~suzaki/English/NTC

-----------------    log of "make test"    ----------------------------

machineA% make test
server:machineA <-  -> client:machineB
echo; echo; \
 stty -echo </dev/tty ; \
 sh -c "sudo echo kill nbd-server" ; \
 stty echo </dev/tty ;  \
 sh -c  "sudo killall nbd-server; sleep 1; sudo killall -9 nbd-server" 


Password:
kill nbd-server
nbd-server: no process killed
nbd-server: no process killed
make: [kill-server] Error 1 (ignored)
echo; echo; \
 stty -echo </dev/tty ; \
 sh -c "sudo echo nbd-server" ; \
 stty echo </dev/tty ;  \
 rsync -uav --rsh=ssh  /tmp/nbd-server /tmp/  ; \
 for i in /tmp/core0 /tmp/core1; do sh -c  "test -s $i || dd if=/dev/zero  bs=4096  count=4096 >$i" ; done ; \
 sh -c  "sudo nice -19 /tmp/nbd-server 5058 /tmp/core0 /tmp/core1 -i "NBDabcdefNBD" -t 120 -b 4096  -0    ; pstree -p | grep nbd-server; sleep 300" & 


nbd-server
building file list ... done
wrote 73 bytes  read 16 bytes  178.00 bytes/sec
total size is 135848  speedup is 1526.38
delay 5s ..        |-nbd-server(6213)
nbd-server: server (-2) locked /var/state/nbd/server-NBDabcdefNBD.client_ips
nbd-server: server (-2) pinged service nbd-cstatd at 127.0.0.1:5051
nbd-server: with news "notice server-start 5058 127.0.0.1 
quit 
"
nbd-server: main server (-2) failed connect to 11.11.11.11 on port 5051
nbd-server: main server (-2) failed connect to 10.10.10.10 on port 5051
nbd-server: server (-2) unlocked /var/state/nbd/server-NBDabcdefNBD.client_ips
nbd-server: server (-2) set new signal handlers for master server 6213
nbd-server: connectme notice: setsockopt RCVTIMEO failed with Protocol not available
.
echo; echo; \
 stty -echo </dev/tty ; \
 ssh machineB "sudo echo kill nbd-client"  ; \
 stty echo </dev/tty ; \
 ssh -n machineB "sudo killall -USR1 nbd-client ; sleep 1; sudo killall -9 nbd-client; sleep 4; sudo /sbin/rmmod nbd" 


ntc@machineB's password: 
Password:
Sorry, try again.
Password:
Sorry, try again.
Password:
kill nbd-client
ntc@machineB's password: 
nbd-client: no process killed
nbd-client: no process killed
rmmod: module nbd is not loaded
make: [kill-client] Error 1 (ignored)
echo; echo; \
 stty -echo </dev/tty ; \
 ssh machineB "sudo echo nbd-client"  ; \
 stty echo </dev/tty ; \
     copy(){ rsync -uav --rsh=ssh $1 machineB:$2 ; } ; copy  /tmp/nbd.o  /tmp/ ; \
     copy(){ rsync -uav --rsh=ssh $1 machineB:$2 ; } ; copy  /tmp/nbd-client /tmp/ ; \
     copy(){ rsync -uav --rsh=ssh $1 machineB:$2 ; } ; copy  /home/ntc/work/nbd/nbd-2.4.26/MAKEDEV /tmp/ ; \
 ssh -n machineB "cd /dev; sudo /tmp/MAKEDEV /dev/nda" ; 


ntc@machineB's password: 
nbd-client
ntc@machineB's password: 
building file list ... done
nbd.o
wrote 42032 bytes  read 32 bytes  12018.29 bytes/sec
total size is 41920  speedup is 1.00
ntc@machineB's password: 
building file list ... done
nbd-client
wrote 123018 bytes  read 32 bytes  22372.73 bytes/sec
total size is 122893  speedup is 1.00
ntc@machineB's password: 
building file list ... done
MAKEDEV
wrote 1283 bytes  read 32 bytes  526.00 bytes/sec
total size is 1173  speedup is 0.89
ntc@machineB's password: 
ssh -n machineB "sudo /sbin/insmod /tmp/nbd.o rahead=20 merge_requests=0 sync_intvl=0"  ; \
 ssh -n machineB "sudo nice -19 /tmp/nbd-client machineA:5058 -n 4 -b 4096  -i "NBDabcdefNBD" -t 120   -p 5  -d 1 /dev/nda ; pstree -p | grep nbd-client; sleep 90" 
ntc@machineB's password: 
ntc@machineB's password: 
nbd-client: client (-1) manager opened NBD device /dev/nda (2b00)
nbd-client: client (-1) starts introduction sequence on port 5058
nbd-server: server (-2) opened port 5058 (socket 8) for client 10.10.10.10
nbd-server: server (-1) set default signal handlers for session server 6268
nbd-server: server (-1) read passwd ok
nbd-server: server (-1) got cliserv magic ok
        |-nbd-client(7878)
nbd-server: server (-1) received id device 2b00 ok
nbd-server: server (-1) sent size 8388608 ok
nbd-server: server (-1) sent sig ok
nbd-server: server (-1) suggested ro flags 0 ok
nbd-server: server (-1) received blksize 4096 ok
nbd-server: server (-1) sent/negotiated blksize 4096 ok
nbd-server: server (-1) received pulse_intvl 5 ok
nbd-server: server (-1) sent/negotiated pulse interval 10 ok
nbd-server: server (-1) agreed 4 channels ok
nbd-server: server (-1) selected free port at 5059
nbd-server: server (-1) posted port 5059 ok
nbd-client: client (-1) got session port 5059 ok
nbd-client: client (-1) introduction sequence ends ok
nbd-client: setkernel client (-1) set device ro flag 0
checking 127.0.0.1
checking 11.11.11.11
checking 10.10.10.10
nbd-server: server (-1) manager started new process group 6268
nbd-server: server (3) set default signal handlers for slave server 6272
nbd-server: server (2) set default signal handlers for slave server 6271
nbd-server: server (3) opened port 5059 (socket 10) for client 10.10.10.10
nbd-server: server (2) opened port 5059 (socket 10) for client 10.10.10.10
nbd-server: server (2) set new signal handlers for slave server 6271
nbd-client: client (0) begins main loop
nbd-client: client (1) begins main loop
nbd-server: server (1) set default signal handlers for slave server 6270
nbd-server: server (1) opened port 5059 (socket 10) for client 10.10.10.10
nbd-server: server (1) set new signal handlers for slave server 6270
nbd-client: client (2) begins main loop
nbd-server: server (0) set default signal handlers for slave server 6269
nbd-server: server (0) opened port 5059 (socket 10) for client 10.10.10.10
nbd-server: server (0) set new signal handlers for slave server 6269
nbd-client: client (3) begins main loop
nbd-server: server (-1) set new signal handlers for session server 6268
nbd-server: server (3) set new signal handlers for slave server 6272
machineA% 

-----------------    End of log of "make test"    ----------------------------