[ENBD] 2.4.32

Peter T. Breuer ptb at it.uc3m.es
Fri Mar 5 12:03:59 MST 2004


"Also sprach Anders Blomdell:"
> > I think the raid devices ONLY use 1024 blocksizes, which will not
> > match with whatever the real blocksize is on the server side, by
> > the looks of things.
> But isn't default blocksize 1024?

If you don't say, then the server guesses by looking at the blocksize of
its resource.  Then it negotiates with the client to see if the client
is agreeable or wants even more.  Together they choose the biggest.  It
looks likely to me that a blocksize of more than 1024 got chosen here.
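
As a rough illustration of that negotiation (a sketch only, not the
actual enbd code: the function name and arguments are made up here, and
how an explicitly requested blocksize enters the exchange is an
assumption), the outcome is just the larger of the two proposals:

```python
def negotiate_blocksize(resource_bs, client_bs, requested_bs=None):
    """Toy model of the blocksize choice; names are hypothetical."""
    # Assumption: an explicit request replaces the server's guess.
    # With no request, the server proposes its resource's blocksize.
    server_bs = requested_bs if requested_bs is not None else resource_bs
    # Server and client settle on the biggest of the two proposals.
    return max(server_bs, client_bs)


# A 4096-byte resource and a client proposing 1024 end up at 4096.
print(negotiate_blocksize(4096, 1024))
```

So a client that expects 1024 can still end up with a larger blocksize
if the resource on the server side is bigger.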

> >>    md0 : active raid1 [dev 2b:10][2](F) [dev 2b:00][1](F) hda7[0]
> >>        39069952 blocks [3/1] [U__]
> >
> > Well, that's something else.
> Perhaps these:
> 
> enbd-client  4238: <#1313> handle_server_check_err Server_check (3) failed 
> Timer expired on server3:30001 so clear socket
> enbd-client: enbd-client  4238: <# 155> unplug requested unplug (3) Timer 
> expired on server3:30001 so clear socket

Possibly - the client is saying that the server is too slow to respond
to its keepalive ping.  So slow that it gives up, dies, and is reborn.
That "server3:30001" signifies hostname:port.  What does the server
have to say about why it died?

You know, it might be an idea when -e is in force not to die through
server inattention. But what else can one do? The other end might have
died and in that case we want to reconnect.

> >>    enbd-client {server}:{port} -n 4 -i {diskid} -m -t 0 -e {enbddevice}
> >>
> >> That errors lead to faulting the mirrors is expected (-e), but errors are
> Look, no blocksize requested --> 1024??

No blocksize requested -> take it from the resource, and negotiate
upwards with the client if necessary.
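
The server dumps quoted further down fit that reading: every device
reports Blocksize 4096 (log=12).  A quick check of the reported
geometry, using only figures from the dumps (the helper name is
invented for illustration, and the 512-byte sector is my assumption):

```python
def geometry(size_kb, blocksize, buffersize, sector_bytes=512):
    """Recompute the derived figures of a status dump (hypothetical helper)."""
    return {
        # log2 of the blocksize, valid when blocksize is a power of two
        "log": blocksize.bit_length() - 1,
        # device size in blocks: the Size line is in KB
        "blocks": size_kb * 1024 // blocksize,
        # buffer expressed in sectors and in blocks
        "buffer_sectors": buffersize // sector_bytes,
        "buffer_blocks": buffersize // blocksize,
    }


print(geometry(39070048, 4096, 262144))
# {'log': 12, 'blocks': 9767512, 'buffer_sectors': 512, 'buffer_blocks': 64}
```

At a blocksize of 1024 the same device would show 39070048 blocks, not
the 9767512 in the dumps.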

> > think.  That's because remote errors will be reported as errors anyway,
> > no matter what -e is set as (-e nowadays causes an error to be reported
> > when the network fails).
> I'll stop using '-e'

Possibly. I really am not sure yet what the cause is ...

> Server1:
> 
> Device a:       Open
> [a] State:      signed, rw, enabled, validated, show_errs, plug, md5sum, 
> acct, last error 0, lives 0, bp 0

This has never died.

> [a] Queued:     +0R/0W curr (check 0R/0W) +3R/1022W max
> [a] Buffersize: 262144  (sectors=512, blocks=64)
> [a] Blocksize:  4096    (log=12)
> [a] Size:       39070048KB
> [a] Blocks:     9767512
> [a] Sockets:    4       (*)     (+)     (+)     (+)
> [a] Requested:  9.5595M (2.38M) (2.39M) (2.39M) (2.37M) 271R/9.558MW    max 
> 1
> [a] Despatched: 9.5595M (2.38M) (2.39M) (2.39M) (2.37M) 271R/9.558MW    md5 
> 9.55MW (9.55M eq, 84 ne, 0 dn)
> [a] Errored:    1       (0)     (0)     (0)     (1)     1+0

Well, there is one errored request. It might well match your error log
above. I think that yes, no -e will cure this.


> [a] Pending:    0       (0)     (0)     (0)     (0)     0R/0W+0R/0W
> [a] B/s now:    0       (0R+0W)
> [a] B/s ave:    7.49M   (0R+7.49MW)
> [a] B/s max:    29.8M   (340KR+29.8MW)
> [a] Spectrum:   100%1
> [a] Kthreads:   0       (0 waiting/0 running/1 max)
> [a] Cthreads:   4       (+)     (+)     (+)     (+)
> [a] Cpids:      4       (4226)  (4227)  (4228)  (4320)
> Device b:       Open
> [b] State:      signed, rw, enabled, validated, show_errs, plug, md5sum, 
> acct, last error 0, lives 0, bp 0


This has also never died.


> [b] Queued:     +0R/0W curr (check 0R/0W) +3R/1024W max
> [b] Buffersize: 262144  (sectors=512, blocks=64)
> [b] Blocksize:  4096    (log=12)
> [b] Size:       39070048KB
> [b] Blocks:     9767512
> [b] Sockets:    4       (*)     (+)     (+)     (+)
> [b] Requested:  297.76K (77.9K) (62.5K) (77.7K) (79.5K) 16R/297.7KW     max 
> 1
> [b] Despatched: 297.76K (77.9K) (62.5K) (77.7K) (79.5K) 16R/297.7KW     md5 
> 153KW (151K eq, 1.62K ne, 0 dn)
> [b] Errored:    7       (1)     (3)     (1)     (2)     7+0

But has had seven errored requests. Again, I think lack of a -e will
help.

If things still look bad, then I will revise what happens in the
kernel. But I am pretty sure that without -e, requests that are not
attended to will be rolled back for treatment by another or the same
daemon instead of being errored.
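
In toy form, the difference I have in mind looks like this.  It is a
model of the intended behaviour only, not the kernel code; the function
and its arguments are hypothetical:

```python
from collections import deque


def handle_unattended(pending, queue, show_errors):
    """Dispose of requests that a daemon failed to answer in time.

    Toy model: with show_errors (standing in for -e) the requests are
    errored; otherwise they go back to the head of the queue for retry.
    """
    if show_errors:
        # Reported upward as errors, which is what faults the mirror.
        return list(pending)
    # Rolled back in their original order for another (or the same) daemon.
    for req in reversed(pending):
        queue.appendleft(req)
    return []


queue = deque(["w3"])
errored = handle_unattended(["w1", "w2"], queue, show_errors=False)
print(errored, list(queue))
# [] ['w1', 'w2', 'w3']
```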


> [b] Pending:    0       (0)     (0)     (0)     (0)     0R/0W+0R/0W
> [b] B/s now:    0       (0R+0W)
> [b] B/s ave:    236K    (0R+236KW)
> [b] B/s max:    20.6M   (4.00KR+20.6MW)
> [b] Spectrum:   100%1
> [b] Kthreads:   0       (0 waiting/0 running/1 max)
> [b] Cthreads:   4       (+)     (+)     (+)     (+)
> [b] Cpids:      4       (4235)  (4303)  (4237)  (4342)
> Device c:       Open
> [c] State:      signed, rw, enabled, validated, show_errs, plug, md5sum, 
> acct, last error 0, lives 0, bp 0


This also has never died. All very good.


> [c] Queued:     +0R/0W curr (check 0R/0W) +3R/32W max
> [c] Buffersize: 262144  (sectors=512, blocks=64)
> [c] Blocksize:  4096    (log=12)
> [c] Size:       39070048KB
> [c] Blocks:     9767512
> [c] Sockets:    4       (+)     (+)     (*)     (+)
> [c] Requested:  2.7556M (706K)  (708K)  (703K)  (702K)  13R/2.755MW     max 
> 1
> [c] Despatched: 2.7556M (706K)  (708K)  (703K)  (702K)  13R/2.755MW     md5 
> 2.74MW (2.74M eq, 142 ne, 0 dn)
> [c] Errored:    0       (0)     (0)     (0)     (0)     0+0


And has remained clean too.


> [c] Pending:    0       (0)     (0)     (0)     (0)     0R/0W+0R/0W
> [c] B/s now:    12.8M   (0R+12.8MW)
> [c] B/s ave:    2.32M   (0R+2.32MW)
> [c] B/s max:    26.4M   (0R+26.4MW)
> [c] Spectrum:   100%1
> [c] Kthreads:   0       (0 waiting/0 running/1 max)
> [c] Cthreads:   4       (+)     (+)     (+)     (+)
> [c] Cpids:      4       (4240)  (4241)  (4242)  (4243)
> Device d:       Open
> [d] State:      signed, rw, enabled, validated, show_errs, plug, md5sum, 
> acct, last error 0, lives 0, bp 0


Never died.


> [d] Queued:     +0R/0W curr (check 0R/0W) +3R/28W max
> [d] Buffersize: 262144  (sectors=512, blocks=64)
> [d] Blocksize:  4096    (log=12)
> [d] Size:       39070048KB
> [d] Blocks:     9767512
> [d] Sockets:    4       (+)     (+)     (*)     (+)
> [d] Requested:  2.7556M (706K)  (705K)  (704K)  (705K)  13R/2.755MW     max 
> 1
> [d] Despatched: 2.7556M (706K)  (705K)  (704K)  (705K)  13R/2.755MW     md5 
> 2.75MW (2.75M eq, 2 ne, 0 dn)
> [d] Errored:    3       (0)     (1)     (1)     (1)     3+0

Some requests errored. Remove -e.

> [d] Pending:    0       (0)     (0)     (0)     (0)     0R/0W+0R/0W
> [d] B/s now:    11.4M   (0R+11.4MW)
> [d] B/s ave:    2.32M   (0R+2.32MW)
> [d] B/s max:    26.2M   (0R+26.2MW)
> [d] Spectrum:   100%1
> [d] Kthreads:   0       (0 waiting/0 running/1 max)
> [d] Cthreads:   4       (+)     (+)     (+)     (+)
> [d] Cpids:      4       (4249)  (4250)  (4251)  (4252)
> 
> Server2:
> 
> Device a:       Open
> [a] State:      signed, rw, enabled, validated, show_errs, plug, md5sum, 
> acct, last error 0, lives 0, bp 0

Looks good.

> [a] Queued:     +0R/9W curr (check 0R/9W) +3R/1022W max
> [a] Buffersize: 262144  (sectors=512, blocks=64)
> [a] Blocksize:  4096    (log=12)
> [a] Size:       39070048KB
> [a] Blocks:     9767512
> [a] Sockets:    4       (+)     (*)     (+)     (+)
> [a] Requested:  7.4996M (1.87M) (1.87M) (1.87M) (1.87M) 274R/7.499MW    max 
> 1
> [a] Despatched: 7.4996M (1.87M) (1.87M) (1.87M) (1.87M) 274R/7.499MW    md5 
> 7.27MW (7.27M eq, 3.24K ne, 0 dn)
> [a] Errored:    4       (1)     (1)     (1)     (1)     4+0

A few requests errored. Remove -e.


> [a] Pending:    4       (1)     (1)     (1)     (1)     0R/4W+0R/9W
> [a] B/s now:    22.2M   (0R+22.2MW)
> [a] B/s ave:    5.84M   (0R+5.84MW)
> [a] B/s max:    31.6M   (340KR+31.6MW)
> [a] Spectrum:   100%1
> [a] Kthreads:   0       (0 waiting/0 running/1 max)
> [a] Cthreads:   0       (-)     (-)     (-)     (-)
> [a] Cpids:      0       (4332)  (4333)  (4334)  (4335)
> Device b:       Open
> [b] State:      signed, rw, enabled, validated, show_errs, plug, md5sum, 
> acct, last error 0, lives 0, bp 0
> [b] Queued:     +0R/16W curr (check 0R/16W) +3R/1024W max
> [b] Buffersize: 262144  (sectors=512, blocks=64)
> [b] Blocksize:  4096    (log=12)
> [b] Size:       39070048KB
> [b] Blocks:     9767512
> [b] Sockets:    4       (*)     (+)     (+)     (+)
> [b] Requested:  8.0733M (2.02M) (2.02M) (2.00M) (2.02M) 19R/8.073MW     max 
> 1
> [b] Despatched: 8.0733M (2.02M) (2.02M) (2.00M) (2.02M) 19R/8.073MW     md5 
> 7.84MW (7.83M eq, 3.40K ne, 0 dn)
> [b] Errored:    0       (0)     (0)     (0)     (0)     0+0
> [b] Pending:    4       (1)     (1)     (1)     (1)     0R/4W+0R/16W
> [b] B/s now:    21.2M   (0R+21.2MW)
> [b] B/s ave:    6.45M   (0R+6.45MW)
> [b] B/s max:    30.3M   (4.00KR+30.3MW)
> [b] Spectrum:   100%1
> [b] Kthreads:   0       (0 waiting/0 running/1 max)
> [b] Cthreads:   0       (-)     (-)     (-)     (-)
> [b] Cpids:      0       (4341)  (4342)  (4343)  (4344)
> Device c:       Open
> [c] State:      signed, rw, enabled, validated, show_errs, plug, md5sum, 
> acct, last error 0, lives 0, bp 0
> [c] Queued:     +0R/0W curr (check 0R/0W) +3R/32W max
> [c] Buffersize: 262144  (sectors=512, blocks=64)
> [c] Blocksize:  4096    (log=12)
> [c] Size:       39070048KB
> [c] Blocks:     9767512
> [c] Sockets:    4       (+)     (*)     (+)     (+)
> [c] Requested:  117.38K (29.2K) (29.2K) (29.5K) (29.3K) 7R/117.3KW      max 
> 1
> [c] Despatched: 117.38K (29.2K) (29.2K) (29.5K) (29.3K) 7R/117.3KW      md5 
> 117KW (117K eq, 3 ne, 0 dn)
> [c] Errored:    1       (0)     (0)     (1)     (0)     1+0
> [c] Pending:    0       (0)     (0)     (0)     (0)     0R/0W+0R/0W
> [c] B/s now:    0       (0R+0W)
> [c] B/s ave:    96.0K   (0R+96.0KW)
> [c] B/s max:    18.7M   (0R+18.7MW)
> [c] Spectrum:   100%1
> [c] Kthreads:   0       (0 waiting/0 running/1 max)
> [c] Cthreads:   4       (+)     (+)     (+)     (+)
> [c] Cpids:      4       (4348)  (4349)  (4350)  (4351)
> Device d:       Open
> [d] State:      signed, rw, enabled, validated, show_errs, plug, md5sum, 
> acct, last error 0, lives 0, bp 0
> [d] Queued:     +0R/0W curr (check 0R/0W) +3R/32W max
> [d] Buffersize: 262144  (sectors=512, blocks=64)
> [d] Blocksize:  4096    (log=12)
> [d] Size:       39070048KB
> [d] Blocks:     9767512
> [d] Sockets:    4       (*)     (+)     (+)     (+)
> [d] Requested:  117.41K (29.1K) (29.1K) (29.7K) (29.3K) 7R/117.4KW      max 
> 1
> [d] Despatched: 117.41K (29.1K) (29.1K) (29.7K) (29.3K) 7R/117.4KW      md5 
> 117KW (117K eq, 6 ne, 0 dn)
> [d] Errored:    0       (0)     (0)     (0)     (0)     0+0
> [d] Pending:    0       (0)     (0)     (0)     (0)     0R/0W+0R/0W
> [d] B/s now:    0       (0R+0W)
> [d] B/s ave:    96.0K   (0R+96.0KW)
> [d] B/s max:    18.7M   (0R+18.7MW)
> [d] Spectrum:   100%1
> [d] Kthreads:   0       (0 waiting/0 running/1 max)
> [d] Cthreads:   4       (+)     (+)     (+)     (+)
> [d] Cpids:      4       (4353)  (4354)  (4355)  (4356)
> Device e-p:     Closed
> 
> Server3:
> 
> Device a:       Open
> [a] State:      signed, rw, enabled, validated, show_errs, plug, md5sum, 
> acct, last error 0, lives 0, bp 0
> [a] Queued:     +0R/0W curr (check 0R/0W) +3R/1023W max
> [a] Buffersize: 262144  (sectors=512, blocks=64)
> [a] Blocksize:  4096    (log=12)
> [a] Size:       39070048KB
> [a] Blocks:     9767512
> [a] Sockets:    4       (+)     (+)     (+)     (*)
> [a] Requested:  296.57K (75.0K) (73.5K) (74.0K) (73.9K) 271R/296.3KW    max 
> 1
> [a] Despatched: 296.56K (75.0K) (73.5K) (74.0K) (73.9K) 271R/296.3KW    md5 
> 88.0KW (85.4K eq, 2.64K ne, 0 dn)
> [a] Errored:    2       (1)     (0)     (1)     (0)     2+0
> [a] Pending:    0       (0)     (0)     (0)     (0)     0R/0W+0R/0W
> [a] B/s now:    0       (0R+0W)
> [a] B/s ave:    228K    (0R+224KW)
> [a] B/s max:    18.2M   (340KR+18.2MW)
> [a] Spectrum:   100%1
> [a] Kthreads:   0       (0 waiting/0 running/1 max)
> [a] Cthreads:   4       (+)     (+)     (+)     (+)
> [a] Cpids:      4       (10255) (10256) (10257) (10258)
> Device b:       Open
> [b] State:      signed, rw, enabled, validated, show_errs, plug, md5sum, 
> acct, last error 0, lives 0, bp 0
> [b] Queued:     +0R/0W curr (check 0R/0W) +3R/1022W max
> [b] Buffersize: 262144  (sectors=512, blocks=64)
> [b] Blocksize:  4096    (log=12)
> [b] Size:       39070048KB
> [b] Blocks:     9767512
> [b] Sockets:    4       (+)     (+)     (+)     (*)
> [b] Requested:  387.77K (96.8K) (97.0K) (96.6K) (97.2K) 16R/387.7KW     max 
> 1
> [b] Despatched: 387.77K (96.8K) (97.0K) (96.6K) (97.2K) 16R/387.7KW     md5 
> 387KW (387K eq, 39 ne, 0 dn)
> [b] Errored:    1       (0)     (0)     (1)     (0)     1+0
> [b] Pending:    0       (0)     (0)     (0)     (0)     0R/0W+0R/0W
> [b] B/s now:    0       (0R+0W)
> [b] B/s ave:    308K    (0R+308KW)
> [b] B/s max:    19.3M   (0R+19.3MW)
> [b] Spectrum:   100%1
> [b] Kthreads:   0       (0 waiting/0 running/1 max)
> [b] Cthreads:   4       (+)     (+)     (+)     (+)
> [b] Cpids:      4       (10264) (10265) (10338) (10267)
> Device c:       Open
> [c] State:      signed, rw, enabled, validated, show_errs, plug, md5sum, 
> acct, last error 0, lives 0, bp 0
> [c] Queued:     +0R/0W curr (check 0R/0W) +3R/32W max
> [c] Buffersize: 262144  (sectors=512, blocks=64)
> [c] Blocksize:  4096    (log=12)
> [c] Size:       39070048KB
> [c] Blocks:     9767512
> [c] Sockets:    4       (+)     (+)     (*)     (+)
> [c] Requested:  6.6284M (1.67M) (1.66M) (1.65M) (1.62M) 4R/6.628MW      max 
> 1
> [c] Despatched: 6.6284M (1.67M) (1.66M) (1.65M) (1.62M) 4R/6.628MW      md5 
> 6.62MW (6.62M eq, 2 ne, 0 dn)
> [c] Errored:    1       (1)     (0)     (0)     (0)     1+0
> [c] Pending:    0       (0)     (0)     (0)     (0)     0R/0W+0R/0W
> [c] B/s now:    240K    (0R+240KW)
> [c] B/s ave:    5.55M   (0R+5.55MW)
> [c] B/s max:    25.0M   (0R+25.0MW)
> [c] Spectrum:   100%1
> [c] Kthreads:   0       (0 waiting/0 running/1 max)
> [c] Cthreads:   4       (+)     (+)     (+)     (+)
> [c] Cpids:      4       (10336) (10270) (10271) (10272)
> Device d:       Open
> [d] State:      signed, rw, enabled, validated, show_errs, plug, md5sum, 
> acct, last error 0, lives 0, bp 0
> [d] Queued:     +0R/0W curr (check 0R/0W) +3R/32W max
> [d] Buffersize: 262144  (sectors=512, blocks=64)
> [d] Blocksize:  4096    (log=12)
> [d] Size:       39070048KB
> [d] Blocks:     9767512
> [d] Sockets:    4       (+)     (*)     (+)     (+)
> [d] Requested:  6.6284M (1.66M) (1.64M) (1.65M) (1.66M) 4R/6.628MW      max 
> 1
> [d] Despatched: 6.6284M (1.66M) (1.64M) (1.65M) (1.66M) 4R/6.628MW      md5 
> 6.62MW (6.62M eq, 4 ne, 0 dn)
> [d] Errored:    0       (0)     (0)     (0)     (0)     0+0
> [d] Pending:    0       (0)     (0)     (0)     (0)     0R/0W+0R/0W
> [d] B/s now:    244K    (0R+244KW)
> [d] B/s ave:    5.55M   (0R+5.55MW)
> [d] B/s max:    25.1M   (0R+25.1MW)
> [d] Spectrum:   100%1
> [d] Kthreads:   0       (0 waiting/0 running/1 max)
> [d] Cthreads:   4       (+)     (+)     (+)     (+)
> [d] Cpids:      4       (10276) (10277) (10278) (10279)
> Device e-p:     Closed


Yadda yadda yadda. That was quite a lot! All of them seem to be mostly
ok, but it seems to be as you described, that under memory pressure
the server side cannot respond quickly enough to a keepalive ping. I
almost wonder if keeping -e would be right, and simply extending the
timeout! How about "-p 60" at the client side? Surely the server
should be able to respond to a ping in under 60s?
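
For dumps of this size, the Errored lines can be tallied mechanically
rather than by eye.  A minimal sketch, assuming the dump is pasted as
text in the layout quoted above ("ServerN:" headings, "[x] Errored:"
lines, optional "> " quote markers); the function name is made up:

```python
import re

# Matches e.g. "[a] Errored:    1       (0)     (0)     (0)     (1)     1+0"
ERRORED = re.compile(r"^\[(\w)\] Errored:\s+(\d+)")


def tally_errored(dump):
    """Return {server: {device: errored_count}} from a pasted status dump."""
    result = {}
    server = None
    for raw in dump.splitlines():
        # Drop mail quote markers and surrounding whitespace.
        line = raw.lstrip("> ").rstrip()
        if line.startswith("Server") and line.endswith(":"):
            server = line[:-1]
            result[server] = {}
            continue
        match = ERRORED.match(line)
        if match and server is not None:
            result[server][match.group(1)] = int(match.group(2))
    return result


sample = """> Server1:
> [a] Errored:    1       (0)     (0)     (0)     (1)     1+0
> [b] Errored:    7       (1)     (3)     (1)     (2)     7+0
"""
print(tally_errored(sample))
# {'Server1': {'a': 1, 'b': 7}}
```

Counting the full dump above by hand gives 11, 5 and 4 errored requests
for Server1, Server2 and Server3, against millions of requests
despatched.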


> > And if you have some advice about how to choose the blocksize
> > then I would be very happy to listen to it!
> No advice (yet at least).

Are you using fr1 or raid1? fr1 is indicated, unless you want to go
crazy with resyncs over the net at those sizes!


Peter


