[ENBD] 2.4.32
Peter T. Breuer
ptb at it.uc3m.es
Fri Mar 5 12:03:59 MST 2004
"Also sprach Anders Blomdell:"
> > I think the raid devices ONLY use 1024 blocksizes, which will not
> > match with whatever the real blocksize is on the server side, by
> > the looks of things.
> But isn't default blocksize 1024?
If you don't specify one, then the server guesses by looking at the
blocksize of its resource. Then it negotiates with the client to see if
the client is agreeable or wants even more. Together they choose the
biggest. It looks likely to me that a blocksize of more than 1024 got
chosen here.
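As a rough illustration of that negotiation, here is a minimal sketch in Python. The function and parameter names are invented for the example and are not taken from the ENBD source; it only encodes the rule stated above (guess from the resource unless told otherwise, then take the bigger of the two bids).

```python
def negotiate_blocksize(resource_bs, client_bs=None, requested_bs=None):
    """Return the blocksize both ends settle on (hypothetical helper)."""
    # Server's opening bid: an explicit request if one was given,
    # otherwise a guess taken from the blocksize of its resource.
    server_bid = requested_bs if requested_bs is not None else resource_bs
    # The client either agrees or asks for more; the biggest value wins.
    if client_bs is None:
        return server_bid
    return max(server_bid, client_bs)


# Nothing requested, resource uses 4096, client would accept 1024:
print(negotiate_blocksize(4096, client_bs=1024))  # 4096
```

Under this rule a raid layer that expects 1024 can still end up talking to a device that settled on something larger.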
> >> md0 : active raid1 [dev 2b:10][2](F) [dev 2b:00][1](F) hda7[0]
> >> 39069952 blocks [3/1] [U__]
> >
> > Well, that's something else.
> Perhaps these:
>
> enbd-client 4238: <#1313> handle_server_check_err Server_check (3) failed
> Timer expired on server3:30001 so clear socket
> enbd-client: enbd-client 4238: <# 155> unplug requested unplug (3) Timer
> expired on server3:30001 so clear socket
Possibly - the client is saying that the server is too slow to respond
to its keepalive ping. So slow that it gives up, dies, and is reborn.
That "server3:30001" signifies hostname:port. What does the server
have to say about why it died?
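For anyone matching client messages against server logs, the hostname:port pair can be pulled out of such a line mechanically. This is only a sketch: the regular expression is fitted to the one message quoted above, and other enbd-client messages may well be worded differently.

```python
import re


def parse_endpoint(line):
    """Extract (hostname, port) from a 'Timer expired on host:port' line."""
    m = re.search(r"on (\S+):(\d+)\b", line)
    if m is None:
        return None
    return m.group(1), int(m.group(2))


line = "Timer expired on server3:30001 so clear socket"
print(parse_endpoint(line))  # ('server3', 30001)
```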
You know, it might be an idea when -e is in force not to die through
server inattention. But what else can one do? The other end might have
died and in that case we want to reconnect.
> >> enbd-client {server}:{port} -n 4 -i {diskid} -m -t 0 -e {enbddevice}
> >>
> >> That errors lead to faulting the mirrors is expected (-e), but errors are
> Look, no blocksize requested --> 1024??
No blocksize requested -> take it from the resource, and negotiate
upwards with the client if necessary.
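The status output quoted further down reports "Blocksize: 4096" for every device, and its other figures agree with that value rather than with 1024. A small arithmetic check, using only numbers copied from that output (the helper name is made up for the example):

```python
def block_count(size_kb, blocksize):
    """Number of blocks in a device of size_kb kilobytes."""
    return size_kb * 1024 // blocksize


SIZE_KB = 39070048      # "Size: 39070048KB"
BLOCKSIZE = 4096        # "Blocksize: 4096 (log=12)"
BUFFERSIZE = 262144     # "Buffersize: 262144 (sectors=512, blocks=64)"

print(block_count(SIZE_KB, BLOCKSIZE))   # 9767512, matches "Blocks: 9767512"
print(BUFFERSIZE // 512)                 # 512 sectors of 512 bytes
print(BUFFERSIZE // BLOCKSIZE)           # 64 blocks of 4096 bytes
print(1 << 12)                           # 4096, matches "log=12"
```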
> > think. That's because remote errors will be reported as errors anyway,
> > no matter what -e is set as (-e nowadays causes an error to be reported
> > when the network fails).
> I'll stop using '-e'
Possibly. I really am not sure yet what the cause is ...
> Server1:
>
> Device a: Open
> [a] State: signed, rw, enabled, validated, show_errs, plug, md5sum,
> acct, last error 0, lives 0, bp 0
This has never died.
> [a] Queued: +0R/0W curr (check 0R/0W) +3R/1022W max
> [a] Buffersize: 262144 (sectors=512, blocks=64)
> [a] Blocksize: 4096 (log=12)
> [a] Size: 39070048KB
> [a] Blocks: 9767512
> [a] Sockets: 4 (*) (+) (+) (+)
> [a] Requested: 9.5595M (2.38M) (2.39M) (2.39M) (2.37M) 271R/9.558MW max
> 1
> [a] Despatched: 9.5595M (2.38M) (2.39M) (2.39M) (2.37M) 271R/9.558MW md5
> 9.55MW (9.55M eq, 84 ne, 0 dn)
> [a] Errored: 1 (0) (0) (0) (1) 1+0
Well, there is one errored request. It might well match your error log
above. I think that yes, dropping -e will cure this.
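With twelve devices to look through, the "Errored:" lines are easier to compare when parsed. The sketch below assumes that the leading number is the total and that the parenthesised numbers are the counts for each of the four sockets; that reading fits the figures in this output (they sum to the total) but is an assumption, not something taken from the ENBD documentation.

```python
import re


def parse_errored(line):
    """Return (total, [count per socket]) from an 'Errored:' status line."""
    m = re.search(r"Errored:\s+(\d+)((?:\s+\(\d+\))+)", line)
    if m is None:
        return None
    total = int(m.group(1))
    per_socket = [int(n) for n in re.findall(r"\((\d+)\)", m.group(2))]
    return total, per_socket


print(parse_errored("> [a] Errored: 1 (0) (0) (0) (1) 1+0"))
# (1, [0, 0, 0, 1])
```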
> [a] Pending: 0 (0) (0) (0) (0) 0R/0W+0R/0W
> [a] B/s now: 0 (0R+0W)
> [a] B/s ave: 7.49M (0R+7.49MW)
> [a] B/s max: 29.8M (340KR+29.8MW)
> [a] Spectrum: 100%1
> [a] Kthreads: 0 (0 waiting/0 running/1 max)
> [a] Cthreads: 4 (+) (+) (+) (+)
> [a] Cpids: 4 (4226) (4227) (4228) (4320)
> Device b: Open
> [b] State: signed, rw, enabled, validated, show_errs, plug, md5sum,
> acct, last error 0, lives 0, bp 0
This has also never died.
> [b] Queued: +0R/0W curr (check 0R/0W) +3R/1024W max
> [b] Buffersize: 262144 (sectors=512, blocks=64)
> [b] Blocksize: 4096 (log=12)
> [b] Size: 39070048KB
> [b] Blocks: 9767512
> [b] Sockets: 4 (*) (+) (+) (+)
> [b] Requested: 297.76K (77.9K) (62.5K) (77.7K) (79.5K) 16R/297.7KW max
> 1
> [b] Despatched: 297.76K (77.9K) (62.5K) (77.7K) (79.5K) 16R/297.7KW md5
> 153KW (151K eq, 1.62K ne, 0 dn)
> [b] Errored: 7 (1) (3) (1) (2) 7+0
But it has had seven errored requests. Again, I think dropping -e will
help.
If things still look bad, then I will revise what happens in the
kernel. But I am pretty sure that without -e, requests that are not
attended to will be rolled back for treatment by another or the same
daemon instead of being errored.
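To make the difference concrete, here is a toy model of the two behaviours. It is not the kernel code: the queue, the list of errored requests and the report_errors flag (standing in for -e) are all invented for the illustration, and it only mirrors the behaviour described above.

```python
from collections import deque


def handle_unattended(request, queue, errored, report_errors):
    """Decide what happens to a request that no daemon answered in time."""
    if report_errors:
        # With -e: the request is failed, and the layer above (e.g. raid)
        # sees an error and may fault the mirror.
        errored.append(request)
    else:
        # Without -e: the request is rolled back to the front of the
        # queue, to be picked up by another daemon or the same one later.
        queue.appendleft(request)


queue, errored = deque(["w2"]), []
handle_unattended("w1", queue, errored, report_errors=False)
print(list(queue), errored)  # ['w1', 'w2'] []
```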
> [b] Pending: 0 (0) (0) (0) (0) 0R/0W+0R/0W
> [b] B/s now: 0 (0R+0W)
> [b] B/s ave: 236K (0R+236KW)
> [b] B/s max: 20.6M (4.00KR+20.6MW)
> [b] Spectrum: 100%1
> [b] Kthreads: 0 (0 waiting/0 running/1 max)
> [b] Cthreads: 4 (+) (+) (+) (+)
> [b] Cpids: 4 (4235) (4303) (4237) (4342)
> Device c: Open
> [c] State: signed, rw, enabled, validated, show_errs, plug, md5sum,
> acct, last error 0, lives 0, bp 0
This also has never died. All very good.
> [c] Queued: +0R/0W curr (check 0R/0W) +3R/32W max
> [c] Buffersize: 262144 (sectors=512, blocks=64)
> [c] Blocksize: 4096 (log=12)
> [c] Size: 39070048KB
> [c] Blocks: 9767512
> [c] Sockets: 4 (+) (+) (*) (+)
> [c] Requested: 2.7556M (706K) (708K) (703K) (702K) 13R/2.755MW max
> 1
> [c] Despatched: 2.7556M (706K) (708K) (703K) (702K) 13R/2.755MW md5
> 2.74MW (2.74M eq, 142 ne, 0 dn)
> [c] Errored: 0 (0) (0) (0) (0) 0+0
And has remained clean too.
> [c] Pending: 0 (0) (0) (0) (0) 0R/0W+0R/0W
> [c] B/s now: 12.8M (0R+12.8MW)
> [c] B/s ave: 2.32M (0R+2.32MW)
> [c] B/s max: 26.4M (0R+26.4MW)
> [c] Spectrum: 100%1
> [c] Kthreads: 0 (0 waiting/0 running/1 max)
> [c] Cthreads: 4 (+) (+) (+) (+)
> [c] Cpids: 4 (4240) (4241) (4242) (4243)
> Device d: Open
> [d] State: signed, rw, enabled, validated, show_errs, plug, md5sum,
> acct, last error 0, lives 0, bp 0
Never died.
> [d] Queued: +0R/0W curr (check 0R/0W) +3R/28W max
> [d] Buffersize: 262144 (sectors=512, blocks=64)
> [d] Blocksize: 4096 (log=12)
> [d] Size: 39070048KB
> [d] Blocks: 9767512
> [d] Sockets: 4 (+) (+) (*) (+)
> [d] Requested: 2.7556M (706K) (705K) (704K) (705K) 13R/2.755MW max
> 1
> [d] Despatched: 2.7556M (706K) (705K) (704K) (705K) 13R/2.755MW md5
> 2.75MW (2.75M eq, 2 ne, 0 dn)
> [d] Errored: 3 (0) (1) (1) (1) 3+0
Some requests errored. Remove -e.
> [d] Pending: 0 (0) (0) (0) (0) 0R/0W+0R/0W
> [d] B/s now: 11.4M (0R+11.4MW)
> [d] B/s ave: 2.32M (0R+2.32MW)
> [d] B/s max: 26.2M (0R+26.2MW)
> [d] Spectrum: 100%1
> [d] Kthreads: 0 (0 waiting/0 running/1 max)
> [d] Cthreads: 4 (+) (+) (+) (+)
> [d] Cpids: 4 (4249) (4250) (4251) (4252)
>
> Server2:
>
> Device a: Open
> [a] State: signed, rw, enabled, validated, show_errs, plug, md5sum,
> acct, last error 0, lives 0, bp 0
Looks good.
> [a] Queued: +0R/9W curr (check 0R/9W) +3R/1022W max
> [a] Buffersize: 262144 (sectors=512, blocks=64)
> [a] Blocksize: 4096 (log=12)
> [a] Size: 39070048KB
> [a] Blocks: 9767512
> [a] Sockets: 4 (+) (*) (+) (+)
> [a] Requested: 7.4996M (1.87M) (1.87M) (1.87M) (1.87M) 274R/7.499MW max
> 1
> [a] Despatched: 7.4996M (1.87M) (1.87M) (1.87M) (1.87M) 274R/7.499MW md5
> 7.27MW (7.27M eq, 3.24K ne, 0 dn)
> [a] Errored: 4 (1) (1) (1) (1) 4+0
A few requests errored. Remove -e.
> [a] Pending: 4 (1) (1) (1) (1) 0R/4W+0R/9W
> [a] B/s now: 22.2M (0R+22.2MW)
> [a] B/s ave: 5.84M (0R+5.84MW)
> [a] B/s max: 31.6M (340KR+31.6MW)
> [a] Spectrum: 100%1
> [a] Kthreads: 0 (0 waiting/0 running/1 max)
> [a] Cthreads: 0 (-) (-) (-) (-)
> [a] Cpids: 0 (4332) (4333) (4334) (4335)
> Device b: Open
> [b] State: signed, rw, enabled, validated, show_errs, plug, md5sum,
> acct, last error 0, lives 0, bp 0
> [b] Queued: +0R/16W curr (check 0R/16W) +3R/1024W max
> [b] Buffersize: 262144 (sectors=512, blocks=64)
> [b] Blocksize: 4096 (log=12)
> [b] Size: 39070048KB
> [b] Blocks: 9767512
> [b] Sockets: 4 (*) (+) (+) (+)
> [b] Requested: 8.0733M (2.02M) (2.02M) (2.00M) (2.02M) 19R/8.073MW max
> 1
> [b] Despatched: 8.0733M (2.02M) (2.02M) (2.00M) (2.02M) 19R/8.073MW md5
> 7.84MW (7.83M eq, 3.40K ne, 0 dn)
> [b] Errored: 0 (0) (0) (0) (0) 0+0
> [b] Pending: 4 (1) (1) (1) (1) 0R/4W+0R/16W
> [b] B/s now: 21.2M (0R+21.2MW)
> [b] B/s ave: 6.45M (0R+6.45MW)
> [b] B/s max: 30.3M (4.00KR+30.3MW)
> [b] Spectrum: 100%1
> [b] Kthreads: 0 (0 waiting/0 running/1 max)
> [b] Cthreads: 0 (-) (-) (-) (-)
> [b] Cpids: 0 (4341) (4342) (4343) (4344)
> Device c: Open
> [c] State: signed, rw, enabled, validated, show_errs, plug, md5sum,
> acct, last error 0, lives 0, bp 0
> [c] Queued: +0R/0W curr (check 0R/0W) +3R/32W max
> [c] Buffersize: 262144 (sectors=512, blocks=64)
> [c] Blocksize: 4096 (log=12)
> [c] Size: 39070048KB
> [c] Blocks: 9767512
> [c] Sockets: 4 (+) (*) (+) (+)
> [c] Requested: 117.38K (29.2K) (29.2K) (29.5K) (29.3K) 7R/117.3KW max
> 1
> [c] Despatched: 117.38K (29.2K) (29.2K) (29.5K) (29.3K) 7R/117.3KW md5
> 117KW (117K eq, 3 ne, 0 dn)
> [c] Errored: 1 (0) (0) (1) (0) 1+0
> [c] Pending: 0 (0) (0) (0) (0) 0R/0W+0R/0W
> [c] B/s now: 0 (0R+0W)
> [c] B/s ave: 96.0K (0R+96.0KW)
> [c] B/s max: 18.7M (0R+18.7MW)
> [c] Spectrum: 100%1
> [c] Kthreads: 0 (0 waiting/0 running/1 max)
> [c] Cthreads: 4 (+) (+) (+) (+)
> [c] Cpids: 4 (4348) (4349) (4350) (4351)
> Device d: Open
> [d] State: signed, rw, enabled, validated, show_errs, plug, md5sum,
> acct, last error 0, lives 0, bp 0
> [d] Queued: +0R/0W curr (check 0R/0W) +3R/32W max
> [d] Buffersize: 262144 (sectors=512, blocks=64)
> [d] Blocksize: 4096 (log=12)
> [d] Size: 39070048KB
> [d] Blocks: 9767512
> [d] Sockets: 4 (*) (+) (+) (+)
> [d] Requested: 117.41K (29.1K) (29.1K) (29.7K) (29.3K) 7R/117.4KW max
> 1
> [d] Despatched: 117.41K (29.1K) (29.1K) (29.7K) (29.3K) 7R/117.4KW md5
> 117KW (117K eq, 6 ne, 0 dn)
> [d] Errored: 0 (0) (0) (0) (0) 0+0
> [d] Pending: 0 (0) (0) (0) (0) 0R/0W+0R/0W
> [d] B/s now: 0 (0R+0W)
> [d] B/s ave: 96.0K (0R+96.0KW)
> [d] B/s max: 18.7M (0R+18.7MW)
> [d] Spectrum: 100%1
> [d] Kthreads: 0 (0 waiting/0 running/1 max)
> [d] Cthreads: 4 (+) (+) (+) (+)
> [d] Cpids: 4 (4353) (4354) (4355) (4356)
> Device e-p: Closed
>
> Server3:
>
> Device a: Open
> [a] State: signed, rw, enabled, validated, show_errs, plug, md5sum,
> acct, last error 0, lives 0, bp 0
> [a] Queued: +0R/0W curr (check 0R/0W) +3R/1023W max
> [a] Buffersize: 262144 (sectors=512, blocks=64)
> [a] Blocksize: 4096 (log=12)
> [a] Size: 39070048KB
> [a] Blocks: 9767512
> [a] Sockets: 4 (+) (+) (+) (*)
> [a] Requested: 296.57K (75.0K) (73.5K) (74.0K) (73.9K) 271R/296.3KW max
> 1
> [a] Despatched: 296.56K (75.0K) (73.5K) (74.0K) (73.9K) 271R/296.3KW md5
> 88.0KW (85.4K eq, 2.64K ne, 0 dn)
> [a] Errored: 2 (1) (0) (1) (0) 2+0
> [a] Pending: 0 (0) (0) (0) (0) 0R/0W+0R/0W
> [a] B/s now: 0 (0R+0W)
> [a] B/s ave: 228K (0R+224KW)
> [a] B/s max: 18.2M (340KR+18.2MW)
> [a] Spectrum: 100%1
> [a] Kthreads: 0 (0 waiting/0 running/1 max)
> [a] Cthreads: 4 (+) (+) (+) (+)
> [a] Cpids: 4 (10255) (10256) (10257) (10258)
> Device b: Open
> [b] State: signed, rw, enabled, validated, show_errs, plug, md5sum,
> acct, last error 0, lives 0, bp 0
> [b] Queued: +0R/0W curr (check 0R/0W) +3R/1022W max
> [b] Buffersize: 262144 (sectors=512, blocks=64)
> [b] Blocksize: 4096 (log=12)
> [b] Size: 39070048KB
> [b] Blocks: 9767512
> [b] Sockets: 4 (+) (+) (+) (*)
> [b] Requested: 387.77K (96.8K) (97.0K) (96.6K) (97.2K) 16R/387.7KW max
> 1
> [b] Despatched: 387.77K (96.8K) (97.0K) (96.6K) (97.2K) 16R/387.7KW md5
> 387KW (387K eq, 39 ne, 0 dn)
> [b] Errored: 1 (0) (0) (1) (0) 1+0
> [b] Pending: 0 (0) (0) (0) (0) 0R/0W+0R/0W
> [b] B/s now: 0 (0R+0W)
> [b] B/s ave: 308K (0R+308KW)
> [b] B/s max: 19.3M (0R+19.3MW)
> [b] Spectrum: 100%1
> [b] Kthreads: 0 (0 waiting/0 running/1 max)
> [b] Cthreads: 4 (+) (+) (+) (+)
> [b] Cpids: 4 (10264) (10265) (10338) (10267)
> Device c: Open
> [c] State: signed, rw, enabled, validated, show_errs, plug, md5sum,
> acct, last error 0, lives 0, bp 0
> [c] Queued: +0R/0W curr (check 0R/0W) +3R/32W max
> [c] Buffersize: 262144 (sectors=512, blocks=64)
> [c] Blocksize: 4096 (log=12)
> [c] Size: 39070048KB
> [c] Blocks: 9767512
> [c] Sockets: 4 (+) (+) (*) (+)
> [c] Requested: 6.6284M (1.67M) (1.66M) (1.65M) (1.62M) 4R/6.628MW max
> 1
> [c] Despatched: 6.6284M (1.67M) (1.66M) (1.65M) (1.62M) 4R/6.628MW md5
> 6.62MW (6.62M eq, 2 ne, 0 dn)
> [c] Errored: 1 (1) (0) (0) (0) 1+0
> [c] Pending: 0 (0) (0) (0) (0) 0R/0W+0R/0W
> [c] B/s now: 240K (0R+240KW)
> [c] B/s ave: 5.55M (0R+5.55MW)
> [c] B/s max: 25.0M (0R+25.0MW)
> [c] Spectrum: 100%1
> [c] Kthreads: 0 (0 waiting/0 running/1 max)
> [c] Cthreads: 4 (+) (+) (+) (+)
> [c] Cpids: 4 (10336) (10270) (10271) (10272)
> Device d: Open
> [d] State: signed, rw, enabled, validated, show_errs, plug, md5sum,
> acct, last error 0, lives 0, bp 0
> [d] Queued: +0R/0W curr (check 0R/0W) +3R/32W max
> [d] Buffersize: 262144 (sectors=512, blocks=64)
> [d] Blocksize: 4096 (log=12)
> [d] Size: 39070048KB
> [d] Blocks: 9767512
> [d] Sockets: 4 (+) (*) (+) (+)
> [d] Requested: 6.6284M (1.66M) (1.64M) (1.65M) (1.66M) 4R/6.628MW max
> 1
> [d] Despatched: 6.6284M (1.66M) (1.64M) (1.65M) (1.66M) 4R/6.628MW md5
> 6.62MW (6.62M eq, 4 ne, 0 dn)
> [d] Errored: 0 (0) (0) (0) (0) 0+0
> [d] Pending: 0 (0) (0) (0) (0) 0R/0W+0R/0W
> [d] B/s now: 244K (0R+244KW)
> [d] B/s ave: 5.55M (0R+5.55MW)
> [d] B/s max: 25.1M (0R+25.1MW)
> [d] Spectrum: 100%1
> [d] Kthreads: 0 (0 waiting/0 running/1 max)
> [d] Cthreads: 4 (+) (+) (+) (+)
> [d] Cpids: 4 (10276) (10277) (10278) (10279)
> Device e-p: Closed
Yadda yadda yadda. That was quite a lot! All of them seem to be mostly
ok, but it seems to be as you described: under memory pressure the
server side cannot respond quickly enough to a keepalive ping. I almost
wonder whether the right thing would be to keep -e and simply extend
the timeout! How about "-p 60" at the client side? Surely the server
should be able to respond to a ping in under 60s?
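The effect of a longer timeout can be pictured with a trivial check. The function is hypothetical, and the 30 s figure is just an example of a shorter setting, not a claim about the client's default; the 60 s figure is the one suggested above.

```python
def keepalive_expired(last_reply, now, timeout_s):
    """True if the server has been silent for longer than the timeout."""
    return (now - last_reply) > timeout_s


# A server that takes 45 s to answer a ping while under memory pressure:
print(keepalive_expired(0.0, 45.0, timeout_s=30))  # True: client gives up
print(keepalive_expired(0.0, 45.0, timeout_s=60))  # False: client waits
```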
> > And if you have some advice about how to choose the blocksize
> > then I would be very happy to listen to it!
> No advice (yet at least).
Are you using fr1 or raid1? fr1 is indicated, unless you want to go
crazy with resyncs over the net at those sizes!
Peter