[ENBD] Re: speed_limit_min and blocksize
P.T. Breuer
ptb at it.uc3m.es
Sat May 29 09:17:32 MDT 2004
In article <200405271827.i4RIR1H28208 at oboe.it.uc3m.es> you wrote:
> I forgot to tell you to patch the enbd.c driver as follows (I'll do it
> here now that I noticed it):
>
>     if (slot)
>         atomic_inc(&slot->md_count);
>     else
>         atomic_inc(&lo->md_count);
> -   if (!atomic_test_and_set_mask (&lo->flags, ENBD_SHOW_ERRS)) {
> -       ENBD_INFO ("set show_errs on nd%s\n", lo->devnam);
> -       atomic_set_mask (ENBD_RAID_SHOW_ERRS, &lo->flags);
> -   }
>     spin_unlock (&md_access_lock);
>     return 0;
>
> and
>
>     }
>     if (--md_count <= 0) {
>         md_notify_fn = NULL;
> -       if (atomic_test_and_clear_mask(&lo->flags, ENBD_RAID_SHOW_ERRS)) {
> -           atomic_clear_mask (ENBD_SHOW_ERRS, &lo->flags);
> -           ENBD_INFO ("cleared show_errs on nd%s\n", lo->devnam);
> -       }
>     }
>     // PTB count the individual partition and whole disk inclusions
>
>
> (you can be guinea-pig :). show_errs is no longer required for raid, I
> have come to believe.
I've rethought this - I wanted to discuss it but didn't get a chance. The
code removed above tried, under RAID, to make the enbd device "show
errors" instead of blocking on them when the network goes down. That
lets raid see what is wrong.
However, nowadays enbd is capable of saying when there is something
wrong at the remote end anyway, whether or not "show errors" is set,
so that removes one major reason to set "show errors".
If the network does go down and show errors is set, the raid device may
notice. Do we want that?
Not necessarily - it may be a temporary brownout. So I thought "might
as well remove the switch that turns on show errors under raid"
(above).
But now I see that it may be a real permanent network outage, and we
don't want requests to raid and/or enbd blocking forever when that
happens, so some form of "show errors" should still be set under
raid.
I suggest that the above bits of code be left in place, but that they
set ENBD_RAID_SHOW_ERRS, not ENBD_SHOW_ERRS, and that the former be used
as a kind of "partial show errors" flag. Its only effect ought to
be to make the errors visible (by disabling the device) when all of
the current daemons die at once (they will time out if the net is down).
That should give enough time for small brownouts to be recovered
invisibly, while still letting the device unblock and error in the
event of a real net outage.
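To make the intended behaviour concrete, here is a minimal standalone
sketch in plain C. It is a model, not driver code: the sk_ names, the
flag values and the live_clients counter are all invented for
illustration, and it only assumes that each daemon runs an exit hook
when it dies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values; the real ones live in the enbd headers. */
enum {
    SK_ENABLED        = 1 << 0,
    SK_SHOW_ERRS      = 1 << 1,
    SK_RAID_SHOW_ERRS = 1 << 2,
};

struct sk_device {
    unsigned flags;    /* bitmask of SK_* flags */
    int live_clients;  /* daemons currently attached (assumed counter) */
};

/*
 * Called when one client daemon exits, for example after timing out
 * on a dead network. Returns true if the device was disabled.
 */
static bool sk_client_exit(struct sk_device *dev)
{
    assert(dev->live_clients > 0);
    dev->live_clients--;

    if (dev->live_clients > 0)
        return false;   /* brownout: other daemons survive, keep blocking */

    if (!(dev->flags & (SK_SHOW_ERRS | SK_RAID_SHOW_ERRS)))
        return false;   /* neither flag set: requests stay blocked */

    /* Last client gone with a show-errors flag set: disable the
     * device so that pending requests error out instead of hanging. */
    dev->flags &= ~SK_ENABLED;
    return true;
}
```

With two daemons attached and only the raid flag set, the first exit
leaves the device enabled and the second one disables it.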
In enbd_clr_sock (which clients execute "on exit"):
-   if (atomic_read (&lo->flags) & ENBD_SHOW_ERRS) {
+   if (atomic_read (&lo->flags) & (ENBD_SHOW_ERRS|ENBD_RAID_SHOW_ERRS)) {
        int enbd_soft_reset (struct enbd_device *lo);
        // PTB we are the last client alive, diasable device as we die
        atomic_clear_mask (ENBD_ENABLED, &lo->flags);
and in the BLKMDNTFY ioctl:
    if (slot)
        atomic_inc(&slot->md_count);
-   else
-       atomic_inc(&lo->md_count);
+   if (atomic_read(&lo->md_count) <= 0) {
+       atomic_inc(&lo->md_count);
+       if (!atomic_test_and_set_mask (&lo->flags, ENBD_RAID_SHOW_ERRS)) {
+           ENBD_INFO ("set raid show_errs on nd%s\n", lo->devnam);
+       }
+   }
and in the BLKMDUNTFY:
    if (--md_count <= 0) {
        md_notify_fn = NULL;
-       if (atomic_test_and_clear_mask(&lo->flags, ENBD_RAID_SHOW_ERRS)) {
-           atomic_clear_mask (ENBD_SHOW_ERRS, &lo->flags);
-           ENBD_INFO ("cleared show_errs on nd%s\n", lo->devnam);
-       }
    }
    // PTB count the individual partition and whole disk inclusions
+   if (slot)
+       atomic_dec(&slot->md_count);
+   if (atomic_dec_and_test(&lo->md_count) ) {
+       if (atomic_test_and_clear_mask(&lo->flags, ENBD_RAID_SHOW_ERRS)) {
+           ENBD_INFO ("cleared raid show_errs on nd%s\n", lo->devnam);
+       }
+   }
This looks more sensible - it corrects some clearly wrong accounting in
BLKMDUNTFY (probably masked by lack of testing with multiple
partitioned devices in different raids).
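For anyone who wants to poke at the accounting outside the kernel, the
following is a rough userspace sketch using C11 <stdatomic.h>. The sk_
names, the slot array and the bit value are invented. Note also that it
bumps the whole-device count on every inclusion and flips the flag only
on the 0 -> 1 and 1 -> 0 transitions; that is my assumption about the
intended bookkeeping, not a transcription of the hunks above. In the
driver the ioctl path runs under md_access_lock; the sketch takes no
lock, so the count and the flag are not updated as one atomic step.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SK_RAID_SHOW_ERRS 0x4u  /* hypothetical bit value */
#define SK_NSLOTS 4             /* hypothetical number of partitions */

struct sk_dev {
    atomic_uint flags;
    atomic_int md_count;              /* inclusions of the device as a whole */
    atomic_int slot_count[SK_NSLOTS]; /* inclusions per partition */
};

/* Record one inclusion in a raid array; slot < 0 means the whole disk. */
static void sk_md_notify(struct sk_dev *d, int slot)
{
    assert(slot < SK_NSLOTS);
    if (slot >= 0)
        atomic_fetch_add(&d->slot_count[slot], 1);
    /* fetch_add returns the old value: 0 means first inclusion */
    if (atomic_fetch_add(&d->md_count, 1) == 0)
        atomic_fetch_or(&d->flags, SK_RAID_SHOW_ERRS);
}

/* Undo one inclusion; clears the flag when the last one goes away. */
static void sk_md_unnotify(struct sk_dev *d, int slot)
{
    assert(slot < SK_NSLOTS);
    if (slot >= 0)
        atomic_fetch_sub(&d->slot_count[slot], 1);
    /* old value 1 means the count has just reached zero */
    if (atomic_fetch_sub(&d->md_count, 1) == 1)
        atomic_fetch_and(&d->flags, ~SK_RAID_SHOW_ERRS);
}

static bool sk_raid_show_errs(struct sk_dev *d)
{
    return (atomic_load(&d->flags) & SK_RAID_SHOW_ERRS) != 0;
}
```

Including two partitions and then removing one should leave the flag
set; it should clear only when the second is removed as well.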
Peter