[ENBD] 2.4.32 conclusions

Peter T. Breuer ptb at it.uc3m.es
Thu Mar 11 11:37:44 MST 2004


"Also sprach Anders Blomdell:"
> > via a second machine, but one can't. The only way out is not to use
> > buffers, i.e. open the resource O_DIRECT or use a raw device as the
> > resource.
> I did get lockups with O_DIRECT (if that is what -n on server does)

It does. That is very hard to believe. But assuming it is so, can you
go into the enbd-client.c code and comment out the groups of lines that
call lock.down_... and lock.up_...? There should be two lock.downs and
their corresponding lock.ups. You should end up with this:


+   /*
    if (lock.down_write_timeout(&lock, 1000 * self->data_timeout) < 0) {
        err = -ETIME;
        PERR("failed to get write lock on %Ld-%Ld, timeout\n", req->from,
                req->from + req->len);
        goto create_reply;
    } else {
        undo_lock = 1;
    }
+   */

...

  create_reply:
    
+   /*
    if (undo_lock) {
        lock.up_write(&lock);
    }
+   */

...

    init_rwlock(&lock, self->dev, req->from, req->len);
+   /*
    if (lock.down_read_timeout(&lock, 1000 * self->data_timeout) < 0) {
        err = -ETIME;
        PERR("failed to get read lock on %Ld-%Ld, timeout\n", req->from,
                req->from + req->len);
        goto create_reply;
    } else {
        undo_lock = 1;
    }
+   */

    DEBUG ("server_read reads for request len %d\n", req->len);
    err = server->read (server, buf, req->len, req->from);


create_reply:
+   /*
    if (undo_lock) {
        lock.up_read(&lock);
    }
+   */

The reason for my saying so is that on my test platform under 2.6.3 I am
seeing evidence that fcntl locks fail at the 2GB barrier.  If the
clients cannot get the lock in order to do their work, they will not do
their work.
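
To check whether fcntl locks really do fail around 2GB on a given
kernel, something like the following could be run against a scratch
file. It is only a minimal sketch and not ENBD code: the names
try_range_lock and probe_2gb_barrier are made up for illustration, the
4KB/8KB range sizes are arbitrary, and it assumes a descriptor opened
read-write on a filesystem that supports POSIX byte-range locks.

```c
#define _FILE_OFFSET_BITS 64   /* request a 64-bit off_t on 32-bit hosts */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not ENBD code): apply one non-blocking fcntl
 * operation (F_RDLCK, F_WRLCK or F_UNLCK) to bytes [start, start+len).
 * Returns 0 on success, -1 on failure. */
static int try_range_lock(int fd, short type, off_t start, off_t len)
{
    struct flock fl;

    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;

    if (fcntl(fd, F_SETLK, &fl) < 0) {
        fprintf(stderr, "fcntl type %d on %lld-%lld failed: %s\n",
                (int)type, (long long)start, (long long)(start + len),
                strerror(errno));
        return -1;
    }
    return 0;
}

/* Hypothetical probe: take and release a write lock just below 2GB,
 * straddling 2GB, and just above 2GB.
 * Returns the number of ranges that could not be locked. */
static int probe_2gb_barrier(int fd)
{
    const off_t two_gb = (off_t)1 << 31;
    const off_t starts[3] = { two_gb - 8192, two_gb - 4096, two_gb };
    const off_t lens[3]   = { 4096, 8192, 4096 };
    int i, failures = 0;

    for (i = 0; i < 3; i++) {
        if (try_range_lock(fd, F_WRLCK, starts[i], lens[i]) < 0)
            failures++;
        else
            try_range_lock(fd, F_UNLCK, starts[i], lens[i]);
    }
    return failures;
}
```

Calling probe_2gb_barrier(fd) on a descriptor opened O_RDWR should
return 0 where the locks behave; a nonzero count, together with the
errno text printed on stderr, would point at the range that fails.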

> > Maybe you don't understand "raw device"? That's one of the /dev/rawX.
> Didn't try it.

Arne at least says that that works. It's cumbersome, though.


> > You bind those devices to existing block devices using the raw device
> > utilities, then access the raw devices instead of the block device.
> >

> > It is normal to have two servers each writing a mirror to the other server.
> > This does not hurt when both services are under light write pressure, but under
> > heavy pressure one must use O_DIRECT or raw resources in order to avoid
> > VMS cross deadlock. If VMS is not acting, it cannot deadlock.

> Never mind, load seems to be lower when not cross-writing (i.e. master
> writes to 2 mirrors). 6 parallel mkfs.ext on mirrors (i.e. 12 /dev/nbX)
> gives 50% load on master and 10% on mirrors, as opposed to 100% load on
> all 3 machines when cross-writing.

What does top say is the process doing the work, and what does strace
say it is doing? :-).


Peter

