[ENBD] diskless enbd-client/enbd on initrd

Peter T. Breuer ptb at inv.it.uc3m.es
Thu Apr 6 11:37:47 MDT 2006


"Also sprach Rudolph Bott:"
> But today I started working on it again: I changed the enbd-client

You should not have to work on anything.  It just works straight off for
everyone else, so you should be asking "what is unusual about my setup".

> enbd-client   595: #1927 mainloop: Client times out waiting 60s in
> mainloop. Breaking off
> enbd-client   594: #1927 mainloop: Client times out waiting 60s in
> mainloop. Breaking off

This means the server is dead. Go have a look at the server! Why did
the server die?

> ENBD #1160[0]: enbd_rollback (0): rollback req c009b7d4!
> ENBD #1160[1]: enbd_rollback (1): rollback req c009b8fc!

This means you are rolling back instead of erroring. Change.

> enbd-client   595: client (1) opened socket (4) to 172.16.20.1:1113
> enbd-client   594: client (0) opened socket (4) to 172.16.20.1:1113

This is a reconnect to a (new) server.

> enbd-client   594: client (0) read passwd ok from 172.16.20.1:1113
> enbd-client   594: client (0) got cliserv magic ok from 172.16.20.1:1113
> enbd-client   594: client (0) got a signature ok from 172.16.20.1:1113
> enbd-client   594: client (0) enters setsig
> enbd-client   594: client (0) set sig uses whole disk, wants slot 1
> ENBD #3150[0]: my_nbd_set_sig (0): failed sigcheck wth -22

The signature the server offered was NOT the one that the previous
server offered. Change.

Why are these things happening? 

  1) why does your server apparently not respond for 60s?
  2) why does it not offer the same signature as it did last time?

Anyway, remove the sigcheck from the kernel code. And please use the
two static server and client binaries that I just compiled and put up
on the ftp site ...

-rwxr-xr-x   1 root     prof       492211 Apr  6 19:21 enbd-client-2.4.33-static*
-rwxr-xr-x   1 root     prof       504994 Apr  6 19:21 enbd-server-2.4.33-static*

I haven't checked if they run, but I suppose they do.

You want to look at the KERNEL LOG. Please!


But just set my_nbd_set_sig to always return 0.

static int
my_nbd_set_sig (struct enbd_slot *slot, int *sig)
{
        int err = 0;
        int buf[ENBD_SIGLEN / sizeof (int)];
        int islot = slot->i;
        struct enbd_device *lo = slot->lo;
+       return 0;


The kernel log will show you what the exact problem was. There's no
point in guessing. All we know from userspace is that we got -EINVAL
(the -22 in your log), which could be

        if (!access_ok (VERIFY_READ, (char *) sig, ENBD_SIGLEN)) {
                ENBD_ALERT ("(%d): failed sigcheck with bad user address %p\n",
                            islot, sig);
                err = -EINVAL;

which seems to say that the client passed a buffer address that is not
a valid userspace address.

Or it could be

        if (enbd_set_pid(slot) < 0) {
                up(&lo->pid_sem);
                return -EINVAL;
        }

which is

        ENBD_ALERT ("(%d): live process %d is trying to set sig\n",
                    islot, slot->pid);
        return -EINVAL;

which means essentially that the slot is still owned by a live client
other than the current one, and 60s have not yet passed since that
owner last talked to us to tell us it is alive.

So it might be that you killed the old client and then rapidly started
a new client, without waiting for the old client to time out in the
kernel.  I guess that might happen if the old client was killed with
SIGKILL instead of SIGTERM.
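
To make the ownership rule concrete, here is a minimal userspace sketch
of it. It is not the driver's code: demo_slot, demo_set_pid and
SLOT_TIMEOUT_S are hypothetical names, and the 60s value is simply taken
from the timeout quoted in the log above.

```c
#include <errno.h>
#include <sys/types.h>

#define SLOT_TIMEOUT_S 60       /* assumption: the 60s seen in the log */

/* Hypothetical stand-in for the kernel's per-slot state. */
struct demo_slot {
        pid_t pid;              /* current owner, 0 if the slot is free */
        long  last_seen;        /* time (s) the owner last reported in */
};

/* Try to make `caller` the owner of the slot at time `now` (seconds). */
static int
demo_set_pid (struct demo_slot *slot, pid_t caller, long now)
{
        /* Another process owns the slot and has not yet timed out. */
        if (slot->pid != 0 && slot->pid != caller
            && now - slot->last_seen < SLOT_TIMEOUT_S)
                return -EINVAL;

        /* Slot is free, already ours, or the old owner has gone quiet. */
        slot->pid = caller;
        slot->last_seen = now;
        return 0;
}
```

In this model a new client that starts 10s after the old one last
reported in is refused, while one that starts 60s or more afterwards
takes the slot over.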

Or maybe the -EINVAL comes from

        err = enbd_set_sig(lo, (int *)buf);

...

        if (memcmp(buf, (char*)&lo->signature[0], ENBD_SIGLEN) != 0) {
                return -EINVAL;
        }

which is simply a non-matching signature (I think I am going to change
some of these error values so I don't have to keep hitting you with the
cluestick that says READ THE KERNEL LOG!  - you are SOOOO reluctant to
do so).
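
As an illustration of what distinct error values could look like, the
sketch below runs the three checks in the same order as above but
returns a different errno for each. Everything in it is an assumption
made for the example: demo_set_sig and DEMO_SIGLEN are hypothetical, and
the choice of EFAULT and EBUSY is only one plausible mapping, not what
the driver does or will do.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DEMO_SIGLEN 16          /* hypothetical; real ENBD_SIGLEN not shown */

/*
 * stored    - signature remembered from the previous server
 * offered   - signature passed in by the client, NULL models a bad address
 * slot_busy - nonzero if a different live client still owns the slot
 */
static int
demo_set_sig (const char *stored, const char *offered, int slot_busy)
{
        if (offered == NULL)
                return -EFAULT;         /* bad user address */
        if (slot_busy)
                return -EBUSY;          /* live process owns the slot */
        if (memcmp (stored, offered, DEMO_SIGLEN) != 0)
                return -EINVAL;         /* signature does not match */
        return 0;
}
```

With a mapping like this the client could tell the three cases apart
from the return value alone, although the kernel log would still carry
the detail.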



> Apparently it only happens when the machine is really idle (e.g. after
> some time after boot up etc.). If you keep it busy with something (file

If the machine is idle, it sends deadman pulses.

You have loaded ENBD_IOCTL.KO, haven't you?

!!!!!

Peter

