[ENBD] diskless enbd-client/enbd on initrd
Peter T. Breuer
ptb at inv.it.uc3m.es
Thu Apr 6 11:37:47 MDT 2006
"Also sprach Rudolph Bott:"
> But today I started working on it again: I changed the enbd-client
You should not have to work on anything. It just works straight off for
everyone else, so you should be asking "what is unusual about my setup".
> enbd-client 595: #1927 mainloop: Client times out waiting 60s in
> mainloop. Breaking off
> enbd-client 594: #1927 mainloop: Client times out waiting 60s in
> mainloop. Breaking off
This means the server is dead. Go have a look at the server! Why did
the server die?
> ENBD #1160[0]: enbd_rollback (0): rollback req c009b7d4!
> ENBD #1160[1]: enbd_rollback (1): rollback req c009b8fc!
This means you are rolling back instead of erroring. Change.
> enbd-client 595: client (1) opened socket (4) to 172.16.20.1:1113
> enbd-client 594: client (0) opened socket (4) to 172.16.20.1:1113
This is a reconnect to a (new) server.
> enbd-client 594: client (0) read passwd ok from 172.16.20.1:1113
> enbd-client 594: client (0) got cliserv magic ok from 172.16.20.1:1113
> enbd-client 594: client (0) got a signature ok from 172.16.20.1:1113
> enbd-client 594: client (0) enters setsig
> enbd-client 594: client (0) set sig uses whole disk, wants slot 1
> ENBD #3150[0]: my_nbd_set_sig (0): failed sigcheck wth -22
The signature the server offered was NOT the one that the previous
server offered. Change.
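The -22 in the log line above is a negated errno value, and on Linux 22 is EINVAL. A minimal userspace sketch for decoding such a number follows; the helper name describe_kernel_err is hypothetical and not part of ENBD.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper, not from the ENBD sources: turn a negated errno
 * as printed in the kernel log (e.g. -22) into its message text. */
static const char *describe_kernel_err(int err)
{
    assert(err <= 0);        /* kernel-style codes are zero or negative */
    return strerror(-err);   /* strerror() expects the positive errno */
}
```

Calling describe_kernel_err(-22) gives the text for EINVAL, which only narrows things down to "some argument was rejected"; the kernel log still has to say which check rejected it.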
Why are these things happening?
1) why does your server apparently not respond for 60s?
2) why does it not offer the same signature as it did last time?
Anyway, remove the sigcheck from the kernel code. And please use the
two static server and client binaries that I just compiled and put up
on the ftp site ...
-rwxr-xr-x 1 root prof 492211 Apr 6 19:21 enbd-client-2.4.33-static*
-rwxr-xr-x 1 root prof 504994 Apr 6 19:21 enbd-server-2.4.33-static*
I haven't checked if they run, but I suppose they do.
You want to look at the KERNEL LOG. Please!
But just set my_nbd_set_sig to always return 0.
 static int
 my_nbd_set_sig (struct enbd_slot *slot, int *sig)
 {
     int err = 0;
     int buf[ENBD_SIGLEN / sizeof (int)];
     int islot = slot->i;
     struct enbd_device *lo = slot->lo;
+    return 0;
The kernel log will show you what the exact problem was. There's no
point in guessing. All we know from userspace is that we got -EINVAL,
which is
    if (!access_ok (VERIFY_READ, (char *) sig, ENBD_SIGLEN)) {
        ENBD_ALERT ("(%d): failed sigcheck with bad user address %p\n",
                    islot, sig);
        err = -EINVAL;
which seems to say that the client passed the address of something that is
not a valid userspace address as a buffer.
Or it could be
    if (enbd_set_pid(slot) < 0) {
        up(&lo->pid_sem);
        return -EINVAL;
    }
which is
    ENBD_ALERT ("(%d): live process %d is trying to set sig\n",
                islot, slot->pid);
    return -EINVAL;
which means essentially that the slot is still owned by a live client
that is not the current client and 60s have not yet passed since the
supposed owner of the slot last talked to us to tell us it is alive.
So it might be that you killed the old client and then rapidly started
a new client, without waiting for the old client to time out in the
kernel. I guess that might happen if the old client was killed with
SIGKILL instead of SIGTERM.
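The ownership rule described above can be modelled in userspace roughly as follows. This is only a sketch of the rule as stated in this message, not the driver code: struct slot_model, slot_claim and SLOT_TIMEOUT_S are hypothetical names, and the real check lives behind enbd_set_pid() in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical userspace model; none of these names are from the driver. */
#define SLOT_TIMEOUT_S 60          /* the 60s mentioned above */

struct slot_model {
    int    pid;                    /* owning client pid, 0 if unowned */
    time_t last_seen;              /* when the owner last said it was alive */
};

/* Returns 0 and records the new owner if the claim is allowed, or -EINVAL
 * if a different client owns the slot and has not yet timed out. */
static int slot_claim(struct slot_model *s, int pid, time_t now)
{
    assert(s != NULL);

    bool owned_by_other  = s->pid != 0 && s->pid != pid;
    bool owner_timed_out = (now - s->last_seen) >= SLOT_TIMEOUT_S;

    if (owned_by_other && !owner_timed_out)
        return -EINVAL;            /* "live process ... is trying to set sig" */

    s->pid = pid;
    s->last_seen = now;
    return 0;
}
```

In this model a replacement client started 10s after the old one was killed is refused, while the same attempt 60s or more after the old owner was last seen succeeds.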
Or maybe the -EINVAL comes from
    err = enbd_set_sig(lo, (int *)buf);
    ...
    if (memcmp(buf, (char*)&lo->signature[0], ENBD_SIGLEN) != 0) {
        return -EINVAL;
    }
which is simply a non-matching signature (I think I am going to change
some of these error values so I don't have to keep hitting you with the
cluestick that says READ THE KERNEL LOG! - you are SOOOO reluctant to
do so).
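A non-matching signature can be illustrated with a small userspace model. It assumes, as the explanation near the top suggests, that the first signature offered is remembered and any later one must be byte-identical; struct dev_model, dev_check_sig and SIGLEN_MODEL are hypothetical stand-ins, not the driver's own definitions.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define SIGLEN_MODEL 16            /* hypothetical; stands in for ENBD_SIGLEN */

/* Hypothetical userspace model of a device that remembers one signature. */
struct dev_model {
    int  has_sig;                  /* nonzero once a signature is stored */
    char signature[SIGLEN_MODEL];
};

/* Stores the first signature seen; afterwards returns -EINVAL for any
 * signature that differs from the stored one, and 0 for a match. */
static int dev_check_sig(struct dev_model *d, const char *sig)
{
    assert(d != NULL && sig != NULL);

    if (!d->has_sig) {
        memcpy(d->signature, sig, SIGLEN_MODEL);
        d->has_sig = 1;
        return 0;
    }
    if (memcmp(sig, d->signature, SIGLEN_MODEL) != 0)
        return -EINVAL;            /* server offered a different signature */
    return 0;
}
```

Under this assumption a reconnect to a server exporting a different resource fails exactly the way the log shows, while a reconnect to the same server and resource passes.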
> Apparently it only happens when the machine is really idle (e.g. after
> some time after boot up etc.). If you keep it busy with something (file
If the machine is idle, it sends deadman pulses.
You have loaded ENBD_IOCTL.KO, haven't you?
!!!!!
Peter