[ENBD] 2.4.32

Peter T. Breuer ptb at it.uc3m.es
Sat Mar 6 14:14:17 MST 2004


"Also sprach Peter T. Breuer:"
> Anyone want to confirm or deny that 2.4.32 is now stable and correct?

Actually I see some problems under kernel 2.6.3 - but they seem to be
userspace, not kernelspace.

In particular the client sometimes gets a SIGSEGV after setting the
alarm timers to 0 0 (to disable them) with setitimer. And there was a
bug in resetting the new microsecond timers that exacerbated that. I've
cured the bug and as a result the segfault is now infrequent, but still
there:

   seeking and
   writing....5%....10%....15%....20%....25%....30%....35%....40%....45%....50%....55%....60%....65%....70%....75%....80%....85%....90%....95%....done
   flushing buffers..enbd-client  4499: sighandler relaunches child
   from manager 
   enbd-client  4499: client (-1) reaped dead child 4534

(boom).

Here's the wooonderful strace of the event:

  gettimeofday({1078607006, 481520}, NULL) = 0
  setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 29906}}, NULL) = 0

uh, that was me setting an alarm timeout of 30ms.
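
For reference, a minimal sketch of how a one-shot timeout like the one
in that setitimer call can be armed from a microsecond count. The helper
name arm_oneshot_us is hypothetical and this is not the enbd source; it
only illustrates the call shape seen in the trace (it_interval left at
zero, it_value carrying the timeout).

```c
#include <string.h>
#include <sys/time.h>

/* Hypothetical helper (not from enbd): arm a one-shot ITIMER_REAL
 * that delivers SIGALRM once after `usec` microseconds. */
static int arm_oneshot_us(long usec)
{
    struct itimerval tv;

    memset(&tv, 0, sizeof tv);              /* it_interval = {0, 0}: no repeat */
    tv.it_value.tv_sec  = usec / 1000000;   /* whole seconds */
    tv.it_value.tv_usec = usec % 1000000;   /* remainder, always below 1000000 */

    /* Note: usec == 0 gives it_value = {0, 0}, which disarms the timer
     * instead of firing it immediately. */
    return setitimer(ITIMER_REAL, &tv, NULL);
}
```

Calling arm_oneshot_us(29906) would produce the same arguments as the
setitimer line above.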

  rt_sigaction(SIGALRM, {0x8052820, [], SA_RESTART|0x4000000}, {0x8052820, [], SA_RESTART|0x4000000}, 8) = 0
  fcntl(4, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=524288, len=0}) = ? ERESTARTSYS (To be restarted)

that was the locking operation that was supposed to be guarded by the
30ms timeout. Hey, isn't 30ms a bit small? Maybe that was supposed to
be 30s :). Could be a bug.

  --- SIGALRM (Alarm clock) ---
  gettimeofday({1078607006, 511593}, NULL) = 0
  setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={55, 302116}}, NULL) = 0

sheer weirdism - the alarm went off and apparently we reset the timer
to 55s, probably for an enclosing timer loop.

  rt_sigaction(SIGALRM, {0x8052820, [], SA_RESTART|0x4000000}, {0x8052820, [], SA_RESTART|0x4000000}, 8) = 0
  setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0

well, here we turned the timer off, because we did whatever we were
doing.

  rt_sigaction(SIGALRM, {SIG_IGN}, {0x8052820, [], SA_RESTART|0x4000000}, 8) = 0

and we indeed say to ignore the alarm in case it ever goes off.

  --- SIGSEGV (Segmentation fault) ---

and boom. Dunno where.

I'm working on it. It happens only under kernel 2.6, which appears to
behave fundamentally differently in some aspects that affect userspace.
If only I knew what they were ...

Just thought I'd let you know that I am now testing under 2.6.3. Why
won't my dns server run under it?

  socket(PF_INET6, SOCK_DGRAM, 0)         = 5
  bind(5, {sin_family=AF_INET6, sin6_port=htons(53), inet_pton(AF_INET6, "fe80::220:e0ff:fe8f:1c7", &sin6_addr), sin6_flowinfo=htonl(0)}}, 28) = -1 ENODEV (No such device)
  write(2, "dnsmasq: bind failed: No such de"..., 37dnsmasq: bind failed: No such device) = 37
  _exit(1)                                = ?

but ipv6 is in the kernel!

Peter





