Q. about v. 2.4.15; was: Re: [ENBD] nbd with an SMP kernel?

Peter T. Breuer ptb@it.uc3m.es
Mon, 6 Nov 2000 01:16:40 +0100 (MET)


"A month of sundays ago Leonid Andreev wrote:"
> a while ago we discussed the problem where the client won't reconnect to
> the server that has died and restarted because it keeps trying to
> reestablish contact with the old server; i.e., the client always assumes
> it's a temporary network out. You were going to solve this by implementing
> a timeout after which the slaves die and the client manager restarts from
> scratch. Has this been added to the version 2.4.15?  

You too? I believe I added a sensitivity to SIGPWR to the client. If
you send it that signal it will restart the session.  I don't recall 
adding a timeout.  I'm nervous about adding random timeouts.

   Do full restart of client after SIGPWR to enable reconnect to
   restarted server (add manpage entry).


> Also, how serious is the SMP bug in versions < 2.4.15 and < 2.2.29 that
> you reported on Oct. 29? In other words, do I have to upgrade? I'm using

Well, you should mend the typo. If you look at the definition of 
NBD_spin_unlock_irq in the driver, you'll see it guarded by

  #if LINUX_VERSION_CODE >= LINUX_VERSION_2_3_30

Change the 2_3_30 to 2_1_0 to get the spinlocks activated in 2.2.*
kernels.
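
To see why the threshold matters, here is a small sketch of the version arithmetic. It assumes the driver's LINUX_VERSION_x_y_z constants follow the same encoding as the kernel's KERNEL_VERSION() macro, which I have not checked against the driver source. `VERSION_CODE` and `guard_enabled` are hypothetical names used only for this example.

```c
/* Same encoding as KERNEL_VERSION() in <linux/version.h>. */
#define VERSION_CODE(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Returns 1 if a guard of the form "#if kernel_code >= threshold_code"
 * would compile the guarded code in, 0 if it would leave it out. */
int guard_enabled(int kernel_code, int threshold_code)
{
    return kernel_code >= threshold_code;
}
```

With a 2.3.30 threshold, `guard_enabled(VERSION_CODE(2, 2, 17), VERSION_CODE(2, 3, 30))` is 0, so a 2.2.17 kernel gets no spinlock. With the threshold lowered to 2.1.0 the same call returns 1.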

But it's not serious. It leaves the driver open to corruption in exactly
one situation: when you try to turn off one device while another is
running.

Putting the spinlock back in is relatively untested, however :-).

> 2.4.14 in a semi-production environment where both the client and the
> server are SMP boxes.

I'm amazed it works, since I've had to debug SMP with nothing but
imagination. I've been running an SMP kernel on a uniprocessor lately, and
it's amazingly easy to deadlock through coding errors. I believe I just
noticed a deadlock about 5 minutes after losing the net with an nbd
mounted (but it may have been some other aspect of SMP+net that did it).
If you notice anything reproducible, please let me know!

Peter