Q. about v. 2.4.15; was: Re: [ENBD] nbd with an SMP kernel?
Leonid Andreev
leonid@latte.harvard.edu
Mon, 6 Nov 2000 10:06:41 -0500 (EST)
On Mon, 6 Nov 2000, Peter T. Breuer wrote:
> "A month of sundays ago Leonid Andreev wrote:"
> > a while ago we discussed the problem where the client won't reconnect to
> > the server that has died and restarted because it keeps trying to
> > reestablish contact with the old server; i.e., the client always assumes
> > it's a temporary network outage. You were going to solve this by implementing
> > a timeout after which the slaves die and the client manager restarts from
> > scratch. Has this been added to the version 2.4.15?
>
> You too? I believe I added a sensitivity to SIGPWR to the client. If
> you send it that signal it will restart the session. I don't recall
> adding a timeout. I'm nervous about adding random timeouts.
Well, sending a SIGPWR requires someone to be around to send it (or some
external monitoring, through heartbeat, for example), while a 5-minute
timeout would have the whole thing recover automatically. C'mon,
let's add a timeout! You were going to -- I still have your note saying
so! :)
Or how about a configurable timeout _option_, disabled by default, that
you can turn on from the command line?
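
A minimal sketch of the opt-in timeout idea, in C. Everything here is
hypothetical: the names (reconnect_watchdog, parse_timeout_arg, and so on)
are made up for illustration and are not taken from the ENBD source, and
how the client would actually kill the slaves and restart is left out. It
only shows the "0 means disabled, otherwise restart after N seconds of
silence" logic.

```c
#include <errno.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical names throughout; none of this comes from the ENBD source. */

struct reconnect_watchdog {
    long timeout_secs;    /* 0 means disabled (the proposed default) */
    time_t last_contact;  /* last time the server was heard from */
};

/* Parse the value given to a hypothetical timeout option.
 * Returns the number of seconds, or -1 if the argument is not a
 * non-negative decimal integer. */
long parse_timeout_arg(const char *arg)
{
    char *end;
    long secs;

    if (arg == NULL || *arg == '\0')
        return -1;
    errno = 0;
    secs = strtol(arg, &end, 10);
    if (errno != 0 || *end != '\0' || secs < 0)
        return -1;
    return secs;
}

/* Start the watchdog; pass timeout_secs = 0 to leave it disabled. */
void watchdog_init(struct reconnect_watchdog *w, long timeout_secs, time_t now)
{
    w->timeout_secs = timeout_secs;
    w->last_contact = now;
}

/* Record that the server answered at time `now`. */
void watchdog_touch(struct reconnect_watchdog *w, time_t now)
{
    w->last_contact = now;
}

/* Nonzero if the timeout is enabled and the server has been silent for
 * at least timeout_secs; the caller would then do the full restart. */
int watchdog_expired(const struct reconnect_watchdog *w, time_t now)
{
    if (w->timeout_secs <= 0)
        return 0;
    return difftime(now, w->last_contact) >= (double)w->timeout_secs;
}
```

With the default of 0 nothing changes for anyone who doesn't ask for it;
with a value of 300, the client manager would poll watchdog_expired() and
restart from scratch after five minutes without contact.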
-L.
>
> Do full restart of client after SIGPWR to enable reconnect to
> restarted server (add manpage entry).
>
>
> > Also, how serious is the SMP bug in versions < 2.4.15 and < 2.2.29 that
> > you reported on Oct. 29? In other words, do I have to upgrade? I'm using
>
> Well, you should mend the typo. If you look at the definition of
> NBD_spin_unlock_irq in the driver, you'll see it guarded by
>
> #if LINUX_VERSION_CODE >= LINUX_VERSION_2_3_30
>
> change the 2_3_30 to 2_1_0 to get the spinlocks activated in kernels
> 2.2.*.
>
> But it's not serious. It leaves the driver open to corruption in exactly
> one situation: when you try to turn off one device while another is
> running.
>
> Putting the spinlock back in is relatively untested, however :-).
>
> > 2.4.14 in a semi-production environment where both the client and the
> > server are SMP boxes.
>
> I'm amazed it works, since I've had to debug SMP with nothing else but
> imagination. I've been running SMP on a uniprocessor lately, and it's
> amazingly easy to deadlock by coding errors. I believe I just noticed
> a deadlock about 5 minutes after losing the net with an nbd
> mounted (but it may have been some other aspect of SMP+net that did it).
> If you notice anything reproducible, please let me know!
>
> Peter
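
For reference, a small sketch of why the version guard quoted above leaves
the spinlocks out on 2.2.* kernels. It assumes the driver's
LINUX_VERSION_a_b_c macros use the same packing as the kernel's
KERNEL_VERSION(a,b,c) macro, which is not confirmed by the thread; the
VERSION_CODE and guard_active names are invented for this example, and
2.2.17 is just a sample 2.2.* version.

```c
/* Assumed packing, mirroring the kernel's KERNEL_VERSION(a,b,c) macro. */
#define VERSION_CODE(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Nonzero if a guard of the form "#if running >= threshold" would
 * compile the guarded code in. Hypothetical helper for illustration. */
int guard_active(long running, long threshold)
{
    return running >= threshold;
}

/* With the threshold at 2.3.30, a 2.2.17 kernel fails the test:
 *   guard_active(VERSION_CODE(2, 2, 17), VERSION_CODE(2, 3, 30)) == 0
 * After changing the threshold to 2.1.0, the same kernel passes:
 *   guard_active(VERSION_CODE(2, 2, 17), VERSION_CODE(2, 1, 0)) != 0
 */
```

Under that assumption, lowering the threshold from 2_3_30 to 2_1_0 is what
brings the spinlock definitions back in for every 2.2.* kernel.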