multiple SCSI hosts
Matthew O'Keefe
okeefe@lcse.umn.edu
Mon, 29 Mar 1999 07:51:43 -0600 (CST)
A few comments on Steve's and James's comments about SCSI and HA:
Fibre Channel is the disk drive industry's answer to most of the
problems mentioned with parallel SCSI. It is a fast, scalable
network interface that fixes all of the physical interface issues.
Each Fibre Channel port can connect directly to 100s to 1000s of devices;
it is a Gigabit serial interface using either optical or coax
and can extend 10s of kilometers; it has a smart switch framework
called a "Fabric" that can effectively isolate hosts that have gone
mad or are just in a state of constant stupidity (for example, NT
insists on re-formatting any drive it can see on a SCSI bus if it doesn't
have a Microsoft NT disk label embedded on it).
>
> An important issue, for any complex multi-machine SCSI configuration, is that
> it tends to get flaky, due to grounding problems. This not only causes data
> errors - they can actually be bad enough to blow up the hardware.
Goes away with FC.
> Modern SCSI variants use very flimsy connectors, which are quite trouble
> prone. SCSI is a parallel bus, so it has a lot of connections, and requires
> every one to be reliable.
FC is a serial interface that uses only point-to-point connections:
there is no physical bus.
> It has no fault tolerance, and only simple parity
> (which is often not even implemented) as a fault detection mechanism. Taking
> SCSI cables and connectors outside a box, to connect to another box, is
> asking for trouble.
Goes away with FC, which has very sophisticated error checking mechanisms.
> It's fine in a test environment. In the real world
> anything remotely fragile gets damaged, even in a fairly well controlled
> room, or a rack. I only ever feel comfortable with SCSI when it's all tidily
> in one box, with one power supply. One processor box connected to one
> adjacent RAID box, tightly cabled together, and plugged into the same power
> outlet is about the greatest risk I want to take with distributed SCSI. I've
> never seen anything more complex be 100% reliable.
>
> So, spreading a SCSI bus around a number of boxes may have more potential for
> reducing availability than increasing it.
Again, Fibre Channel basically gets rid of all these problems, but of
course brings a few problems of its own :-( There is a tendency in
current FC chip sets to do the equivalent of a SCSI BUS RESET
(in FC it is called a LIP, or Loop Initialization Primitive) whenever
"problems" occur, for example when new devices or adapters using
different chip sets appear on the loop. This problem should go
away as FC matures, and can be avoided altogether if you have the
$$$ for an FC fabric.
Regarding heartbeats: IP heartbeats are great since they are integrated
with the host naming environment, and they can help you determine if
hosts are down or if the network is down (especially if hosts are
connected with both FC/SCSI and Ethernet). Our group has been
working with the industry to define a SCSI lock protocol
(called Device Locks) which includes a feature which could be
used for heartbeating hosts.
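
As a rough illustration of why two independent heartbeat paths help,
here is a minimal Python sketch. The function name and the returned
strings are made up for this example; it only assumes that some other
code has already decided whether each heartbeat arrived recently.

```python
def diagnose(ip_heartbeat_ok: bool, storage_heartbeat_ok: bool) -> str:
    """Classify a peer host from two independent heartbeat channels.

    ip_heartbeat_ok:      a recent heartbeat arrived over Ethernet/IP
    storage_heartbeat_ok: a recent heartbeat was seen over FC/SCSI
    (both flags are hypothetical inputs for this sketch)
    """
    if ip_heartbeat_ok and storage_heartbeat_ok:
        return "host up"
    if storage_heartbeat_ok:
        # The host still reaches storage, so the IP path is the suspect.
        return "host up, IP network path down"
    if ip_heartbeat_ok:
        # The host still answers on IP, so the storage path is the suspect.
        return "host up, storage path down"
    # Silence on both paths: most likely the host itself is down.
    return "host down or fully isolated"


print(diagnose(False, True))
```

With only one channel, the last three cases cannot be told apart.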
Each device lock is a multiple readers/single writer lock
that sits on a device (each device can have thousands or millions of
these locks). A lock held by a host must be tickled by the host
every X seconds, where X is defined in the mode page. By assigning
each host a specific lock for heartbeating, and by regularly
reading the expired lock bitmap using a DLOCK command, it is
possible to determine which clients can connect to a device and
appear to be operational.
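
The heartbeat scheme above can be sketched in a few lines of Python.
This is an in-memory model only: it does not issue real DLOCK commands
or follow the command format in the spec, and the class, method, and
host names are invented for the example. It assumes every host has
already acquired its assigned lock.

```python
class SimulatedDevice:
    """In-memory stand-in for a drive that supports device locks."""

    def __init__(self, num_locks: int, timeout_s: float):
        self.timeout_s = timeout_s              # "X" from the mode page
        self.last_tickle = [None] * num_locks   # None means lock not held

    def tickle(self, lock_id: int, now: float) -> None:
        """Record that the holder refreshed its lock at time `now`."""
        self.last_tickle[lock_id] = now

    def expired_bitmap(self, now: float) -> list:
        """One flag per lock: True if it was held but not tickled in time."""
        return [t is not None and now - t > self.timeout_s
                for t in self.last_tickle]


def live_hosts(device: SimulatedDevice, host_locks: dict, now: float) -> list:
    """Hosts whose assigned heartbeat lock has not expired."""
    expired = device.expired_bitmap(now)
    return sorted(h for h, lock in host_locks.items() if not expired[lock])


dev = SimulatedDevice(num_locks=4, timeout_s=5.0)
host_locks = {"hostA": 0, "hostB": 1}   # hypothetical lock assignment
dev.tickle(0, now=0.0)
dev.tickle(1, now=0.0)
dev.tickle(0, now=4.0)                  # only hostA keeps tickling
print(live_hosts(dev, host_locks, now=7.0))
```

Passing the time in explicitly keeps the sketch deterministic; a real
client would poll the expired-lock bitmap from the device on a timer.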
Device Locks are about 6 months away from being standardized in SCSI-3.
For more information about them get the spec from the following
web page: http://gfs.lcse.umn.edu. Currently Seagate FC drives
and a few RAID vendors have implemented DLOCKs. They should be more
widely available once the command makes it through the formal SCSI
standardization process.
Matt
Matthew T. O'Keefe okeefe@ece.umn.edu (612) 625-6306
Director Pretty Cool Software Laboratory
University of Minnesota FAX: (612) 625-4583
Minneapolis, MN 55455 WWW: http://www.lcse.umn.edu/~okeefe
>
> "If you want to know about fault tolerant systems ask Microsoft. I know of
> nobody else that tolerates quite so many faults in the systems!"
>
> Steve
>
>
>
> James O'Kane wrote:
>
> > Hi,
> > I've been thinking about the shared SCSI bus question, and I have
> > some ideas that I would like to bounce off some people before I spend too
> > many hours researching an empty dream. I've done a little reading on the
> > archives, but I don't think I'm using the right search words.
> >
> > I'm curious why we should restrict ourselves to current SCSI
> > hardware that is available? What about designing a special card that does
> > what we want? I'll admit up front that I am nowhere near familiar with
> > the current SCSI specs, but I would guess that we could add control codes
> > to the set as long as we don't overlap in namespace.
> >
> > The card that I have in mind would work just like any other SCSI
> > card when by itself, but when you configure it in a network of machines,
> > each SCSI card could 'ping' the other scsi cards on the bus, and tell if
> > they are still alive. My first thoughts on how control is done would be by
> > SCSI ID. The lower your number the more authority you have.
> >
> > I read that a problem is when a card comes back on-line it
> > reinitializes the drives, but with a card designed for this purpose, it
> > could be made to detect if the bus is already active.
> >
> > I have some more ideas about this, but they start getting into
> > details of implementation. I'm hoping that I get one of a few replies.
> > Either someone can point me to a link of someone who is already doing
> > this. Someone could point out why this idea is flawed. Or someone could
> > point me to some links for where I could start researching more.
> >
> > thanks
> > -james
>