LinuxFailSafe news
Hergott, Jean-Philippe
Jean-Philippe.Hergott@compaq.com
Wed, 5 Jul 2000 09:45:02 +0200
Hi all,
I have been reading with great interest the mails about the kimberlite
and piranha projects. I have some concerns about the shared bus
technology, and was wondering about the stonith method. What is it? OK,
I found the answer at linux-ha.org/stonith.html, and for now I'm a
little disappointed because, if I understand correctly (English is not
my mother tongue), to share a bus in the kimberlite solution you need
to 'shoot the other node in the head', so only one node is active at a
time. So it seems to me that the bus is shared only for the surviving
node, and only provided both systems aren't dead (see the "Risk
associated with STONITH" paragraph). Am I right?
In fact, talking about a shared bus makes me think of a file system
over SCSI, an implementation of the SCSI reserve/release commands, or a
private logical lock manager in a few blocks on a device. Is anyone
working on this?
Jean-Philippe.
-----Original Message-----
From: Brian Stevens [mailto:stevens@mclinux.com]
Sent: Thursday, June 29, 2000 9:11 PM
To: Keith Barrett
Cc: ha-linux List
Subject: Re: LinuxFailSafe news
Keith Barrett wrote:
>
> Brian Stevens wrote:
> >
> > I'd like to understand more about what you feel is missing for
> > shared scsi. Can you elaborate on why you feel driver patches are
> > needed? Is this for io fencing? If so, we use the stonith type
> > approach in kimberlite for doing this.
>
> Driver would provide a standard API for SCSI reservation, more
> internal control over sharing and failover, and a communication
> path independent of networking; likely avoiding the need for
> stonith (which is a pretty serious action).
>
> You might want to ask Alan Cox or Stephen Tweedie why this is
> better.
SCSI reserves and the internal control you speak of are one possible
implementation of I/O fencing. I lived through this with TruClusters.
You are correct that this approach requires support in the kernel. We
opted for an implementation that does not require SCSI reserves, for
two reasons:
- You end up spending a lot of time making sure that the shared
  storage subsystems support reserves properly, and some multi-ported
  controllers don't. It also requires a different approach for Fibre
  Channel than for SCSI. We wanted to preserve the commodity hardware
  support that I love about Linux; as a result, Kimberlite supports
  both SCSI and Fibre Channel today, across a wide range of storage
  boxes.
- It doesn't scale for parallel applications such as Oracle Parallel
  Server (OPS). OPS requires a virtual disk layer, which you can
  implement on SCSI reserves, but then each disk partition can be
  served from only one node at a time. That node becomes a bottleneck,
  as does the network it serves the data over. With Kimberlite, each
  node can access the same partition directly, so it scales very well.
  No coordination is needed, since OPS handles it with its internal
  DLM (distributed lock manager). The only caveat is making sure that
  the kernel doesn't cache raw-device I/O. Digital UNIX doesn't, but I
  don't know the Linux buffer cache model.
A communication path other than the network is important, but you
don't need kernel support for it (assuming you aren't going down the
SCSI target mode path!). Kimberlite does this today on SCSI and Fibre
Channel with no kernel changes; it is integral to its implementation
of quorum. Check it out; I think it will answer a lot of your
questions.
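The mail doesn't describe how Kimberlite uses shared storage as its
non-network path. One common technique for such a path is a disk
heartbeat: each node periodically writes an incrementing counter into
its own block on a shared partition, and its peer treats a counter that
stops advancing as a sign the node is down. The Python sketch that
follows illustrates only that general idea. The 512-byte block per node
id, the record layout, and the names MAGIC, write_heartbeat,
read_heartbeat and PeerMonitor are all hypothetical assumptions, not
Kimberlite's actual on-disk format.

```python
import os
import struct
import time

BLOCK_SIZE = 512          # assumption: one 512-byte sector per node
MAGIC = b"HBT1"           # hypothetical record marker
# Hypothetical record layout: magic, node id, heartbeat counter, wall-clock time.
RECORD = struct.Struct("<4sIQd")


def write_heartbeat(fd: int, node_id: int, counter: int) -> None:
    """Write this node's record into the block reserved for node_id."""
    record = RECORD.pack(MAGIC, node_id, counter, time.time())
    os.pwrite(fd, record.ljust(BLOCK_SIZE, b"\0"), node_id * BLOCK_SIZE)
    os.fsync(fd)  # flush the write so the peer can see it


def read_heartbeat(fd: int, node_id: int):
    """Return the counter stored for node_id, or None if no valid record."""
    data = os.pread(fd, BLOCK_SIZE, node_id * BLOCK_SIZE)
    if len(data) < RECORD.size:
        return None
    magic, stored_id, counter, _stamp = RECORD.unpack_from(data)
    if magic != MAGIC or stored_id != node_id:
        return None
    return counter


class PeerMonitor:
    """Declare the peer dead after max_misses polls without a new counter."""

    def __init__(self, max_misses: int) -> None:
        self.max_misses = max_misses
        self.last = None
        self.misses = 0

    def poll(self, counter) -> bool:
        if counter is not None and counter != self.last:
            self.last = counter   # peer made progress: reset the miss count
            self.misses = 0
        else:
            self.misses += 1      # no new heartbeat since the last poll
        return self.misses < self.max_misses  # True means "still alive"


if __name__ == "__main__":
    import tempfile
    with tempfile.TemporaryFile() as f:   # stands in for the shared partition
        fd = f.fileno()
        monitor = PeerMonitor(max_misses=3)
        for beat in range(3):              # "node 1" is beating
            write_heartbeat(fd, 1, beat)
            print(monitor.poll(read_heartbeat(fd, 1)))
        for _ in range(3):                 # node 1 hangs: no new writes
            print(monitor.poll(read_heartbeat(fd, 1)))
```

The demo uses an ordinary temporary file, so it shows only the logic.
Against a real shared device, the reads would also have to bypass the
kernel's cache (the raw-device caveat above); otherwise a node could
keep seeing a stale copy of its peer's block.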
Stonith isn't a serious course of action. You almost never need to
pull the reset line: only in cases such as a node hanging while it was
running a service. Not many people I know would object to
power-cycling a hung node. It is actually advantageous, since in many
cases it automatically brings the node back into service.
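Brian's rule for when stonith fires can be written as a small decision
function. This is a minimal Python sketch under stated assumptions: the
names PeerState, should_fence and fail_over are hypothetical, and
power_cycle is a caller-supplied callable standing in for whatever
power switch a real cluster would drive. It encodes only the policy in
the paragraph above (reset a node only when it has hung while running a
service), plus the usual fencing rule that the reset must be confirmed
before the survivor touches the shared storage.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PeerState:
    responding: bool      # peer still answers heartbeats (network and/or disk)
    owns_services: bool   # peer was running at least one failover service


def should_fence(peer: PeerState) -> bool:
    """Reset the peer only if it is hung AND was running a service."""
    return (not peer.responding) and peer.owns_services


def fail_over(peer: PeerState,
              power_cycle: Callable[[], bool],
              start_services: Callable[[], None]) -> bool:
    """Fence the peer, then take over its services; return True on takeover."""
    if not should_fence(peer):
        return False          # nothing to protect, so no reset is needed
    if not power_cycle():     # hypothetical power-switch hook
        return False          # reset not confirmed: don't touch shared disks
    start_services()          # peer was reset, so it is no longer writing
    return True


if __name__ == "__main__":
    hung = PeerState(responding=False, owns_services=True)
    idle = PeerState(responding=False, owns_services=False)
    print(fail_over(hung, lambda: True, lambda: print("services started")))
    print(fail_over(idle, lambda: True, lambda: None))
```

Keeping the fencing check separate from the takeover makes it explicit
that a silent node holding no services is simply left alone, which is
why the reset line is rarely pulled.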
Brian
------------------------------------------------------------------------------
Linux HA Web Site:
http://linux-ha.org/
Linux HA HOWTO:
http://metalab.unc.edu/pub/Linux/ALPHA/linux-ha/High-Availability-HOWTO.html
------------------------------------------------------------------------------