LinuxFailSafe news

Hergott, Jean-Philippe Jean-Philippe.Hergott@compaq.com
Wed, 5 Jul 2000 09:45:02 +0200


Hi all,

I have been reading with great interest the mails concerning the Kimberlite
and Piranha projects. I have some concerns about the shared bus technology,
and I was wondering about the STONITH method. What is it? OK, I found the
answer at linux-ha.org/stonith.html, and for now I am a little
disappointed, because if I understand correctly (English is not my mother
tongue), to share a bus in the Kimberlite solution you need to 'shoot the
other node in the head', so you only have one active node at a time. So it
seems to me that the bus is only shared by the surviving node, and only as
long as both systems do not end up dead (see the "Risk associated with
STONITH" paragraph). Am I right?

In fact, talking about a shared bus makes me think of a filesystem over
SCSI, an implementation of the SCSI reserve/release commands, or a private
logical lock manager kept in a few blocks on a device. Is nobody working on
this?
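
To make the "lock manager in a few blocks on a device" idea concrete, here
is a minimal sketch of one simple scheme, assuming each node owns one block
on the shared device, sector writes are atomic, and each write reaches the
disk before the following read. A node announces its intent in its own block
and then checks everyone else's; if it sees contention it retracts and
retries later. This guarantees mutual exclusion but not progress (both nodes
can back off forever), and a crashed holder never releases the lock, so a
real design would need leases or fencing on top. The layout, block size and
function names are all hypothetical; a temporary file stands in for the disk.

```python
import os
import struct
import tempfile

# Hypothetical on-disk layout: one 512-byte block per node, holding that
# node's "intent" flag. Nothing here is a real Kimberlite or SCSI interface.
BLOCK_SIZE = 512
NUM_NODES = 2
SLOT_FMT = "<I"  # one unsigned 32-bit flag: 1 = wants/holds the lock


def write_flag(fd, node_id, flag):
    """Write this node's flag into its own block and force it to the device."""
    data = struct.pack(SLOT_FMT, flag).ljust(BLOCK_SIZE, b"\0")
    os.pwrite(fd, data, node_id * BLOCK_SIZE)
    os.fsync(fd)


def read_flag(fd, node_id):
    """Read a node's flag from its block."""
    raw = os.pread(fd, struct.calcsize(SLOT_FMT), node_id * BLOCK_SIZE)
    return struct.unpack(SLOT_FMT, raw)[0]


def try_lock(fd, node_id):
    """Announce intent first, then look at every other node's block.
    Returns True if this node now holds the lock, False if it backed off."""
    write_flag(fd, node_id, 1)
    for other in range(NUM_NODES):
        if other != node_id and read_flag(fd, other):
            write_flag(fd, node_id, 0)  # contention: retract, caller retries
            return False
    return True


def unlock(fd, node_id):
    """Clear this node's flag so another node can acquire the lock."""
    write_flag(fd, node_id, 0)


if __name__ == "__main__":
    # A zero-filled temporary file stands in for the shared disk.
    with tempfile.TemporaryFile() as dev:
        dev.truncate(BLOCK_SIZE * NUM_NODES)
        fd = dev.fileno()
        print(try_lock(fd, 0))  # node 0 acquires
        print(try_lock(fd, 1))  # node 1 backs off while node 0 holds it
        unlock(fd, 0)
        print(try_lock(fd, 1))  # now node 1 acquires
```

Announcing before checking is what makes the scheme safe: if both nodes
race, at least one of them sees the other's flag and backs off.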

Jean-Philippe.

-----Original Message-----
From: Brian Stevens [mailto:stevens@mclinux.com]
Sent: Thursday, June 29, 2000 9:11 PM
To: Keith Barrett
Cc: ha-linux List
Subject: Re: LinuxFailSafe news


Keith Barrett wrote:
> 
> Brian Stevens wrote:
> >
> > I'd like to understand more about what you feel is missing for
> > shared SCSI. Can you elaborate on why you feel driver patches are
> > needed? Is this for I/O fencing? If so, we use a STONITH-type
> > approach in Kimberlite for doing this.
> 
> A driver would provide a standard API for SCSI reservation, more
> internal control over sharing and failover, and a communication
> path independent of networking, likely avoiding the need for
> STONITH (which is a pretty serious action).
> 
> You might want to ask Alan Cox or Stephen Tweedie why this is
> better.

SCSI reserves and the internal control you speak of are one possible
implementation of I/O fencing. I lived through this with TruClusters. You
are correct that this approach requires support in the kernel. We opted
for an implementation that does not require SCSI reserves, for two
reasons:

	- you end up spending a lot of time making sure that the
	  shared storage subsystems support reserves properly. Some
	  multi-ported controllers don't. It also requires a
	  different approach for Fibre Channel than for SCSI. We wanted
	  to preserve the commodity hardware support that I love
	  about Linux. As a result, Kimberlite is able to support both
	  SCSI and Fibre Channel today, across a wide range of storage
	  boxes.

	- it doesn't scale for parallel applications such as Oracle
	  Parallel Server (OPS). OPS requires a virtual disk layer, which
	  you can implement on top of SCSI reserves, but then each disk
	  partition can only be served from one node at a time. That
	  node ends up being a bottleneck, as does the network it
	  serves the data over. With Kimberlite, each node can access
	  the same partition directly, so it scales very well. No
	  coordination is needed, as OPS handles this with its internal
	  DLM. The only caveat is making sure that the kernel doesn't
	  cache raw devices. Digital UNIX doesn't, but I don't know
	  the buffer cache model in Linux.

A communication path other than the network is important, but you don't
need kernel support for that (assuming you aren't going down the SCSI
target mode path!). Kimberlite does this today on SCSI and Fibre Channel,
with no kernel changes needed. It is integral to its implementation of
quorum. Check it out; I think it's going to answer a lot of questions for
you.
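
As an illustration of a non-network communication path that needs no kernel
changes, here is a minimal sketch of a disk-based heartbeat, assuming a small
shared partition in which block N belongs to node N. Each node periodically
writes an increasing counter into its own block; the peer declares it down
after several consecutive polls with no progress. This is not Kimberlite's
actual on-disk format or quorum algorithm; the layout, the `max_missed`
threshold and all names are hypothetical, and a temporary file stands in for
the shared partition.

```python
import os
import struct
import tempfile

# Hypothetical layout: block N of a small shared partition belongs to node N
# and holds a 64-bit heartbeat counter. Not Kimberlite's real format.
BLOCK_SIZE = 512
HB_FMT = "<Q"


def beat(fd, node_id, counter):
    """Publish this node's latest heartbeat counter in its own block."""
    data = struct.pack(HB_FMT, counter).ljust(BLOCK_SIZE, b"\0")
    os.pwrite(fd, data, node_id * BLOCK_SIZE)
    os.fsync(fd)


def read_beat(fd, node_id):
    """Read a node's most recently published counter."""
    raw = os.pread(fd, struct.calcsize(HB_FMT), node_id * BLOCK_SIZE)
    return struct.unpack(HB_FMT, raw)[0]


class PeerMonitor:
    """Declares the peer down after `max_missed` consecutive polls in which
    its counter did not change."""

    def __init__(self, fd, peer_id, max_missed=3):
        self.fd = fd
        self.peer_id = peer_id
        self.max_missed = max_missed
        self.last_seen = None
        self.missed = 0

    def poll(self):
        """Sample the peer's counter once; return True while it looks alive."""
        current = read_beat(self.fd, self.peer_id)
        if current != self.last_seen:
            self.last_seen = current
            self.missed = 0
        else:
            self.missed += 1
        return self.missed < self.max_missed


if __name__ == "__main__":
    with tempfile.TemporaryFile() as dev:
        dev.truncate(2 * BLOCK_SIZE)
        fd = dev.fileno()
        monitor = PeerMonitor(fd, peer_id=1)
        for n in range(1, 4):
            beat(fd, 1, n)  # node 1 is making progress
            print("alive:", monitor.poll())
        for _ in range(3):  # node 1 stops beating
            print("alive:", monitor.poll())
```

Because each node only ever writes its own block, no locking is needed for
the heartbeat itself; only the reader has to tolerate a stale counter.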

STONITH isn't a serious course of action. You almost never need to
pull the reset line; only in cases such as when a node hangs while it
was running a service. Not many people I know would have issues with
repowering a hung node. It is actually advantageous, in that in many
cases it automatically brings the node back into service.
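
The rule described above (reset a node only when it has hung while running a
service) can be sketched as a small decision step. This is a hypothetical
illustration, not Kimberlite's code: `power_cycle` stands in for whatever
STONITH device driver is configured and `start_services` for the cluster's
service manager. The one design point it encodes is that the surviving node
takes over the shared storage only after the reset has been confirmed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PeerState:
    name: str
    responding: bool    # peer still answers heartbeats
    owns_service: bool  # peer was running at least one service


def needs_stonith(peer: PeerState) -> bool:
    """Only a hung node that was running a service has to be reset."""
    return not peer.responding and peer.owns_service


def handle_peer(peer: PeerState,
                power_cycle: Callable[[str], bool],
                start_services: Callable[[str], None]) -> str:
    """Hypothetical failover step for one peer; returns the action taken."""
    if not needs_stonith(peer):
        return "no action"
    # Never take over the shared storage unless the reset is confirmed;
    # otherwise both nodes might write to the same disks.
    if not power_cycle(peer.name):
        return "takeover aborted"
    start_services(peer.name)
    return "services taken over"
```

A hung node with no services is left alone, and a healthy node is never
touched, so the reset line is pulled only in the narrow case described above.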

Brian

------------------------------------------------------------------------------
Linux HA Web Site:
  http://linux-ha.org/
Linux HA HOWTO:
  http://metalab.unc.edu/pub/Linux/ALPHA/linux-ha/High-Availability-HOWTO.html
------------------------------------------------------------------------------