[Linux-ha-dev] Thinking about a new communications plugin
Bob Schatz
bschatz at yahoo.com
Wed Nov 24 12:43:05 MST 2010
Lars,
Please take my opinions with a grain of salt. I am just trying to share my
experiences. I am not sure if they apply here.
I appreciate all of the hard work involved in LinuxHA and Pacemaker!
Just to tell you where I am coming from while I count down the minutes before a
holiday here in the states...
In a previous life I worked at VERITAS and I was one of the original developers
of a product called VERITAS Cluster Server. From the start it supported 32
nodes. Later I developed a piece of technology called I/O Fencing which was
used to support Oracle RAC, a parallel database. Our customers were generally
high end enterprise customers.
As developers, we used to obsess about how many nodes we could support, how
to speed things up, etc. It was really interesting work; I got a couple of
patents out of it and my ego grew. :) I loved it.
However, I don't believe our customers really went past four nodes for many
years. I think it was only after 10 years that customers of the parallel file
system technology went over 4 nodes. They were the really high end customers
who cared more about performance and support; cost was not their top priority.
As developers, we would always obsess about new features, etc., and our
Technical Product Managers would tell us that our customers did not want more
features. They cared more about the following:
1. Reliability
2. Patching the existing product since customers usually build a cluster for
application APP1 and they do not want to upgrade to a new version until they
build a whole new cluster with new hardware, new OS, new version of APP1, etc.
They hated when we told them that the fix for version 1.3 was in 2.0. Once the
cluster was built they did not want to touch it and when they did they had to go
through a Change Control Board to get it approved.
3. Ease of use since they wanted to be able to have a less experienced system
administrator handle more clusters to reduce cost.
My take away from it was the following (at least what I remember):
1. To increase reliability, add fewer features and rewrite areas prone to bugs
or user questions. Besides, the new features would not be used anyway and would
only cause customer escalations the night before I was trying to go on vacation.
:(
2. Patch the existing code as opposed to coming out with more frequent releases.
3. Come up with a couple of recipes for common system administration tasks,
like adding a patch or migrating an application, that work whether the cluster
has two nodes or more than three.
I am not sure how this maps to LinuxHA/Pacemaker. It may be a different market.
I thought I should share my experiences to see how it maps to what others think.
I may be off base.
Thanks for listening and for all the hard work on creating LinuxHA and
Pacemaker!
Thanks,
Bob
----- Original Message ----
From: Lars Ellenberg <lars.ellenberg at linbit.com>
To: linux-ha-dev at lists.linux-ha.org
Sent: Wed, November 24, 2010 10:23:39 AM
Subject: Re: [Linux-ha-dev] Thinking about a new communications plugin
On Wed, Nov 24, 2010 at 10:10:33AM -0800, Bob Schatz wrote:
> I am curious.
>
> What is driving the need for more than 32 nodes? Are many people doing that
> or planning on doing that?
>
> In my experience, > 80% of the people just want 2 nodes to work reliably and
> more than 4 nodes is just a marketing requirement to put on a glossy handout.
>
> Is that still the case or am I off base?
You are probably right.
Still there are some that want more nodes within one cluster,
either because they have a real need for it,
or even just because they like to push limits.
And it does not need to be many hosts: the cib can grow just as well with
many resources, many constraints, or many attributes.
CTS runs with only three nodes, and the "standard" auto-generated set of
CTS resources in the cib is already breaking the 64k plain text limit,
so if you want to use pacemaker-1.1.4 on heartbeat, even on two nodes
you have to enable in ha.cf
    compression on
    compression_threshold 20 # or 30 or something
    traditional_compression on
for reasons laid out already, which is, uhm, suboptimal.
And you may still hit the limit later,
depending on what exactly your cib looks like.
It is just much easier to produce a huge cib with
many nodes and a few clones ;-)
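
To get a rough feel for how quickly a cib can cross the 64k plain text limit
mentioned above, here is a minimal Python sketch. It is only an illustration
under stated assumptions: the XML is synthetic and merely cib-like (not a real
Pacemaker cib), "64k" is read as 65536 bytes, zlib stands in for whatever
compression heartbeat actually applies, and heartbeat's own message framing is
not modelled. The names make_fake_cib and report are hypothetical helpers.

```python
import zlib

LIMIT = 64 * 1024  # assumption: "64k" read as 65536 bytes of plain text


def make_fake_cib(n_resources):
    """Build a synthetic, cib-like XML string (NOT a real Pacemaker cib)."""
    prims = "".join(
        f'<primitive id="rsc{i}" class="ocf" provider="heartbeat" type="Dummy">'
        f'<operations><op id="rsc{i}-mon" name="monitor" interval="10s"/>'
        f"</operations></primitive>"
        for i in range(n_resources)
    )
    return (
        "<cib><configuration><resources>"
        f"{prims}"
        "</resources></configuration></cib>"
    )


def report(xml_text):
    """Compare plain and zlib-compressed sizes against the assumed limit."""
    raw = xml_text.encode("utf-8")
    packed = zlib.compress(raw)  # stand-in for heartbeat's compression
    return {
        "plain_bytes": len(raw),
        "compressed_bytes": len(packed),
        "plain_fits": len(raw) <= LIMIT,
        "compressed_fits": len(packed) <= LIMIT,
    }


if __name__ == "__main__":
    # Print the size report for a small, medium and large synthetic cib.
    for n in (10, 100, 1000):
        print(n, report(make_fake_cib(n)))
```

The point is only that repetitive XML grows linearly with the number of
resources while compressing well, which is why enabling compression works
around the limit for a while without removing it.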
--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev at lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/