[FRPythoneers] ANNOUNCE: TCMQS -- Cluster message distribution.

Sean Reifschneider jafo at tummy.com
Sun Jan 7 01:26:38 MST 2001


We decided to spend New Year's Eve in a hotel room in Billings releasing
software.  My hard drive died, but not before I could get the Tummy
Cluster Message Queue System released (though not more than moments
after, either).  Having a remote CVS repository is a wonderful thing.

Anyway, TCMQS is an implementation of a cluster messaging system.
It is meant to be an enabling technology for writing clustered
applications.  Messages sent with a destination of the cluster are
guaranteed to be delivered exactly once to every machine.  If a
machine is down, messages are queued until it is brought back up.

Currently, no applications which use this are included, except for a
test handler which prints the messages and logs them via syslog.

Basically, it can implement a "reliable" remote procedure call, in which
the target is a group of machines.  I have used it to prototype a
system which implements peer-relationship read/write clustering of
MySQL databases (though no "cluster lock" ability is present).  This
prototype was done in something under 100 lines of Python code, IIRC.

For more information, check out TCMQS at ftp://ftp.tummy.com/pub/tummy/tcmqs/

Sean
============================
Tummy Cluster Message Queue System
Sean Reifschneider, jafo-tcmqs at tummy.com
Copyright (c) 2000, Sean Reifschneider, tummy.com, ltd.  All Rights Reserved
============================================================================

WHAT IS IT
==========

TCMQS is an infrastructure for sending messages to a collection of machines.
Messages are delivered roughly in-order, exactly once to each machine in
the cluster.  While the exact ordering may vary between machines in the
cluster, TCMQS ensures that if a machine sees a message from another
machine, it has also seen all the messages that the other machine has
seen up to that time.
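
The ordering guarantee above can be illustrated with a small sketch.  This
is not how TCMQS itself is implemented; it is a minimal model which assumes
that every message carries the set of message IDs its sender had seen, and
that a receiver holds a message back until it has seen all of them.  The
"Node" class and the message layout are hypothetical names made up for
this example.

```python
class Node:
    """Hypothetical node: holds a message back until its dependencies arrive."""

    def __init__(self, name):
        self.name = name
        self.seen = []      # message ids, in the order this node delivered them
        self.pending = []   # received messages still waiting on dependencies
        self.counter = 0

    def send(self):
        """Create a message recording everything this node has seen so far."""
        self.counter += 1
        msg_id = "%s-%d" % (self.name, self.counter)
        msg = {"id": msg_id, "deps": frozenset(self.seen)}
        self.seen.append(msg_id)
        return msg

    def receive(self, msg):
        """Queue a message, then deliver whatever has become deliverable."""
        already_queued = any(m["id"] == msg["id"] for m in self.pending)
        if msg["id"] in self.seen or already_queued:
            return  # duplicate: each message is delivered at most once
        self.pending.append(msg)
        progress = True
        while progress:
            progress = False
            for m in list(self.pending):
                if m["deps"] <= set(self.seen):
                    self.pending.remove(m)
                    self.seen.append(m["id"])
                    progress = True


foo, bar, baz = Node("foo"), Node("bar"), Node("baz")
m1 = foo.send()          # foo-1, no dependencies
bar.receive(m1)
m2 = bar.send()          # bar-1, sent after bar saw foo-1
baz.receive(m2)          # held back: baz has not seen foo-1 yet
held_back = list(baz.seen)
baz.receive(m1)          # foo-1 is delivered, which releases bar-1
```

In this model "baz" never delivers bar's message before the message from
"foo" that "bar" had already seen, which is the property described above.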

Effectively, it implements a sort of "remote procedure call" on a group
of machines.

Messages for downed machines are queued and delivered when the machine
returns to service.
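
As a rough sketch of the queueing idea (again, not the actual TCMQS code),
a sender can keep one queue per destination and only remove a message once
the destination has accepted it.  The "Outbox" class, the "deliver"
callback and the "up" table are hypothetical, and the sketch assumes that
an accepted message is never reported as a failure.

```python
from collections import deque


class Outbox:
    """Hypothetical per-destination queues held by a sending node."""

    def __init__(self, peers):
        self.queues = {peer: deque() for peer in peers}

    def enqueue(self, message):
        """Queue a message for every peer, whether it is up or down."""
        for queue in self.queues.values():
            queue.append(message)

    def flush(self, peer, deliver):
        """Deliver queued messages in order, stopping at the first failure."""
        queue = self.queues[peer]
        sent = 0
        while queue:
            if not deliver(queue[0]):
                break           # peer is down: keep the message queued
            queue.popleft()     # accepted: never send it again
            sent += 1
        return sent


received = {"bar": [], "baz": []}
up = {"bar": False, "baz": True}   # "bar" starts out down


def make_deliver(peer):
    def deliver(message):
        if not up[peer]:
            return False
        received[peer].append(message)
        return True
    return deliver


outbox = Outbox(["bar", "baz"])
outbox.enqueue("msg 1")
outbox.enqueue("msg 2")
for peer in ("bar", "baz"):
    outbox.flush(peer, make_deliver(peer))
while_down = list(received["bar"])

up["bar"] = True                              # "bar" returns to service
outbox.flush("bar", make_deliver("bar"))
outbox.flush("baz", make_deliver("baz"))      # nothing left, no duplicates
```

A real implementation also has to cope with a message that was accepted
but whose acknowledgement was lost, which this sketch does not attempt.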

STATUS
======

TCMQS is currently implemented as a proof of concept in the Python programming
language.  I've worked on some applications which use TCMQS, and in the
process have learned some things which have further refined the system.
Once the features settle, I'll probably spend some time optimizing the
synchronization between the components.  Only then do I plan to
implement it in a lower-level language.

GETTING STARTED
===============

Currently, TCMQS requires tcpserver and tcpclient
(http://cr.yp.to/ucspi-tcp.html) for doing the TCP/IP communication in the
cluster.  Basically, a node of the cluster is defined by its queue
directory and the IP address/port number of its queue server.  Therefore,
it's definitely possible to define multiple virtual nodes on a single
physical machine (as long as the applications don't do anything on that
system which would interfere with each other).

First of all, you need the queue directory.  The "mkqueue" script will
create a directory called "queue" in the current directory.

Next you need to set up a "tcmqs.config" file.  At the least, this should
include: "HostName = '<name>'" for a virtual setup.  The node names must
be unique in the cluster, and default to the system name.

Now you need to define what your cluster looks like.  Currently, this is
a static table defined at the top of "tcmqdb.py".  For example:

	'cluster' : ( 'foo', 'bar', 'baz' ),
	'foo' : None,
	'bar' : None,
	'baz' : None,
	
defines a cluster named "cluster" which has three members.
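
To make the shape of that table concrete, here is a small sketch of how a
destination name could be resolved into node names.  It assumes, based
only on the example above, that a tuple lists the members of a group and
that None marks a single node.  The "expand" helper is hypothetical and is
not part of "tcmqdb.py"; handling of groups nested inside groups is an
added assumption.

```python
# Assumed shape, copied from the example table: a tuple lists the members
# of a group, None marks a single node.
CLUSTER_TABLE = {
    'cluster': ('foo', 'bar', 'baz'),
    'foo': None,
    'bar': None,
    'baz': None,
}


def expand(destination, table=CLUSTER_TABLE):
    """Return the node names a destination resolves to (hypothetical helper)."""
    members = table[destination]
    if members is None:
        return [destination]        # a plain node resolves to itself
    nodes = []
    for member in members:
        for node in expand(member, table):
            if node not in nodes:   # list each node once
                nodes.append(node)
    return nodes


cluster_nodes = expand('cluster')
single_node = expand('foo')
```

With the table above, a message addressed to "cluster" would be meant for
all three nodes, while one addressed to "foo" would go to that node alone.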

Finally, you need to start the cluster communication daemons.  First you
start the servers.  For each node, run:

	./tcmqprocess -f tcmqs.config daemon &
	tcpserver 127.0.0.1 1234 ./tcmqserver -f tcmqs.config &

Note that the tcpserver arguments should be the IP address and port number
of the server.  With multiple servers on one physical machine, you want to
use different port numbers.  If the nodes are on separate machines on a
LAN, replace 127.0.0.1 with an address of the server machine that the
other nodes can reach.

To start up the TCP clients, for each node run:

	tcpclient 127.0.0.1 1234 ./tcmqclient daemon serverHostName &

once for each other server in the cluster.  The IP address and port are
those of the remote node's "tcpserver" process.  "serverHostName" should be
replaced with the name which that server believes it is (its HostName).

So, for a 3-node cluster you would have 3 servers running and 6 clients
running (each node runs a server and connects to both other servers).
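
The counting generalizes: with N nodes there are N servers and N * (N - 1)
clients.  The following sketch prints the commands from the steps above
for each node.  The node names, the addresses and the ports 1235 and 1236
are made up for the example, and "startup_commands" is a hypothetical
helper, not something shipped with TCMQS.

```python
def startup_commands(nodes):
    """Map each node name to the commands to run for it.

    nodes is a list of (name, ip, port) tuples describing where each
    node's tcpserver listens.
    """
    commands = {}
    for name, ip, port in nodes:
        cmds = [
            "./tcmqprocess -f tcmqs.config daemon &",
            "tcpserver %s %d ./tcmqserver -f tcmqs.config &" % (ip, port),
        ]
        # One client per *other* node, pointed at that node's server.
        for peer, peer_ip, peer_port in nodes:
            if peer != name:
                cmds.append("tcpclient %s %d ./tcmqclient daemon %s &"
                            % (peer_ip, peer_port, peer))
        commands[name] = cmds
    return commands


# Three virtual nodes on one machine (hypothetical ports).
nodes = [
    ("foo", "127.0.0.1", 1234),
    ("bar", "127.0.0.1", 1235),
    ("baz", "127.0.0.1", 1236),
]
commands = startup_commands(nodes)
client_count = sum(1 for cmds in commands.values()
                   for cmd in cmds if cmd.startswith("tcpclient"))

for name, _ip, _port in nodes:
    print("# node %s" % name)
    for cmd in commands[name]:
        print(cmd)
```

Each node's commands would be run from that node's own directory, so that
"tcmqs.config" and the "queue" directory refer to the right node.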

To test the cluster, you can run:

	date | ./tcmqinject testHandler cluster

This should cause the date to be received by every node of the cluster
once and only once.  The handler for "testHandler" prints the data
received, as well as writing it to syslog.  If you are running the
cluster processes each in their own window, you should see the message
displayed on each of the windows.
-- 
 > Sorry in advance for the idiocy... (Paul)
 You don't have to give us an advance - we know you're good for it. (Mike)
Sean Reifschneider, Inimitably Superfluous <jafo at tummy.com>
tummy.com - Linux Consulting since 1995. Qmail, KRUD, Firewalls, Python



