[FRPythoneers] Module Archiving

Sean Reifschneider jafo at tummy.com
Wed Oct 20 03:30:04 MDT 1999


So, the Distutils-SIG at:

   http://www.python.org/sigs/distutils-sig/

has a *LOT* of background documentation, including requirements, a
proposed interface, a design proposal, and an implementation.  This all
seems to be oriented more towards the actual building of already-downloaded
packages.

One thing that it doesn't seem to address at all is the actual
process of locating and downloading modules.  As I mentioned,
sort of a "directory service".

I'm imagining something sort of like DNS from the standpoint of
having a number of distributed servers.  I'm imagining a protocol
which has an idea of "clusters" -- the public module repository would
be one member of such a cluster, with the user being able to join
one or more other clusters and specify a search order.

A cluster would be defined by a list of one or more "root" directory
servers.  The primary goal of these servers is to direct clients to
other servers which actually have the content, if they don't have
the content available themselves.  The "leaf nodes" of this arrangement
would primarily have content, whereas the root servers may have more
"links".

I can see the content consisting of two parts: the modules themselves, and
the meta-data (descriptions, documentation, and HTTP links?).
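
As a rough illustration, the meta-data for one object might be a little
record like this (the field names are just guesses at what a directory
would want to carry):

   # Hypothetical meta-data record stored alongside a module.
   metadata = {
       "name": "Python/Imaging/PIL-module-redhat",
       "description": "Python Imaging Library, Red Hat binary build",
       "documentation": "http://www.example.org/PIL/handbook.html",
       "links": ["http://www.example.org/PIL/"],    # related HTTP links
       "size": 734003,                              # bytes
       "md5": "0123456789abcdef0123456789abcdef",   # for verifying downloads
   }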

What I'm talking about is effectively an automatic FTP mirroring
system.  It would have to provide a unified name-space, and a
mechanism for location of objects in that name-space.  It would
probably also be nice to provide a mechanism for a client to
query a server's willingness to provide a module.

Perhaps a root server tells a client about 10 servers who have the
object in question.  I imagine the client then sending a UDP packet
to all 10 servers asking their willingness to provide that object.
Say that 3 of them have too many connections as it is, and another
3 are on different continents.

A server could drop query packets (if they are "full"), or delay a
response (if they're busy, but not full for example).  The nature of
the net is that far-away or poorly connected sites (viewed from the
end-to-end connection perspective) will tend to lose or be slow to
respond to these requests.  Imagine further that one of the 10
hosts above is at your ISP or even on a local cache!
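
A minimal client-side sketch of that probe, assuming a made-up port
number and a trivial "WILLING?"/"YES" exchange:

   import select
   import socket
   import time

   PROBE_PORT = 8019     # made-up port number for willingness queries

   def probe(servers, object_name, timeout=2.0):
       """Send one UDP query to every candidate server and collect the hosts
       that answer "YES", in the order their answers arrive.  Full servers
       simply drop the packet; busy or distant ones answer late and end up
       at the back of the list."""
       sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
       query = ("WILLING? %s" % object_name).encode()
       for host in servers:
           sock.sendto(query, (host, PROBE_PORT))

       willing = []
       deadline = time.time() + timeout
       while len(willing) < len(servers):
           remaining = deadline - time.time()
           if remaining <= 0:
               break
           ready, _, _ = select.select([sock], [], [], remaining)
           if not ready:
               break
           data, (host, _port) = sock.recvfrom(512)
           if data.startswith(b"YES") and host not in willing:
               willing.append(host)
       return willing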

At that point a client would attempt to make a TCP connection to
one or more of the servers it received a response from.  I'm imagining
something similar to HTTP where the user asks for a fully-qualified
object name, and is returned data, with the option being that multiple
objects can be requested.  The server could close the connection at
any time that it is not servicing a request (presumably because of a
timeout).  The client could send any number of requests "batched".

It should probably be possible to request sending from a specific point
in the file, or maybe even to request checksums of blocks of an object
for determining where to request retransmission from.

For example:

   client> GET Python/Imaging/PIL-module-redhat OFFSET=123456
   client> GET Python/Networking/sockserv.py OFFSET=0
   client> QUIT
   (the above being sent in one "batch" as in SMTP pipelining)
   server> Header information including content-size of first module.
   server> [data, starting at offset 123,456]
   server> Header information including content-size of second module.
   server> [data]
   server> <closes connection>
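
The client end of that exchange could be as small as the following
sketch (the port number, the "SIZE" header line, and the framing in
general are my assumptions, not a spec):

   import socket

   def fetch(server, requests, port=8020):
       """requests is a list of (object_name, offset) pairs; all GETs are
       pipelined in one batch, then the responses are read back in order."""
       sock = socket.create_connection((server, port))
       batch = "".join("GET %s OFFSET=%d\r\n" % (name, offset)
                       for name, offset in requests)
       sock.sendall((batch + "QUIT\r\n").encode())

       results = {}
       f = sock.makefile("rb")
       for name, offset in requests:
           header = f.readline()           # e.g. b"SIZE 123456\r\n"
           if not header:
               break                       # server closed the connection early
           size = int(header.split()[1])
           results[name] = f.read(size)    # data begins at the requested offset
       sock.close()
       return results

   # e.g.:
   # fetch("mirror.example",
   #       [("Python/Imaging/PIL-module-redhat", 123456),
   #        ("Python/Networking/sockserv.py", 0)])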

Of course, I don't see that from the server perspective there's anything
that requires any knowledge of the objects themselves -- just of a
hierarchical structure and object information.

I proposed this sort of idea to Eric Raymond in relation to the Trove
project, but he dismissed it because it didn't use existing client tools
(FTP in particular).  There is a proposal for specifying multiple providers
of content via a URL (I found it on the Squid site -- called URN or
something similar, I seem to recall).  However, there don't really appear
to be any clients available that handle this...

Personally, I have no love for FTP.  It works well and we have some really
good clients these days, but the data-connection idea just ends up pissing
off people who maintain firewalls.  :-)

Anyway, those are some of my ideas...  Comments?

Sean
-- 
 Please submit resumes in Word format with subject "Lead Unix Admin" to: [xxx]
 (actual job advertisement)
Sean Reifschneider, Inimitably Superfluous <jafo at tummy.com>
URL: <http://www.tummy.com/xvscan> HP-UX/Linux/FreeBSD/BSDOS scanning software.



