[Twisted-Python] Understanding the role of .tap files

Fri Aug 13 13:57:19 EDT 2004

On Thu, 2004-08-12 at 21:58, Abe Fettig wrote:

> I'm looking for help understanding how .tap files are meant to be used 
> in production.

Abe,

Thanks for asking :).  This is an issue on which there has been very
little conclusive discussion, but lots of strong opinions.  I will try to
give you a little background on how we got where we are now, and where
we hope to be eventually.  Unfortunately this probably won't answer your
question directly, but I hope it will help.

The idea behind tap files is that you have an object which represents
your application's configuration, and the storage format of that object
is back-end independent.  The most obvious application of this is "your
configuration can live in a database", but that's really just the tip of
the iceberg.
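
To make the "back-end independent" idea concrete, here is a minimal
sketch in plain Python.  It is not Twisted's persistence code:
ServerConfig, FileStore and SqliteStore are names invented for this
illustration, and JSON is assumed as the serialization format purely to
keep the example small.  The point is only that one configuration object
round-trips through two unrelated back ends.

```python
import json
import os
import sqlite3
import tempfile
from dataclasses import asdict, dataclass, field


@dataclass
class ServerConfig:
    """Hypothetical stand-in for an application's configuration object."""
    name: str
    ports: dict = field(default_factory=dict)


def serialize(config):
    """Turn the configuration object into text (JSON is an assumption)."""
    return json.dumps(asdict(config))


def deserialize(text):
    """Rebuild the configuration object from its serialized text."""
    return ServerConfig(**json.loads(text))


class FileStore:
    """Keeps the serialized object in a file on disk."""
    def __init__(self, path):
        self.path = path

    def save(self, config):
        with open(self.path, "w") as f:
            f.write(serialize(config))

    def load(self):
        with open(self.path) as f:
            return deserialize(f.read())


class SqliteStore:
    """Keeps the same serialized text in a database row instead."""
    def __init__(self, connection):
        self.db = connection
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS config (id INTEGER PRIMARY KEY, data TEXT)")

    def save(self, config):
        self.db.execute("INSERT OR REPLACE INTO config VALUES (1, ?)",
                        (serialize(config),))

    def load(self):
        (data,) = self.db.execute(
            "SELECT data FROM config WHERE id = 1").fetchone()
        return deserialize(data)


def round_trip(store, config):
    """Save the object through a store and read it back."""
    store.save(config)
    return store.load()


config = ServerConfig("webmail", {"http": 20080, "smtp": 20025})
with tempfile.TemporaryDirectory() as tmp:
    from_file = round_trip(FileStore(os.path.join(tmp, "server.cfg")), config)
from_db = round_trip(SqliteStore(sqlite3.connect(":memory:")), config)
```

Code that uses the configuration only ever sees a ServerConfig; whether
it came from a file or a database is the store's business.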

The difficulty with tap files is that there is a significant gap between
theory and practice, because it's very difficult to get the prospective
users (system administrators, particularly large-scale system
administrators) to provide comment on what would be the most useful
setup for themselves.  The system administration community is, almost by
definition, fond of the way in which they already know how to do
things.  There is a Windows sysadmin culture, but since pretty much all
serious servers are run on UNIX these days, UNIX admins are the ones I
care about, and they like the nebulously-defined concept of "config
files".

In my experience, "config files" are really just extremely inconsistent,
limited, often buggy and slow programming languages with a huge implicit
import mechanism and bizarre, arbitrary quoting rules.  Inevitably they
begin as a simple association of semantically significant keys with
values and eventually mutate into a tree-structured hierarchy with some
kind of limited control structures.  Depending on the nature of the
application domain for the particular server, the evolution can happen
slower or faster, but it seems inevitable for applications run on any
kind of scale.  The evolution tends to slow down as you reach about 80%
of the functionality of a real programming language, because sysadmins
don't want to be developers.  I'm certainly not saying they should be;
lawyers and novelists may speak the same language, but they hardly
participate in the same activities.

Obviously I'm not a big fan of config files.  The biggest problem I have
with config files is that knowledge does not migrate - if you have
learned the bizarre internals of Apache configuration, this will in no
way prepare you for the configuration of ANY mail system, let alone
sendmail.  There is also no way to hook Apache up to any kind of mail
system by specifying it in the configuration language, because you need
a configuration directive to support "mail service".  Abruptly, you must
transition to a "real" programming language, compile and deploy
functionality, since the facility to do ad-hoc configuration is too
limited.

This makes the configuration and deployment of webmail systems a really
hairy problem... one I have spent a good deal of time and grief trying
to solve :). [0]

It is probably an unstated goal of the Twisted project to never
officially support only one configuration mechanism.  In some
configurations, you really need the full power of Python; that's what
the twistd -y configuration option is for.

The 'mktap' utility distributed with Twisted is, unfortunately, not a
serious application deployment tool.  It was intended as a stopgap
measure because twisted.coil was taking too long to implement.

twisted.coil, briefly, was designed to emit tapfiles by creating a new,
"blank" server, adding servers, web resources, IRC bots and the like to
it in real time using a web-based management interface, and then saving
the result to a file.  Both the complexity and flexibility of the UI and
the stability of the persistence mechanism were problems that made it
very difficult to implement in a way which was actually usable in
production, but we have had a few usable prototypes developed.

mktap is only really useful if your application has an initial
configuration step to get the initial UI going and then several
subsequent configuration steps that involve changing the state of your
server and then saving it away, usually to a database or a more stable
storage that doesn't look anything like a config file (or a tap file) at
all.  This has not become a common idiom in Twisted code, so mktap
remains a good deal less useful than it could be.

Coil still hasn't been successfully implemented, so given mktap's
shortcomings, most serious applications end up providing some
specialized modules for configuration and then using a .tac file (really
a .py file with a few special rules about its namespace) to provide
configuration.  For example, Quotient's SVN trunk has abandoned mktap
entirely and now uses a module called 'quotient.deployment', so a
typical configuration ends up looking like this:

### 

from quotient.deployment import deploy

application = deploy(dataDirectory = 'data/db',
                     fileDirectory = 'data/files',
                     webLogFile = 'web.log',
                     domainNames = ['localhost', '127.0.0.1'],
                     pop3Port = 20110,
                     pop3sPort = 20995,
                     imap4Port = 20143,
                     imap4sPort = 20993,
                     httpPort = 20080,
                     httpsPort = 20443,
                     certificateFile = 'server.pem',
                     privateKeyFile = 'server.pem',
                     smtpPort = 20025,
                     smtpsPort = 20465,
                     sipPort = 25060,
                     pbPort = 8787,
                     manholeUser = 'admin',
                     manholePassword = 'admin',
                     manholePort = 'tcp:19293:interface=localhost')

###
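
The internals of quotient.deployment aren't shown here, so the following
is only a guess at the general shape such a helper could take, written
against the standard library rather than Twisted.  describe_deployment,
Deployment, and the rule that any keyword ending in 'Port' names a
listener are all invented for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Deployment:
    """Hypothetical record of what a deploy()-style call asked for."""
    listeners: dict = field(default_factory=dict)  # service name -> port or endpoint string
    settings: dict = field(default_factory=dict)   # every other option


def describe_deployment(**options):
    """Sort keyword options into listeners and plain settings.

    Assumption for this sketch: any keyword ending in 'Port' names a
    listener, and its value is an int port or an endpoint-style string.
    """
    deployment = Deployment()
    for key, value in options.items():
        if key.endswith("Port"):
            if not isinstance(value, (int, str)):
                raise TypeError("%s must be an int or a string" % key)
            # Strip the 'Port' suffix to get the service name.
            deployment.listeners[key[:-len("Port")]] = value
        else:
            deployment.settings[key] = value
    return deployment


deployment = describe_deployment(
    dataDirectory="data/db",
    httpPort=20080,
    smtpPort=20025,
    manholePort="tcp:19293:interface=localhost",
)
```

A real helper would go on to build services from this description; the
sketch stops at the point where the flat keyword list has been given
some structure.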

So, that's how we're doing it at Divmod, and it works pretty well for
us, but I feel that it's a failure of Twisted as a framework that things
have to work this way.  In particular, I'd like to be able to separate
Quotient's application from the initialization of the web server, and
connect a web resource published by the quotient application to a web
publisher, which might live on an SSL-enabled socket, on a TCP port, or
on a UNIX socket, or even over some crazy UDP proxying thing.  Right now
we've glued the concrete TCP and SSL sockets to the web server, and the
web server straight to the application, because anything else would
simply have been too complex.  Since our application speaks nearly every
protocol Twisted is capable of, we don't care too much about
configuration integration points with third-party software yet.  This
*will* become a serious
problem when we want to integrate third-party plugins and configure
them, though, because that configuration will want to live in the same
area and there is no facility for that.  Currently the way to configure
a plugin is to make a *new* deployment module and write a derivative
function there for your configuration, which only allows one
sub-application to run at a time.

From the experiences I've heard from other Twisted developers, though,
it's far from clear how to make progress beyond this point without
making far more policy decisions than we're comfortable with.  Twisted
is a framework that tends to get called in only in the cases where you
have hard problems (after all, if you have easy problems, why not use an
eminently adequate and much more widely deployed solution like Apache or
Sendmail?) and hard problems have diverse requirements.  Many Twisted
applications have very different ideas about how to, for example,
connect to a database, or even what a database *is*.  It's not clear to
me that we can say that any of these approaches is definitely
wrongheaded or invalid and needs to be set aside by the framework, but
with these differing ideas, there aren't a lot of opportunities for
integration that would inspire a general integration system.

For the moment, I am approaching this problem from the opposite end.  I
think that when we[1] refactor Quotient[2] into Mantissa[3], it will
evolve into a more generic application deployment platform with its own
plain-python configuration subsystem that is some simple icing on top of
.tac files, and when we have had some experience in the field with how
that works as a configuration system, we will backport some of that
knowledge into Twisted at a more general level.

Until then, deployment in Twisted can be a confusing question, but I
would suggest you follow this general strategy: avoid designing a
configuration language, and try to consider ways to simplify Python down
to the absolute bare essentials to say what your application needs to
say about its deployment environment, then use twistd -y or the
equivalent.
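
As an illustration of the "plain Python as configuration" approach, here
is a small sketch of a loader that executes a configuration file's
source and pulls one well-known name out of the resulting namespace.  It
follows the namespace convention of a .tac file in spirit only; it is
not twistd's loader, and load_config and the sample settings are
hypothetical.

```python
def load_config(source, filename="<config>", required="application"):
    """Execute configuration source and return the object bound to `required`."""
    # The config file gets its own namespace, separate from the loader's.
    namespace = {"__name__": "__config__"}
    exec(compile(source, filename, "exec"), namespace)
    try:
        return namespace[required]
    except KeyError:
        raise ValueError("%s does not define %r" % (filename, required)) from None


# A sample "config file": ordinary Python, no special syntax.
SAMPLE = """
ports = {'http': 20080, 'https': 20443}
application = {'name': 'webmail', 'ports': ports}
"""

app = load_config(SAMPLE)
```

Because the file is ordinary Python, whoever writes it gets variables,
loops and imports for free, and nobody has to design or document a
quoting rule.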

If you have some success figuring out how to do this well, please let us
know so we can try to develop some common patterns from our approaches
and put them into the framework.

My thinking right now runs along the lines of completely new and
different features for Twisted, such as the ability to locate singleton
services via the reactor which are connected to an event source
somehow... but I don't think those would help you very much at this
juncture.

-- Footnotes:
0: plug - check out my work so far: http://www.divmod.org/
1: i.e. my company, Divmod
2: our current application, a webmail system
3: our working title for the application server platform that encodes
our policy decisions as well as our implementation mechanisms, such as
our object database and web templating framework

-- 
  _  \ Glyph Lefkowitz   |"Songs that the Hyades shall sing,
 / \  \ glyph at divmod.com | Where flap the tatters of the King,
 ` _o_ \-----------------+ Must die unheard in, Dim Carcosa."
  ( ._\ \ - Cassilda's Song, "The King in Yellow", Act 1, Scene 2