[Twisted-web] Re: Setting up a project.

David Bolen db3l.net at gmail.com
Wed Aug 6 12:19:39 EDT 2008


"Govind Salinas" <blix at sophiasuchtig.com> writes:

> So my question is, how should I compose the project.  The how-tos and
> examples left me not sure of what I should do.  I have something that
> runs and can serve up mako templates, but that is all it can do since
> I am doing everything manually.  I would prefer not to have to code in
> serving things like css pages, 404 pages etc.  I am sure this is
> already in twisted.web somewhere.

From the below it sounds like you're actually pretty well along the
appropriate approach with twisted.web.

Twisted.web is a fairly low level package (which to me, has always
been a point in its favor when I've used it).  There isn't a ton of
advanced functionality (like in a high level framework like django, or
turbogears), aside from some helper objects that can handle items like
file transfers or static file serving.  So you do have to do some of
what might be considered mundane work in your code.  But it sounds
like you've already identified a lot of that.

> Let me give you some information about the set up I have .  Basically
> I have a python program running somewhere.  I will have a default set
> of files that serve the content a particular way.  The default set
> will contain mako templates[1], style sheets and perhaps some images.
> So I need a way to kick off the mako templates based on the URL.  The
> templates may include each other.
>
> Here is the layout
>
> content-dir/
>    __config__.py -- This is a file that currently just sets up the mapping
>                             between a regex that selects a page and a path to
>                             where that page's template is under the content-dir
>    main.html -- The html files here are really the mako templates
>    main.css
>    etc...
>
> My server class takes in a template dir, loads the config and starts a
> server.  When it gets a request it matches it against the regexes and
> runs the template.  There should be some initialization that sets up
> objects that get passed to the template (in addition to values from
> the regex parsing). This is all pretty brain-dead code.

Are you saying "brain-dead" in a negative way here, or to imply that
it's boilerplate or something you'd prefer the framework do for you?
With twisted.web, you're pretty much in charge of this sort of stuff.

I think you've pretty much hit most of the items you need to cover.

For URL traversal, it's not clear from your description if you are
using the regex mapping to also find the Resource object or just for
template lookups by the Resource objects.  Either approach is fine.
Twisted.web itself will traverse a URL by following the chain of
Resource objects and their children.  This can be much more dynamic
than a fixed URL->object mapping definition since the Resources can be
created on the fly when needed.  If your Resource tree is static, then
a fixed mapping is fine too.
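
To make the traversal idea concrete, here is a simplified plain-Python
model of it.  This is not Twisted's implementation: Node, UsersNode and
traverse are made-up names for illustration, and put_child/get_child are
only modelled on Resource.putChild and Resource.getChild.  Static
children are registered up front, while UsersNode creates a child on the
fly for whatever segment follows it.

```python
class Node(object):
    """Toy stand-in for a twisted.web Resource (hypothetical, illustration only)."""

    def __init__(self, name):
        self.name = name
        self.children = {}

    def put_child(self, segment, child):
        # Static registration, in the spirit of Resource.putChild
        self.children[segment] = child

    def get_child(self, segment):
        # Dynamic hook, in the spirit of Resource.getChild; None means "not found"
        return None

    def lookup(self, segment):
        # Static children win; otherwise fall back to the dynamic hook
        if segment in self.children:
            return self.children[segment]
        return self.get_child(segment)


class UsersNode(Node):
    """Creates a child on the fly for any user name in the URL."""

    def get_child(self, segment):
        return Node('user:%s' % segment)


def traverse(root, path):
    """Follow the chain of children one URL segment at a time."""
    node = root
    for segment in path.strip('/').split('/'):
        if not segment:
            continue
        node = node.lookup(segment)
        if node is None:
            return None
    return node


root = Node('root')
root.put_child('about', Node('about'))
root.put_child('users', UsersNode('users'))

print(traverse(root, '/about').name)        # about
print(traverse(root, '/users/alice').name)  # user:alice
print(traverse(root, '/missing'))           # None
```

With a fixed URL->object table, the '/users/alice' case would need an
entry (or a regex) per pattern; here the parent decides at lookup time.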

Generally what I've done in the past is build up the fixed portion of
my Resource tree during server initialization.  My Resource classes
have internal definitions for their template name (as a relative path
when using directories of templates) and are just handed the template
loader during construction, which they later use for rendering.  The
template loader is given the template root directory at
initialization.  The trade-off with your regex mapping is that you
have to keep two fairly separate bits of code in sync, where I have
the template information inside the object definition, but in my case
it can also be more work to get a global view of template assignments.
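
A rough sketch of that arrangement, using only the standard library so
it stays self-contained.  string.Template stands in for a real template
engine, and SimpleLoader, Page, HomePage and JobPage are hypothetical
names, not classes from the server described here.  Each class names its
own template as a path relative to the loader's root, and the loader is
handed in at construction.

```python
import os
import tempfile
from string import Template


class SimpleLoader(object):
    """Hypothetical loader: resolves relative template names under one root."""

    def __init__(self, root):
        self.root = root

    def load(self, name):
        with open(os.path.join(self.root, name)) as f:
            return Template(f.read())


class Page(object):
    """Base class: subclasses declare which template they render."""

    template_name = None  # relative path below the template root

    def __init__(self, loader):
        self.loader = loader

    def render(self, **context):
        return self.loader.load(self.template_name).substitute(**context)


class HomePage(Page):
    template_name = 'home.html'


class JobPage(Page):
    template_name = os.path.join('jobs', 'view.html')


# Build a throwaway template tree so the sketch can run on its own
template_root = tempfile.mkdtemp()
os.mkdir(os.path.join(template_root, 'jobs'))
with open(os.path.join(template_root, 'home.html'), 'w') as f:
    f.write('<h1>Welcome, $user</h1>')
with open(os.path.join(template_root, 'jobs', 'view.html'), 'w') as f:
    f.write('<h1>Job $job_id</h1>')

loader = SimpleLoader(template_root)
home_html = HomePage(loader).render(user='Govind')
job_html = JobPage(loader).render(job_id=42)
print(home_html)
print(job_html)
```

Finding out which class uses which template means reading the class
definitions, which is the "global view" cost mentioned above.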

In terms of your question on static files (css/images/js), I just
place those into their own sub-tree and then define a static.File()
object to serve them.  This also has an advantage that if you place
your server behind, say, Apache, in production use, you can define
rules on the Apache side to let it directly serve the static files and
never involve your code, for some improved efficiency.

Another area that you may need to implement your own support would be
for handling authorization (twisted.web provides support for managing
sessions via cookies, but you have to decide when to render an
authentication page and then store any needed authentication
information in the session).

> I was hoping for some guidance on a better strategy for doing this.
> Any help would be appreciated.  Bonus points if it is something that
> could easily be re-used on another webserver if someone wanted to host
> this on an existing website.

I think you need to help define what parts of the above you perceive
as needing to be "better".  I've implemented several servers (many of
which were embedded in a larger application) using the above approach
and it seems just fine to me.  As to re-usability, that's pretty much
the same question for any code you write - e.g., if you're doing
authentication support, making it reusable is pretty much the same as
making any other piece of code reusable.

If you're looking for a higher level framework approach (much more
functionality "for free"), then twisted.web itself may not be the
ideal solution for you.  In the twisted world, nevow may provide a bit
more of what you're looking for, and of course there are other high
profile high-level web frameworks like django, turbogears, pylons, etc...

But if you like having a very thin layer between your application and
the network, without much baggage or enforced behavior, then
twisted.web can be quite nice.

As a more concrete example, here are some snippets of one of my recent
web server modules (in commercial use for the past year or so).  Note
this is a straight cut 'n paste, so there's various logic in addition
to the pure web processing, and probably references to functions/objects
that won't fully be explained.

First, the main application class, shown below.  It is invoked in the
top level server code through:

    web_app = web.Application(db, options)
    reactor.listenTCP(options['web_port'], web_app)

where options is a simple dictionary of configuration options, and db
is an internal database-access object used by this application to
handle executing SQLAlchemy SQL operations in a separate thread.

The application object itself is where the structure of the web site
is established, including static trees and the linked set of Resource
objects.  I've left some additional complexity in here in that one of
the Resource objects "eats" a parameter in the URL representing a
private key handed out to clients to access files for their "job".

The application comes with a web tree (the root of which is in the
configuration option "web_root") that looks like:

   <web_root>/css
             /images
             /js
             /templates/
                       /include

The static (css,images,js) portion of the tree is mapped with
static.File while templates are referenced through genshi's
TemplateLoader.  (For mako you'd use its TemplateLookup instead).

Note the one case where a specific file (favicon.ico) in the static
images tree is also mapped into a virtual location at the root of the
web site where browsers expect it to be.  This way I can keep the
physical file with the rest of the other images.

URLs for the web application generally fall into three classes:

   http://sitename/                   - Home page
                  /{css,images,js}    - Static content
                  /approval/XXXXXX/*  - Client job access URLs

The home page is handled by a Root object, static via static.File as
mentioned above, and all other URLs are beneath an ApprovalRoot
object, installed in this application object through Resource.putChild.

          - - - - - - - - - - - - - - - - - - - - - - - - -

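# Imports assumed for this snippet (they were not part of the original paste)
import os

from genshi.template import TemplateLoader
from twisted.web import static
from twisted.web.server import Site

class Application(Site):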

    def __init__(self, db, options):

        # Augment options with a template loader.  Search a local tree
        # in front of the supplied (built-in) tree.
        tmpl_path = [os.path.join(options['data_root'], 'web', 'templates'),
                     os.path.join(options['web_root'], 'templates')]

        self.options = options.copy()
        self.options['loader'] = TemplateLoader(tmpl_path, auto_reload=True)
        print 'Web Loader searching:', tmpl_path

        # Main site URL entry points
        self.root = Root(self.options)
        msg_root = ApprovalRoot(db, self.options)
        self.root.putChild('approval', msg_root)

        # Entry points only permitted through a message key URL
        msg_root.putChild('', JobView(db, self.options))
        msg_root.putChild('thumb', JobThumb(db, self.options))
        msg_root.putChild('play', JobFile(db, self.options, download=False))
        msg_root.putChild('download', JobFile(db, self.options, download=True))
        msg_root.putChild('viewer', JobFileViewer(db, self.options))
        msg_root.putChild('archive', JobArchive(db, self.options))
    
        # Configure static support files
        for curdir in ['css', 'images', 'js']:
            static_dir = os.path.join(self.options['web_root'], curdir)
            self.root.putChild(curdir, FileWithoutDir(static_dir))
            # For images, also add root level access to the favicon file
            if curdir == 'images':
                favicon = static.File(os.path.join(static_dir, 'favicon.ico'),
                                      defaultType='image/vnd.microsoft.icon')
                self.root.putChild('favicon.ico', favicon)

        # And provide static access to a portion of the data tree for
        # homepage files (such as reels) if the location exists
        static_file_dir = os.path.join(self.options['data_root'],
                                       'web', 'static')
        if os.path.exists(static_file_dir):
            self.root.putChild('static', FileWithoutDir(static_file_dir))

        # In production (frozen), don't expose tracebacks
        if self.options.get('frozen'):
            self.displayTracebacks = False

        Site.__init__(self, self.root)

          - - - - - - - - - - - - - - - - - - - - - - - - -

The "FileWithoutDir" I use in some places is just a simple static.File
subclass that prevents directory listings, ala:

          - - - - - - - - - - - - - - - - - - - - - - - - -

class FileWithoutDir(static.File):
    """Acts just like static.File but won't return directory listings"""

    def directoryListing(self):
        e = ErrorPage(http.FORBIDDEN, 'Forbidden',
                      'Access is not permitted to this resource.')
        return e

          - - - - - - - - - - - - - - - - - - - - - - - - -

Here's the handler for the "approval" segment of the URL - it uses the
next segment of the URL as the client job key.  If that is a valid key
in the database, it then permits continued processing of the URL using
its child Resource objects.

This is somewhat specialized processing that I haven't had use for in
most of my servers, but it shows one approach to taking control of
dynamic URL traversal in twisted.  I terminate the normal URL
traversal by defining the object as a leaf, but then re-use Twisted's
own traversal mechanism on a different Resource tree created by the
ApprovalRoot object.

FYI, the database callback _db_retrieveJobUuid (executed through
db.run) is running in a separate database thread.

          - - - - - - - - - - - - - - - - - - - - - - - - -

class ApprovalRoot(Resource):
    """Act as root of the approval tree, which is accessed from URLs in
    messages, and always include the message key as the first part of
    request.postpath.  Strips off the key, validates it, and then passes
    control on to appropriate job or file based objects depending on the
    remainder of the URL.
    
    This is almost identical to normal child lookup by non-leaf objects,
    but handled at render time since the message key validation is a
    deferred operation."""

    isLeaf = True

    def __init__(self, db, options):
        Resource.__init__(self)
        self.db = db
        self.loader = options['loader']

        # Use a separate resource as the root of the remaining URL processing
        # since the isLeaf on ourselves would defeat any child search

        self.job_root = Resource()

    def putChild(self, path, child):
        """Permit simulated children, so that the overall structure of the
        web site can still be established in a higher level function"""
        self.job_root.putChild(path, child)

    def _db_retrieveJobUuid(self, key):
        sql = sa.select([schema.jobs.c.uuid, schema.messages.c.expiration],
                        sa.and_(schema.jobs.c.uuid ==
                                schema.messages.c.job_uuid,
                                schema.messages.c.key == key))

        r = sql.execute().fetchone()

        if not r:
            raise _Unavailable
        elif (r.expiration and r.expiration < datetime.utcnow()):
            raise NoResource('The email approval key has expired')
        else:
            return r.uuid

    def _cb_render(self, job_uuid, request):
        # Transfer control to the appropriate child for rendering.  In the
        # case of a top level render, modify the postpath to include the job
        # uuid as an argument.
        
        if request.postpath and not request.postpath[0]:
            request.postpath.append(job_uuid.hex)
        child = getChildForRequest(self.job_root, request)
        r = child.render(request)
        if r != NOT_DONE_YET:
            request.write(r)
            request.finish()

    def _cb_render_err(self, failure, request):
        if failure.check(NoResource):
            request.write(failure.value.render(request))
            request.finish()
            return

        return failure

    def _finishRequest(self, value, request):
        request.finish()
        return value

    def render(self, request):
        if len(request.postpath) < 1:
            return ErrorPage(http.NOT_FOUND,
                             'Missing approval reference', '').render(request)

        # We only render message key failures, so if the URL has no further
        # segments beyond the key, add a trailing "/" to trigger the child
        # lookup for the default handler.
        if len(request.postpath) == 1:
            request.redirect(request.prePathURL() + '/' +
                             request.postpath[0] + '/')
            request.finish()
        else:
            msg_key = request.postpath.pop(0)
            d = self.db.run(self._db_retrieveJobUuid, msg_key)
            d.addCallback(self._cb_render, request)
            d.addErrback(self._cb_render_err, request)
            d.addErrback(self._finishRequest, request)
            d.addErrback(log.err)
        return NOT_DONE_YET

          - - - - - - - - - - - - - - - - - - - - - - - - -

And here's a more typical Resource object - in this case something
installed beneath the approval root, but it still has the structure
common to most of my Resource objects, along the lines of:

    * Parse URL arguments
    * Retrieve database information based on arguments (deferred operation)
    * Render template (in database callback) based on information

In this particular case, the request is typically coming from an
embedded QuickTime/MediaPlayer object on a viewing window, so errors
are just logged.  In other cases, different templates are rendered on error
and/or error information is passed into a common template.

          - - - - - - - - - - - - - - - - - - - - - - - - -

class JobFileViewer(Resource):
    """Generates a viewer for a single job file (typically presented in
    a separate window).  Expects single file_uuid in the URL.

    Templates used: viewer.xhtml
    Cacheability: None
    """

    isLeaf = True

    def __init__(self, db, options):
        Resource.__init__(self)
        self.db = db
        self.loader = options['loader']

    def _db_retrieveFile(self, file_uuid):
        sql = sa.select([schema.files, schema.jobs.c.product],
                        sa.and_(schema.files.c.uuid == file_uuid,
                                schema.files.c.uuid ==
                                schema.jobs_files.c.file_uuid,
                                schema.jobs_files.c.job_uuid ==
                                schema.jobs.c.uuid))

        result = sql.execute().fetchone()
        return result

    def _cb_render(self, file_info, request):
        tmpl = self.loader.load('viewer.xhtml')
        try:
            width = int(request.args['width'][0])
            height = int(request.args['height'][0])
        except (KeyError, IndexError, ValueError):
            width = height = 0

        context = {
            'job_name': file_info.product or 'Untitled',
            'curfile': file_info,
            'width': width,
            'height': height,
            'url': '../play/%s/%s' % (file_info.uuid.hex,
                                      urllib.quote(file_info.name)),
            'media_player': 'media_player' in request.args,
            }

        request.write(tmpl.generate(**context).render('html',
                                                      doctype=DocType.HTML))
        request.finish()

    def render(self, request):
        if request.postpath == ['quicktime.mov']:
            return ''

        try:
            file_uuid = uuid.UUID(request.postpath[0])
        except (IndexError, ValueError):
            return _Unavailable.render(request)

        setNonCacheable(request)
        d = self.db.run(self._db_retrieveFile, file_uuid)
        d.addCallback(self._cb_render, request)
        d.addErrback(log.err)
        return NOT_DONE_YET

          - - - - - - - - - - - - - - - - - - - - - - - - -

The setNonCacheable call above is an example where twisted.web has no
real higher level support for stuff like caching, so whereas some high
level frameworks might have a decorator or simpler way to control
caching, you handle it more directly in twisted.web.  So here's what I
am using:

          - - - - - - - - - - - - - - - - - - - - - - - - -

def setNonCacheable(request):
    """Sets headers on a request to fully disable any caching"""
    
    # Ensure we're expired by setting time in the past
    request.setHeader('Expires', 'Fri, 25 Nov 1966 08:22:00 EST')
    # HTTP/1.0 no-cache header
    request.setHeader('Pragma', 'no-cache')
    # HTTP/1.1 no-cache headers (post-check/pre-check are IE extensions)
    request.setHeader('Cache-Control',
                      'no-store, no-cache, must-revalidate, '
                      'post-check=0, pre-check=0')

          - - - - - - - - - - - - - - - - - - - - - - - - -
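
One detail worth noting: HTTP dates are specified in GMT, so the "EST"
in the Expires value above is not strictly valid, although caches are
expected to treat an invalid Expires date as already expired.  As a
hedged variation (not the code in use on the server described here), the
date can be generated with the standard library.  StubRequest is a
hypothetical stand-in that only mimics setHeader so the sketch runs
without Twisted, and set_non_cacheable is likewise an illustrative name.

```python
from email.utils import formatdate


class StubRequest(object):
    """Hypothetical stand-in for a twisted.web request; records headers only."""

    def __init__(self):
        self.headers = {}

    def setHeader(self, name, value):
        self.headers[name] = value


def set_non_cacheable(request):
    """Variation on setNonCacheable that emits a GMT-formatted Expires date."""
    # The Unix epoch is safely in the past; usegmt=True yields the HTTP date form
    request.setHeader('Expires', formatdate(0, usegmt=True))
    # HTTP/1.0 no-cache header
    request.setHeader('Pragma', 'no-cache')
    # HTTP/1.1 no-cache headers, same values as the original function
    request.setHeader('Cache-Control',
                      'no-store, no-cache, must-revalidate, '
                      'post-check=0, pre-check=0')


request = StubRequest()
set_non_cacheable(request)
for name in sorted(request.headers):
    print('%s: %s' % (name, request.headers[name]))
```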

Hope there's not too much "noise" in the code to prevent it from being
helpful.

-- David



