[Twisted-web] Cache-friendly Nevow pages

Mary Gardiner mary-twisted at puzzling.org
Sat Aug 7 01:02:59 MDT 2004


For some reason (it might be that I spend most of my time sitting at the
end of a long, thin piece of string connecting me to the rest of the
universe, or it might be watching people pound on my RSS feeds every
five minutes), I've been trying to write cache-friendly Nevow resources.

This involves setting two HTTP headers, "Last-Modified" and "ETag". At
the moment I'm setting both of these headers using my data source (files
in the file system). However, this has left me with a bit of a quandary
about my docFactory templates.
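As a rough illustration of deriving both headers from a data file, the
sketch below stats the file, takes Last-Modified from its mtime, and
builds an ETag from its mtime and size. The helper name file_validators
and the mtime-plus-size ETag scheme are assumptions (a common convention,
not something Nevow provides). The tag is marked weak (W/) because it
does not prove byte-for-byte equality of the page sent.

```python
import email.utils
import os


def file_validators(path):
    """Return (last_modified, etag) header values for a file on disk.

    Hypothetical helper: the ETag scheme (mtime + size) is an assumption,
    not a Nevow or Twisted API.
    """
    st = os.stat(path)
    # HTTP dates must be in GMT, e.g. "Sat, 07 Aug 2004 07:02:59 GMT".
    last_modified = email.utils.formatdate(st.st_mtime, usegmt=True)
    # ETags are quoted strings; W/ marks this one as weak because
    # mtime+size does not guarantee byte-for-byte equality.
    etag = 'W/"%x-%x"' % (int(st.st_mtime), st.st_size)
    return last_modified, etag
```

The returned strings would then be set on the response, for instance
with Twisted's request.setHeader(name, value).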

When my templates change, so should my Last-Modified and ETag headers.
Otherwise clients using caches will see my old templates more or less
indefinitely, at least on pages I don't subsequently change, because
the server will keep answering their conditional GET requests (complete
with correct If-Modified-Since and If-None-Match headers) with 304 Not
Modified instead of sending a fresh copy of the data.
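The staleness follows from how a conditional GET is evaluated: the
server only compares the client's stored validators with the current
ones, so if a template edit changes neither, every revalidation yields
304. A minimal sketch of that comparison (is_not_modified is a
hypothetical helper expecting lower-cased header names) follows the HTTP
rule that If-None-Match, when present, takes precedence over
If-Modified-Since. It splits the ETag list on commas, a simplification
that would break on tags containing commas.

```python
import email.utils


def _strip_weak(tag):
    """Drop a leading W/ so weak and strong tags compare equal (weak comparison)."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def is_not_modified(request_headers, etag, last_modified):
    """Return True if a conditional GET can be answered with 304 Not Modified.

    request_headers maps lower-cased header names to values; etag and
    last_modified are the validators the server would send right now.
    """
    if_none_match = request_headers.get("if-none-match")
    if if_none_match is not None:
        # When If-None-Match is present, If-Modified-Since is ignored.
        if if_none_match.strip() == "*":
            return True
        # Simplification: assumes no commas inside the quoted tags.
        offered = [_strip_weak(t) for t in if_none_match.split(",")]
        return _strip_weak(etag) in offered

    if_modified_since = request_headers.get("if-modified-since")
    if if_modified_since is not None:
        try:
            since = email.utils.parsedate_to_datetime(if_modified_since)
            current = email.utils.parsedate_to_datetime(last_modified)
            return current <= since
        except (TypeError, ValueError):
            # Unparseable or incomparable dates: send the full response.
            return False

    # Not a conditional request: always send the page.
    return False
```

In a real Twisted resource, http.Request's setLastModified and setETag
methods perform this kind of check themselves; the sketch only makes the
logic explicit.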

So I'm faced with the problem of dating my templates, or otherwise
detecting when they change, and I can't think of a good way to do it.

Some thoughts:

 1. use file timestamps on the template files

   Pros: Fits OK with the way I deal with the rest of the website data

   Cons: Reduces flexibility. I can't think of a good way to do this
   with Stan templates. I also can't think of a good way to do this
   without restarting the server when my templates change. (I do
   currently do this, but would prefer not to.)

 2. generate the ETag header based on a hash of page contents

   Pros: As best I can tell, this is how the ETag header is really meant
   to be generated: ideally it signals octet equality, and it should
   change if, for example, Nevow for some reason starts pretty-printing
   output.

   Cons: rend.Page.renderHTTP seems to make this really hard --
   even if you set the bufferedflag = True, rend.Page.afterRender
   doesn't seem to have any way to access the result of the render.
   (Correct me if I'm wrong.) Also, this doesn't help with the
   Last-Modified date, which means I'm not helping HTTP/1.0 caches very
   much, unless I store the date the hash changed somewhere.

 3. store the templates in some kind of object store and date-stamp
   them there.

   Pros: This might well let me change templates without restarting the
   server.

   Cons: It imposes a maintenance burden whereby I have to update the
   object database with new templates. I like to have a copy of my
   website and templates on two different servers, and as best I can
   tell, no object database is going to like being copied to a remote
   server without me first killing all associated processes on the
   remote server, so there's a deployment problem.

 4. hash the template so that a changed template means a changed hash

   Pros: This is probably nearly as good as hashing the page content,
   accuracy-wise.

   Cons: I don't have any idea how to hash a DocFactory object
   effectively. Hashing the DocFactory still leaves me vulnerable to
   changes in Nevow's rendering. Hashing the DocFactory won't tell me to
   update Last-Modified unless I store the date that the DocFactory
   changed somewhere.
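For file-based templates, options 1 and 4 can be approximated together
by treating the template file as one more input to the validators:
Last-Modified becomes the newest mtime of the data and template files,
and the ETag a digest of both files' bytes. This is a sketch under those
assumptions (combined_validators is hypothetical). It hashes template
source rather than a DocFactory object, so, as noted under option 4, it
still won't notice changes in Nevow's rendering, hence the weak W/ tag.
It also does nothing for Stan templates, which are built in Python code
rather than loaded from a separate template file.

```python
import email.utils
import hashlib
import os


def combined_validators(data_path, template_path):
    """Return (last_modified, etag) covering both a data file and a template file.

    Hypothetical helper: only meaningful for templates loaded from files.
    """
    paths = (data_path, template_path)

    # Option 1: Last-Modified is the newest mtime of the files the page uses.
    newest = max(os.stat(p).st_mtime for p in paths)
    last_modified = email.utils.formatdate(newest, usegmt=True)

    # Option 4, applied to template *source*: hash the bytes of every input,
    # so an edited template changes the ETag even if its mtime is not newest.
    digest = hashlib.sha1()
    for p in paths:
        with open(p, "rb") as f:
            content = f.read()
        # Length prefix keeps the concatenation unambiguous across files.
        digest.update(b"%d:" % len(content))
        digest.update(content)

    # Weak tag: identical sources do not guarantee byte-identical output.
    etag = 'W/"%s"' % digest.hexdigest()
    return last_modified, etag
```

Because the files are stat-ed and read on each call, the headers track
template edits without a restart; whether the rendered page itself picks
up the edited template without a restart depends on the loader and is a
separate question.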

Has anyone got any thoughts, or solved this problem before? Help with
implementing 2 (how do I get the page contents in order to hash them?)
or 4 (how can I hash a DocFactory object?) would also be appreciated.
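For option 2, once the page body is available as bytes, a strong ETag
can be a digest of those bytes, and recording when each digest first
appeared gives a Last-Modified value for HTTP/1.0 caches. The minimal
sketch below deliberately leaves aside how to capture the output of
rend.Page (the open question above); ContentValidators and its
in-memory store are hypothetical, so the history resets on restart.

```python
import email.utils
import hashlib
import time


class ContentValidators:
    """Strong ETag from rendered bytes, plus the time those bytes last changed.

    Hypothetical helper: works on final page bytes however they are obtained,
    and keeps its history in memory only.
    """

    def __init__(self, clock=time.time):
        self._clock = clock
        self._seen = {}  # url -> (etag, time that etag was first seen)

    def validators(self, url, body):
        """Return (etag, last_modified) for the given rendered body bytes."""
        etag = '"%s"' % hashlib.sha1(body).hexdigest()
        previous = self._seen.get(url)
        if previous is None or previous[0] != etag:
            # First sighting or changed content: remember when it changed.
            previous = (etag, self._clock())
            self._seen[url] = previous
        return etag, email.utils.formatdate(previous[1], usegmt=True)
```

Note that the page still has to be rendered on every request to compute
the hash, so this saves bandwidth rather than CPU. After a restart,
Last-Modified moves forward, which only costs each client one extra full
download.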

-Mary


