[Twisted-web] html cache with timeout

Sun Jan 30 07:19:00 MST 2005

On Sun, Jan 30, 2005 at 01:56:50PM +0100, Andrea Arcangeli wrote:
> but I fixed your great hack and here we go:

OK, I've already made it good enough for merging, IMHO! Please don't
keep this in a branch that risks becoming obsolete. This is a major,
useful feature IMHO.

Index: nevow/rend.py
===================================================================

--- nevow/rend.py	(revision 1134)
+++ nevow/rend.py	(working copy)
@@ -30,6 +30,7 @@
 from nevow import flat
 from nevow.util import log
 from nevow import util
+from nevow import url
 
 import formless
 from formless import iformless
@@ -374,6 +375,7 @@
             self.children = {}
         self.children[name] = child
     
+_CACHE = {}
 
 class Page(Fragment, ConfigurableFactory, ChildLookupMixin):
     """A page is the main Nevow resource and renders a document loaded
@@ -384,12 +386,29 @@
 
     buffered = False
 
+    cacheTimeout = None # 0 means cache forever, >0 sets the seconds of caching
+    __lastCacheRendering = 0 # this should not be touched by the parent class
+
     beforeRender = None
     afterRender = None
     addSlash = None
 
     flattenFactory = flat.flattenFactory
 
+    def refreshCache(self):
+        assert self.cacheTimeout is not None
+        _now = now() # run gettimeofday only once
+        timeout = _now > self.__lastCacheRendering + self.cacheTimeout and self.cacheTimeout > 0
+        if timeout:
+            self.__lastCacheRendering = _now
+        return timeout
+    def cacheIDX(self, ctx):
+        return str(url.URL.fromContext(ctx))
+    def storeCache(self, ctx, c):
+        _CACHE[self.cacheIDX(ctx)] = c
+    def lookupCache(self, ctx):
+        return _CACHE.get(self.cacheIDX(ctx))
+
     def renderHTTP(self, ctx):
         ## XXX request is really ctx now, change the name here
         request = inevow.IRequest(ctx)
@@ -411,24 +430,27 @@
             if self.afterRender is not None:
                 self.afterRender(ctx)
 
-        if self.buffered:
+        if self.buffered or self.cacheTimeout is not None:
             io = StringIO()
             writer = io.write
             def finisher(result):
-                request.write(io.getvalue())
-                finishRequest()
-                return result
+                c = io.getvalue()
+                self.storeCache(ctx, c)
+                return c
         else:
             writer = request.write
             def finisher(result):
                 finishRequest()
                 return result
+        c = self.lookupCache(ctx)
+        if c is None or self.refreshCache():
+            doc = self.docFactory.load()
+            ctx =  WovenContext(ctx, tags.invisible[doc])
 
-        doc = self.docFactory.load()
-        ctx =  WovenContext(ctx, tags.invisible[doc])
+            return self.flattenFactory(doc, ctx, writer, finisher)
+        else:
+            return c
 
-        return self.flattenFactory(doc, ctx, writer, finisher)
-
     def rememberStuff(self, ctx):
         Fragment.rememberStuff(self, ctx)
         ctx.remember(self, inevow.IResource)
Index: nevow/vhost.py
===================================================================
--- nevow/vhost.py	(revision 1134)
+++ nevow/vhost.py	(working copy)
@@ -19,7 +19,7 @@
 """
 
     def getStyleSheet(self):
-        return self.stylesheet
+        return VirtualHostList.stylesheet
  
     def data_hostlist(self, context, data):
         return self.nvh.hosts.keys()

There's only one thing I'm not sure about: the meaning of the result
passed to the finisher. Does it matter at all? Is it always '', right?
It has to be an empty string; I can't see how it could be anything else.
Otherwise we'd need to cache it too and change the patch a bit. In my
limited testing result is always '', so I didn't bother to cache it.

You know, at >200 req per second with quite a ton of dynamic stuff
inside, I'm very relaxed now.

220 req per second means the homepage could sustain a load of 19 million
hits per day and 570 million hits per month. The real figure will be
lower, since the completely dynamic part will still use a lot of CPU
power, but having the basic web going fast is a great bonus already, and
clearly there will be more traffic on the outside pages than on the
inside pages.
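
For reference, the arithmetic behind those figures can be checked with a
few lines of Python. This is only a back-of-the-envelope sketch; the
function names and the 30-day month are assumptions made for
illustration, not part of the patch.

```python
# Back-of-the-envelope capacity estimate from a sustained request rate.
REQUESTS_PER_SECOND = 220        # rate quoted in the message
SECONDS_PER_DAY = 24 * 60 * 60   # 86400 seconds
DAYS_PER_MONTH = 30              # assumption: a 30-day month


def hits_per_day(rate):
    """Requests served in one day at a constant rate (req/s)."""
    return rate * SECONDS_PER_DAY


def hits_per_month(rate, days=DAYS_PER_MONTH):
    """Requests served in one month at a constant rate (req/s)."""
    return hits_per_day(rate) * days


if __name__ == "__main__":
    print(hits_per_day(REQUESTS_PER_SECOND))    # 19008000
    print(hits_per_month(REQUESTS_PER_SECOND))  # 570240000
```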

The timeout I'm using is 10 sec, which means that once every 10 sec it
will execute a synchronous rendering. But that's OK: if the load goes up
too much, moving it to 60 sec will fix it.
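
The behaviour described above (serve the stored page, re-render once the
timeout has passed) can be sketched outside Nevow roughly as follows.
This is a simplified stand-in, not the patch itself: TimedPageCache,
FakeClock and render_home are hypothetical names, and the sketch keeps
one timestamp per cache key, whereas the patch keeps a single timestamp
on the Page object.

```python
import time


class TimedPageCache:
    """Hypothetical stand-in for the patch's _CACHE plus timeout check."""

    def __init__(self, timeout, clock=time.time):
        self.timeout = timeout   # seconds; <= 0 means never re-render
        self.clock = clock       # injectable so the sketch can be tested
        self._entries = {}       # key -> (rendered_at, body)

    def get(self, key, render):
        now = self.clock()       # read the clock only once per request
        entry = self._entries.get(key)
        if entry is not None:
            rendered_at, body = entry
            expired = self.timeout > 0 and now > rendered_at + self.timeout
            if not expired:
                return body      # cache hit: no rendering at all
        body = render()          # synchronous rendering on miss or expiry
        self._entries[key] = (now, body)
        return body


class FakeClock:
    """Manually advanced clock, used only for the demonstration."""

    def __init__(self):
        self.t = 0.0

    def __call__(self):
        return self.t


calls = []


def render_home():
    calls.append(1)
    return "<html>render #%d</html>" % len(calls)


clock = FakeClock()
cache = TimedPageCache(timeout=10, clock=clock)
first = cache.get("/", render_home)    # nothing cached yet: renders
clock.t = 5.0
second = cache.get("/", render_home)   # within 10 s: served from cache
clock.t = 10.5
third = cache.get("/", render_home)    # older than 10 s: renders again
```

With a 10 sec timeout the render function runs twice in this sequence;
raising the timeout to 60 would make the third call a cache hit too.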

I believe this caching scheme should stay in place and be merged, since
it's the most efficient caching possible, very suitable for pages that
change infrequently or that are completely static. Other caching with
finer granularity can happen on top of this, but this is the highest
priority one IMHO. For example, this fits perfectly for the "/" page of
my site and other high-traffic, mostly static HTML pages (the page is
not completely static and is re-rendered once every 10 sec, so it's
still possible to edit the xml files or to rebuild the class with the
stan loader).

Setting cacheTimeout <= 0 will cache the page forever; that's OK for
loaders.stan unless you use rebuild.
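
The "<= 0 means forever" rule follows from the expression in
refreshCache. A minimal sketch of that expression as a standalone
function (needs_refresh is a hypothetical name, and the operands are
reordered compared with the patch, which does not change the result):

```python
def needs_refresh(now, last_rendered, cache_timeout):
    """True when the cached copy is older than a positive timeout."""
    return cache_timeout > 0 and now > last_rendered + cache_timeout


# A timeout of 0 or below never triggers a refresh: cached forever.
print(needs_refresh(now=100, last_rendered=0, cache_timeout=0))
print(needs_refresh(now=100, last_rendered=0, cache_timeout=-1))
# A positive timeout triggers a refresh only once it has fully elapsed.
print(needs_refresh(now=10, last_rendered=0, cache_timeout=10))
print(needs_refresh(now=11, last_rendered=0, cache_timeout=10))
```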

Below is the only change I had to make to my site to enable the caching
for production use.

Thanks a lot Valentino^wdialtone! ;)

--- cpushare/web/redirect.py	29 Jan 2005 02:05:56 -0000	1.8
+++ cpushare/web/redirect.py	30 Jan 2005 14:06:10 -0000
@@ -21,6 +21,7 @@ class download_class(basepage_class):
 
 class redirect_http_to_https(root_basepage_class):
 	addSlash = True
+	cacheTimeout = 10
 	docFactory = loaders.xmlfile('root_page.xml', XMLDIR)
 
 	child_css = static.File('styles/cpushare.css')