[Twisted-web] adding etag and modified arguments to twisted feedparser

James Y Knight foom at fuhm.net
Tue Sep 28 18:10:05 MDT 2004


On Sep 28, 2004, at 5:53 PM, Selwyn McCracken wrote:
> I am having trouble modifying the twisted-based rss aggregator from 
> the python cookbook so that feedparser can make use of the update 
> related arguments of 'etag' and 'modified' to save bandwidth.
> (see http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/277099)
>
> I realise that the problem is deferred related, but I can't seem to 
> resolve the problem, even after reading the deferred documentation.

It's not particularly Deferred-related; it's more about
twisted.web.client. I assume what's happening is that feedparser.parse()
can take either a URL or a file-like object. If you give it a URL, it
uses its own internal HTTP fetching code, which is synchronous. Twisted's
HTTP client is asynchronous, so you want to use that instead.

So what you need to know is how to send the etag/modified information
to Twisted's HTTP client.

You want something like:

def getPage(self, data, args):  # args is the rss feed link
    return client.getPage(
        args, timeout=TIMEOUT,
        headers={'If-None-Match': '"xyzzy"',
                 'If-Modified-Since': 'Sun, 09 Sep 2001 01:46:40 GMT'})
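
In practice the two header values would come from whatever you saved
after the previous fetch of that feed, not from literals. A rough
sketch of building the headers dict, assuming you keep the saved values
as plain strings (conditional_headers is a made-up helper name, not
part of Twisted or feedparser):

```python
def conditional_headers(etag=None, modified=None):
    """Build request headers for a conditional GET.

    etag and modified are the ETag and Last-Modified values saved from
    the previous response; either may be None on the first fetch.
    """
    headers = {}
    if etag:
        # The ETag is sent back exactly as received, quotes included.
        headers['If-None-Match'] = etag
    if modified:
        # Last-Modified is sent back as the same HTTP date string.
        headers['If-Modified-Since'] = modified
    return headers
```

The result can be passed as the headers= argument shown above; on the
first fetch it is simply an empty dict.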

However, client.getPage doesn't leave you with any way to get at the 
response headers (so you can save the etag and last modified responses 
for the next request), so you'll need to use HTTPClientFactory directly 
(cribbing from the code in client.getPage). Basically, after the 
deferred fires, factory.response_headers will have the data you want, 
so you just need to keep a reference to factory around.
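
Once you have the factory, pulling the two values out of
factory.response_headers might look roughly like the sketch below. It
assumes response_headers is a dict keyed by lower-cased header name,
with each value either a list of strings or a single string; check this
against your Twisted version. save_validators is a hypothetical helper
name.

```python
def _first_value(headers, name):
    """Return the first value stored for a header, or None if absent."""
    value = headers.get(name)
    if isinstance(value, (list, tuple)):
        return value[0] if value else None
    return value


def save_validators(response_headers):
    """Extract (etag, last_modified) from a response-headers dict.

    Assumes lower-cased header names as keys; tolerates None, which is
    what you would see if no headers had been received yet.
    """
    headers = response_headers or {}
    return (_first_value(headers, 'etag'),
            _first_value(headers, 'last-modified'))
```

You would call this from a callback on the factory's deferred, store
the pair alongside the feed URL, and send the values back as
If-None-Match and If-Modified-Since on the next request.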

James



