[Twisted-web] adding etag and modified arguments to twisted feedparser

Selwyn McCracken selwyn.mccracken at stonebow.otago.ac.nz
Tue Sep 28 15:53:31 MDT 2004


Hi,

I am having trouble modifying the Twisted-based RSS aggregator from the 
Python Cookbook so that feedparser can make use of its update-related 
arguments, 'etag' and 'modified', to save bandwidth.
(see http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/277099)

I realise that the problem is related to deferreds, but I can't seem to 
resolve it, even after reading the deferred documentation.

Anyway, the two callbacks in the deferred chain that I think are relevant are:

1) def getPage(self, data, args):  # args is the RSS feed link
       return client.getPage(args, timeout=TIMEOUT)


2) def parseFeed(self, feed):
       parsed = feedparser.parse(cStringIO.StringIO(feed))

The problem is that getPage() requests the entire RSS feed and then 
passes the result through to feedparser.parse(). Normally, however, 
feedparser.parse() takes the further arguments 'etag' and 'modified' so 
that a feed is only downloaded when it has changed, thereby saving bandwidth.
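
For reference, 'etag' and 'modified' correspond to the standard HTTP 
conditional request headers If-None-Match and If-Modified-Since. Below is 
a minimal sketch of building them by hand. conditional_headers is a 
hypothetical helper name, not part of feedparser or Twisted, and it assumes 
'modified' is a UTC time tuple such as the one time.gmtime() returns.

```python
import calendar
from email.utils import formatdate


def conditional_headers(etag=None, modified=None):
    """Build HTTP conditional-GET headers (hypothetical helper).

    etag     -- the ETag string saved from the previous response, or None
    modified -- a UTC time tuple (e.g. from time.gmtime()), or None
    """
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if modified:
        # Convert the UTC tuple to a timestamp, then to an HTTP date
        # of the form "Thu, 01 Jan 1970 00:00:00 GMT".
        timestamp = calendar.timegm(modified)
        headers["If-Modified-Since"] = formatdate(timestamp, usegmt=True)
    return headers
```

As far as I can tell, a dict like this could be passed as the headers 
keyword of client.getPage(), in which case an unchanged feed should come 
back as a 304 response with no body. I have not verified how getPage() 
reports that status to the callback chain.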

I tried modifying getPage() to return feedparser.parse(args), which 
removes the need for parseFeed(), but that runs substantially slower than 
the original method, I presume because the feeds are then fetched synchronously.
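
The slowdown is consistent with feedparser.parse(url) doing a blocking 
download, so that each feed waits for the previous one. One possible way 
around that, sketched here with only the standard library rather than 
Twisted, is to run the blocking call on worker threads. fetch_all and its 
fetch parameter are hypothetical names, and fetch stands in for something 
like feedparser.parse.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=8):
    """Call the blocking fetch(url) for every URL on worker threads.

    Results are returned as a list in the same order as urls.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order even if calls finish out of order.
        return list(pool.map(fetch, urls))
```

Twisted has its own counterpart in twisted.internet.threads.deferToThread, 
which returns a deferred, so it may fit the existing callback chain better 
than a separate thread pool.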

Any assistance in helping to restore the impressive parallel downloading 
performance, but with the 'etag' and 'modified' arguments included, would be 
greatly appreciated.

Many thanks,
Selwyn






