[Twisted-web] adding etag and modified arguments to twisted feedparser
Selwyn McCracken
selwyn.mccracken at stonebow.otago.ac.nz
Tue Sep 28 15:53:31 MDT 2004
hi,
I am having trouble modifying the Twisted-based RSS aggregator from the
Python Cookbook so that feedparser can use its update-related
arguments, 'etag' and 'modified', to save bandwidth.
(see http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/277099)
I realise that the problem is Deferred-related, but I can't seem to
resolve it, even after reading the Deferred documentation.
Anyway, the Deferred callbacks that I think are relevant are:
1) def getPage(self, data, args):  # args is the RSS feed link
       return client.getPage(args, timeout=TIMEOUT)

2) def parseFeed(self, feed):
       parsed = feedparser.parse(cStringIO.StringIO(feed))
The problem is that getPage() downloads the entire RSS feed every time
and then passes it through to feedparser.parse(). When feedparser.parse()
fetches a URL itself, however, it accepts the further arguments 'etag' and
'modified', which it sends as a conditional request so that the server
returns the feed only if it has changed, thereby saving bandwidth.
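The two arguments map onto standard HTTP conditional-GET headers: 'etag' is sent as If-None-Match and 'modified' as If-Modified-Since, and a server whose feed is unchanged answers 304 Not Modified with no body. A minimal sketch of building those headers outside feedparser, and of remembering the ETag and Last-Modified values a server sends back, is below. conditional_headers and remember_validators are hypothetical helpers, and 'modified' is assumed to be a UTC time tuple (or an already formatted HTTP date string).

```python
import calendar
from email.utils import formatdate


def conditional_headers(etag=None, modified=None):
    """Build conditional-GET headers from values saved on a previous fetch.

    Assumption: `modified` is a UTC time tuple (9-tuple or time.struct_time)
    or an HTTP date string that is passed through unchanged.
    """
    headers = {}
    if etag:
        # The server replies 304 Not Modified if its current ETag still matches.
        headers["If-None-Match"] = etag
    if modified:
        if isinstance(modified, str):
            headers["If-Modified-Since"] = modified
        else:
            # Convert the UTC time tuple to an RFC 1123 date, e.g.
            # "Tue, 28 Sep 2004 21:53:31 GMT".
            seconds = calendar.timegm(tuple(modified))
            headers["If-Modified-Since"] = formatdate(seconds, usegmt=True)
    return headers


def remember_validators(cache, url, response_headers):
    """Store the ETag / Last-Modified values a server sent, keyed by feed URL.

    Assumption: `response_headers` is a plain dict of header name -> value.
    Header names are case-insensitive in HTTP, so they are lowercased first.
    """
    lowered = {name.lower(): value for name, value in response_headers.items()}
    cache[url] = {
        "etag": lowered.get("etag"),
        "modified": lowered.get("last-modified"),
    }
    return cache[url]
```

If this is wired into the recipe, the headers dict could presumably be passed as client.getPage(url, timeout=TIMEOUT, headers=...), since getPage forwards extra keyword arguments to HTTPClientFactory. A 304 reply should then arrive on the Deferred's errback rather than its callback, which is the natural place to skip parseFeed() for that feed.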
I tried modifying getPage() to return feedparser.parse(args) instead, which
removes the need for parseFeed(), but this runs substantially slower than
the original method, I presume because each feed is now fetched
synchronously, one after another.
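That slowdown is what you would expect when each blocking fetch must finish before the next one starts. Twisted's usual way to keep a blocking call off the reactor is twisted.internet.threads.deferToThread, e.g. deferToThread(feedparser.parse, url, etag=etag, modified=modified), which returns a Deferred (untested against the recipe). The sketch below shows the same sequential-versus-threaded effect with the standard library's concurrent.futures instead, so it runs without Twisted or network access; blocking_fetch and FETCH_DELAY are hypothetical stand-ins for feedparser.parse() and network latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor

FETCH_DELAY = 0.2  # hypothetical per-feed network latency, in seconds


def blocking_fetch(url):
    """Hypothetical stand-in for a blocking call such as feedparser.parse(url)."""
    time.sleep(FETCH_DELAY)
    return url


def fetch_sequentially(urls):
    """Fetch one feed at a time, as happens when the blocking call runs inline."""
    start = time.perf_counter()
    results = [blocking_fetch(u) for u in urls]
    return results, time.perf_counter() - start


def fetch_in_threads(urls, workers=10):
    """Run the blocking fetches in a thread pool so they overlap in time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as `urls`.
        results = list(pool.map(blocking_fetch, urls))
    return results, time.perf_counter() - start
```

With five stand-in feeds, the sequential version takes roughly five times FETCH_DELAY, while the threaded version takes roughly one FETCH_DELAY.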
Any help in restoring the impressive parallel downloading performance,
while still passing the etag and modified arguments, would be
greatly appreciated.
many thanks,
Selwyn