[Twisted-Python] Re-working a synchronous iterator to use Twisted
terry at jon.es
Tue Jun 17 11:41:22 EDT 2008
I'm trying to rework some synchronous code to use Twisted and I've hit an
interesting case. The following code is my simplification of the situation.
I've not run it, but will happily flesh it out if people want.
First, a general helper function for use below:

    def process(result):
        # process a single result.
        pass
I have the following simple synchronous function and code that calls it:

    # SYNCHRONOUS result producer
    def getResults(uri):
        done = False
        offset = 0
        while not done:
            results, done, offset = fetchPageViaSynchronousHttp(uri, offset)
            for result in results:
                yield result

    # SYNCHRONOUS calling
    for result in getResults(uri):
        process(result)

I.e., there are an indeterminate number of results available out there via
some web page. The iterator above periodically goes to get more, and yields
the new batch one by one to the calling code which processes them one by
one. This has the advantage that process() can be called as soon as any
results are available.
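
To make the interleaving concrete, here is a minimal runnable sketch of the same generator pattern. It assumes a made-up in-memory page source: PAGES, PAGE_SIZE, fetch_page and the log list are all hypothetical stand-ins for the real HTTP fetch, not part of any library. The log shows that the first results are processed before the second page is ever requested.

```python
# Hypothetical in-memory stand-in for the remote pages (an assumption,
# used only so the sketch runs without a network).
PAGES = [["a", "b", "c"], ["d", "e", "f"], ["g"]]
PAGE_SIZE = 3
TOTAL = sum(len(p) for p in PAGES)
log = []


def fetch_page(offset):
    """Pretend to fetch one page; return (results, done, new_offset)."""
    log.append("fetch@%d" % offset)
    results = PAGES[offset // PAGE_SIZE]
    new_offset = offset + len(results)
    done = new_offset >= TOTAL
    return results, done, new_offset


def get_results():
    """Yield results one by one, fetching the next page only when needed."""
    done = False
    offset = 0
    while not done:
        results, done, offset = fetch_page(offset)
        for result in results:
            yield result


def process(result):
    # Record the call so the ordering of fetches and processing is visible.
    log.append("process:%s" % result)


for result in get_results():
    process(result)
```

Because the generator is lazy, each fetch happens only when the consumer has exhausted the previous batch, which is the property the asynchronous version needs to preserve.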
So how to do the above (more or less) in Twisted? Here's one approach:
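
    # Helper func: get results from page, return them in a list.
    # (The helper names extractResults and moreResults are placeholders.)
    def extractResults(page):
        pass

    # Helper func: see if there are more results, given page. return bool.
    def moreResults(page):
        pass

    # ASYNCHRONOUS result producer
    def getResults(uri, offset=0):
        def parsePage(page, offset):
            results = extractResults(page)
            if moreResults(page):
                # Start fetching the next batch; the caller attaches to d.
                d = getResults(uri, offset + len(results))
            else:
                d = None
            return results, d

        def returnTheseResults(page, offset):
            resultIterator, d = parsePage(page, offset)
            return resultIterator, d

        # NOTE: offset still needs to be worked into the request for uri.
        from twisted.web import client
        return client.getPage(uri).addCallback(returnTheseResults, offset)

    # ASYNCHRONOUS calling
    def cb(resultsAndDeferred):
        # A callback receives one value, so unpack the (results, deferred) pair.
        resultIterator, deferred = resultsAndDeferred
        for result in resultIterator:
            process(result)
        if deferred is not None:
            deferred.addCallback(cb)

    getResults(uri).addCallback(cb)
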
I'm fairly sure something like this can be made to work. The idea is to
have getResults call a callback that takes a current set of results and a
new deferred that will be called with the next set of results and a new
deferred that will be called with the next set of results and...... so on
until the deferred that comes back is None, at which point you're done
(i.e., there are no more results).
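
The control flow described above can be traced without a reactor or network. The sketch below is only an illustration under loud assumptions: ToyDeferred is a hypothetical, heavily simplified stand-in written for this example (it is not Twisted's Deferred and has no errbacks or chaining), and PAGES and the pending queue simulate pages arriving one at a time.

```python
class ToyDeferred:
    """Hypothetical toy stand-in for a Deferred: callbacks only, no errbacks."""

    def __init__(self):
        self._callbacks = []
        self._fired = False
        self._result = None

    def addCallback(self, fn):
        if self._fired:
            # Already has a result: run the callback immediately.
            self._result = fn(self._result)
        else:
            self._callbacks.append(fn)
        return self

    def callback(self, result):
        # Fire: pass the result through each callback in order.
        self._fired = True
        self._result = result
        for fn in self._callbacks:
            self._result = fn(self._result)
        self._callbacks = []


PAGES = [[1, 2], [3, 4], [5]]  # made-up batches of results
pending = []                   # (deferred, page number) awaiting "delivery"
processed = []


def get_results(page_no=0):
    d = ToyDeferred()
    pending.append((d, page_no))

    def parse(page):
        # Hand back this batch plus a deferred for the next one, or None.
        if page_no + 1 < len(PAGES):
            next_deferred = get_results(page_no + 1)
        else:
            next_deferred = None
        return page, next_deferred

    d.addCallback(parse)
    return d


def cb(pair):
    results, next_deferred = pair
    processed.extend(results)          # stands in for calling process()
    if next_deferred is not None:
        next_deferred.addCallback(cb)  # re-arm for the following batch


get_results().addCallback(cb)

# Simulate the network delivering one page at a time.
while pending:
    d, page_no = pending.pop(0)
    d.callback(PAGES[page_no])
```

One detail worth noting for the real thing: a Twisted callback that returns a Deferred directly causes the outer Deferred to wait on it, whereas returning a tuple that merely contains a Deferred, as here, passes the tuple through to the next callback unchanged.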
The result is that process() gets called asynchronously with incoming
results, and, as with the synchronous approach, we don't have to wait until
the whole result set is in before we can begin processing results.
But this approach is definitely not simple (or at least it's not, given my
beginner-level of sophistication producing and consuming deferreds). Note
too that you don't actually use the deferred coming back from processResults.
I wanted to have the fun of thinking about this and writing my own
pseudo-solution before I posted here, but I imagine that people working
with Twisted must have many times dealt with something like the above. How
would you handle it? Other approaches are also possible, but I'll stop for
now to see what people say.
Let me know if you want the above fleshed out to working code.