[Twisted-Python] Re-working a synchronous iterator to use Twisted

Sat Jun 28 16:40:11 MDT 2008

Following the massive interest in my earlier postings on this thread, I'm
following up to myself again :-)

Here's what I was trying to do:

> In case it wasn't clear before, you're pulling "results" (e.g., from a
> search engine) in off the web. Each results page comes with an indicator
> to tell you whether there are more results. I wanted to write a function
> (see processResults below) that, when called, would call the process
> function below on each result, all done asynchronously.

I posted some cumbersome code to roughly do that.  I've since been thinking
about this on and off, with help from Esteve Fernandez, and we've made the
code quite a bit simpler.

I think there's a general pattern here that's worth thinking about.
Roughly: the above need is the Twisted analogue of using iterators in
regular synchronous programming.

By that I mean that the normal pattern of Twisted usage is: a single event
is anticipated (by the programmer), it occurs once, and its result is
passed down a call/errback chain. That's roughly like a single function
call in synchronous code.

But suppose you are expecting a sequence of external events to occur, and
you want to asynchronously pass their results in turn down a call/errback
chain. In synchronous code, a simple iterator fills this need. But doing
this asynchronously (when the fetch of the next batch of results might take
a while) doesn't seem to fit easily into the single-shot asynchronous
Twisted paradigm.
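
For comparison, here is a minimal sketch of the synchronous case, where a
plain generator hides the paging. The names fetch_page_blocking and
iter_results, and the canned pages, are made up for illustration; a real
fetch would block on the network.

```python
def fetch_page_blocking(page):
    """Hypothetical blocking fetch: returns (results, more_available)."""
    pages = [["a", "b"], ["c", "d"], ["e"]]
    return pages[page], page + 1 < len(pages)


def iter_results():
    """Yield results one at a time, fetching further pages as needed."""
    page, more = 0, True
    while more:
        results, more = fetch_page_blocking(page)
        page += 1
        for result in results:
            yield result


collected = list(iter_results())
```

The caller just loops over iter_results() and never sees the page
boundaries. That is the convenience I'd like to have asynchronously.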

I thought about modifying defer.py to allow a callback chain to be called
multiple times (and to have the "normal" single-shot chain be a special
case). But that was clearly going to get messy. BTW, I find defer.py is
really elegant.

After more thinking about how to make my previously posted code simpler,
Esteve and I came up with what you'll find at

  http://python.pastebin.com/f7df56752   (code) and
  http://python.pastebin.com/f1e582264   (simple tests)

The idea is that you provide a result fetcher function to the TwIterator
class. This function will be called repeatedly, as needed, to get more
results. It returns a deferred, which it should fire with three things: a
list of the next results (which may be empty), a bool to indicate whether
to call the function again, and a dict of args to pass to it next time.

The TwIterator class provides you with a list() method that you can use
almost like an iterator:

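    from twisted.internet.defer import inlineCallbacks

    @inlineCallbacks
    def printer(results):
        # Each x is a deferred; yielding it waits for the next result.
        for x in results:
            print (yield x)

    # fetcher is a TwIterator instance wrapping a result fetcher function.
    fetcher.list().addCallback(printer)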

This is in some sense like a general asynchronous iterator for Twisted. The
printer function receives an iterator, each element of which is a deferred,
and when that deferred fires it produces the next result.

The test code gives 4 simple example result-fetching functions, and calls
them all asynchronously. If you run it you'll see the results coming out in
a somewhat random order.

I won't go into more detail, given that no-one responded to the first two
postings. It's still possible that I'm trying to solve a problem that is
already handled by some standard Twisted module. I don't know enough about
Twisted to know for sure.

Terry