[Twisted-Python] Need some pointers for writing asynchronous code for Twisted app

Fri Mar 9 18:02:30 MST 2007

Brian Costlow wrote:
[...]
> 
> So the easy way out, it seems to me, would be to make the LineReceiver callback
> build the ElementTree as I get it. Then wrap minimally modified versions of the
> code that processes the ElementTree to the dict, and the dict to the database,
> in a callInThread or deferToThread call. Which is a lot of use of the thread
> pool, which seems to violate the idea of a low-overhead asynchronous event
> loop.

Most databases don't really give you any choice but to use threads.
twisted.enterprise.adbapi helps a little.  Compared to the time it takes for the
database to do its stuff, I doubt you'll notice the thread overhead.

> So is there a better way? For example, if I have a callback chain, when the
> first one fires, do they all fire in sequence as the prior callback returns, or
> does the chain yield to other events. If it does, I could potentially break the
> code into smaller chunks, say so each one processed enough tree data to
> generate 1 dict entry, and add the chunks as a callback chain on the
> connectionLost?

Deferreds are completely independent of the reactor (i.e. event loop).  They
don't do any magical yielding to the event loop or anything like that.  Deferreds
simply manage the chain of callbacks, and arrange for them to be called as soon
as the data they're waiting on is there.

> Note: None of this code is tested, I'm just trying to get the basic logic
> worked out.
> 
> Something like this?
> 
> def connectionLost(self, reason):
>     d = defer.Deferred()
>     d.addCallback(chunkOne)
>     d.addCallback(chunkTwo)
>     d.addCallback(chunkThree)
>     d.addCallback(chunkN...)
>     d.addCallback(finish)
>     d.callback(self.myElementTree)
> 
> If I have a bunch of connections that close as simultaneously as the
> implementation allows, does that sequence all fire first for one closing
> connection, then the next, and so on? Or do they intermix?

They will all fire immediately.  It's just like doing:

    result = chunkOne(self.myElementTree)
    result = chunkTwo(result)
    result = chunkThree(result)
    result = chunkN(result)
    ...
    finish(result)

i.e. synchronous.

> Or do I need to set up a chain of deferreds with explicit scheduling?
> 
> Something like:
> 
> def connectionLost(self, reason):
>    self.myDict = {}
>    self.finish()
> 
> def finish(self):
>    d = defer.Deferred()
>    def realFinish(result):
>          do stuff to clean up
>    d.addCallback(self.chunkThree)
>    d.addCallback(realFinish)
>    reactor.callLater(0, d.callback, None)
> 
> def chunkThree(self, result):
>     d = defer.Deferred()
>     def realChunkThree(result):
>          do stuff to process one dict key
>     d.addCallback(self.chunkTwo)
>     d.addCallback(realChunkThree)
>     reactor.callLater(0, d.callback, None)
>     return d

This is an awkward way to arrange it, but this would let the reactor do work
between the chunks, yes.

> The above doesn't really seem much different than the first, it's just that we
> schedule the calls explicitly, and pass data around in multiple deferreds. 
> 
> The last thing I thought about doing was something like this:
> 
> def connectionLost(self, reason):
>     myDict = {}
>     d = defer.Deferred()
>     d.addCallback(finish)
>     myIterObj = iter(self.myElementTree.getiterator())
>     def processChunk():
>         try:
>             foo = myIterObj.next()
>             do stuff with foo to process element to dict entry
>         except StopIteration:
>             d.callback(None)
>         except:
>             error handling stuff
>         else:
>             reactor.callLater(0, processChunk)
>     processChunk()
>     return d

This approach can work a little better, yeah.

Note that returning a Deferred from connectionLost doesn't do anything.  What do
you expect to wait on that Deferred (i.e. what in your code is waiting on this
result)?  As far as I can tell, nothing.  If so, you probably don't want a
Deferred at all.

http://twistedmatrix.com/projects/core/documentation/examples/longex.py and
http://twistedmatrix.com/projects/core/documentation/examples/longex2.py
have some basic examples of this sort of thing.

There's also the "cooperator" module in
http://divmod.org/trac/wiki/DivmodEpsilon, but it's totally lacking in
documentation.  So who knows if it's really appropriate for this, or if you'll
be able to figure out how to use it.

> Except I found some really similar code in an old thread, where Bob Ippolito
> says, 'just use flow instead'
> http://twistedmatrix.com/pipermail/twisted-python/2003-July/005013.html
> 
> But the current flow doc says: Don't use flow, write asynchronous code.

In this case, twisted.internet.defer.inlineCallbacks could probably be used
instead of flow:

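    from twisted.internet import reactor
    from twisted.internet.defer import Deferred, inlineCallbacks

    @inlineCallbacks
    def doChunks():
        # "chunks" is assumed to be an iterable of no-argument callables
        for chunk in chunks:
            # do the next chunk
            chunk()
            # yield to the event loop
            d = Deferred()
            reactor.callLater(0, d.callback, None)
            yield d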

(The deferLater function in http://twistedmatrix.com/trac/ticket/1875 would make
this even shorter.)

Finally, are you sure you really need to chunk this processing at all?
ElementTree is pretty fast; it's entirely possible that breaking it into chunks
and going in-and-out of the event loop repeatedly will hurt your performance
more than just doing it all at once.  It might be a good idea to check if you
actually have a real performance problem (rather than just a theoretical one)
before you worry about solving it.

Similarly, consider just putting the computationally expensive stuff in a
deferToThread call and letting your OS worry about scheduling it.  If the
processing doesn't need to interact much with the event-driven code, then this
can be a good option.

-Andrew.