[Twisted-Python] Should I use asynchronous programming in my own modules?

Johann Borck johann.borck at densedata.com
Thu Oct 18 16:29:57 MDT 2007


Jürgen Strass wrote:
> [..]
>
> So I already asked myself how one would translate the example of a
> factorial function in twisted's core documentation to use the
> reactor's scheduling mechanism instead of running it in a thread. I
> think an example of how to divide it into chunks and how to use the
> reactor would be great.
>
Hi,
OK, as I understand it you are *not* talking about "accepting
documents over the network" or similar, but about independent,
long-running, CPU-bound tasks. While you could split them into chunks
so that the reactor can do its work in between, there are a number of
arguments against doing so. If you don't need intermediate results and
don't feed your engine incrementally with more data, the way to go is
neither 'chunking' nor threads but worker processes, for two reasons.
First, Python's threads are only well suited to IO-bound work that
cannot be done asynchronously (e.g. if you have to use a blocking DB
interface because there is no asynchronous alternative); the global
interpreter lock keeps them from running Python code in parallel.
Second, function calls in Python are expensive: if you split heavy
processing into chunks that run in the reactor thread, there will be
many unnecessary calls switching between the reactor and your engine.
With a tight loop (or whatever) in a different process those calls are
saved, and on a multi-core or multi-processor system you can use the
additional processing power. So use IPC to communicate with the
worker(s) and let them spit out the result as fast as possible,
instead of slowing down both your calculation and the networking part
without any option to parallelize the processing.
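
To make the worker-process idea concrete, here is a rough sketch that
uses only the standard library. The names WORKER_SOURCE,
start_factorial_worker and collect_result are made up for this
example, and the pipe on the worker's stdout is just one possible form
of IPC. In a real Twisted program you would not block in
communicate(); reactor.spawnProcess or
twisted.internet.utils.getProcessOutput would deliver the output
without blocking the reactor.

```python
import subprocess
import sys

# Code run by the worker: a tight loop with no reactor involvement.
# The result is written as hex so very large integers convert cheaply.
WORKER_SOURCE = """
import sys
n = int(sys.argv[1])
result = 1
for i in range(2, n + 1):
    result *= i
sys.stdout.write(hex(result))
"""


def start_factorial_worker(n):
    """Start the calculation in a separate process and return at once."""
    return subprocess.Popen(
        [sys.executable, "-c", WORKER_SOURCE, str(n)],
        stdout=subprocess.PIPE,
        text=True,
    )


def collect_result(worker):
    """Wait for the worker to finish and parse the result it wrote."""
    output, _ = worker.communicate()
    if worker.returncode != 0:
        raise RuntimeError("worker exited with status %d" % worker.returncode)
    return int(output, 16)
```

Between start_factorial_worker() and collect_result() the parent is
free to do other things, and several workers can be started at once to
use more than one core.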

The decisive aspect is interactivity: if your processing needs it,
'chunking' is the way to go; if not, use another process. If you
don't really need to process events but still want some kind of
streaming, the decision is not that easy. If you need parallel
processing anyway, you have no choice but to use some kind of IPC.
The safe bet is probably to design your code so that it can be
processed in chunks, and then to run it in a separate process.
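
For the 'chunking' variant, a minimal sketch of the factorial example
could look like the following. All names (factorial_chunks,
run_chunked, factorial_with_standin_queue) are made up for
illustration. The `schedule` argument is any callable that arranges
for a function to be called later; a plain queue stands in for the
reactor here so the sketch runs without Twisted.

```python
from collections import deque


def factorial_chunks(n, chunk_size=1000):
    """Multiply chunk_size factors at a time, yielding after each chunk."""
    result = 1
    for start in range(2, n + 1, chunk_size):
        for i in range(start, min(start + chunk_size, n + 1)):
            result *= i
        yield result  # intermediate result; other events can run here


def run_chunked(schedule, work, on_done):
    """Run one chunk of `work` per scheduled call, then call on_done."""
    last = [1]  # factorial of 0 and 1, in case `work` yields nothing

    def step():
        try:
            last[0] = next(work)
        except StopIteration:
            on_done(last[0])
        else:
            schedule(step)  # give control back before the next chunk

    schedule(step)


def factorial_with_standin_queue(n, chunk_size=1000):
    """Drive the chunks with a plain queue standing in for the reactor."""
    pending = deque()
    results = []
    run_chunked(pending.append, factorial_chunks(n, chunk_size), results.append)
    while pending:
        pending.popleft()()
    return results[0]
```

With Twisted, passing `lambda f: reactor.callLater(0, f)` as
`schedule` should let the reactor handle network events between
chunks. Because the generator does not know who drives it, the same
factorial_chunks() can also be run flat out in a separate process.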

I think the main misunderstanding is "[..]to use the reactor's
scheduling mechanism instead of running it in a thread."  Twisted's
reactor is not a superior multi-purpose scheduler (as JP mentioned),
but a domain-specific event handler for networking. While your
use case might (that's my guess) profit from choosing 'chunking' over
Python's threading, it still wouldn't profit from choosing it over
the scheduling of your OS.

Hm, did I get you right there?

Johann

> What I tried at first was programming a simple counter this way. It
> would look much similar to the code I presented in reply to Itamar
> Shtull-Trauring's answer. What I'm not sure about is if this is the
> correct way to go for.
>
> Many thanks for all the other points you've answered, it made a lot
> of things much clearer to me.
>
> Jürgen





