[Twisted-Python] Breaking up long computations

Andrew Bennetts andrew-twisted at puzzling.org
Wed Aug 6 03:50:27 MDT 2003


On Wed, Aug 06, 2003 at 10:55:49AM +0200, Nicola Larosa wrote:
[...]
> To be able to make one operation per event, I break the loops using iter()
> and reactor.callLater(), and end up with this, actually working, script:
> 
> 
> from twisted.internet import reactor
> 
> def computeRow(sqrSum, rowIter, dataIter):
>     try:
>         num = rowIter.next()
>     except StopIteration:
>         reactor.callLater(0, computeData, sqrSum, dataIter)
>     else:
>         sqrSum += num*num
>         reactor.callLater(0, computeRow, sqrSum, rowIter, dataIter)
> 
> def computeData(sqrSum, dataIter):
>     try:
>         row = dataIter.next()
>     except StopIteration:
>         print sqrSum
>         reactor.stop()
>     else:
>         rowIter = iter(row)
>         reactor.callLater(0, computeRow, sqrSum, rowIter, dataIter)
> 
> data = ((1, 1, 1),
>         (2, 2, 2),
>         (3, 3, 3))
> sqrSum = 0
> dataIter = iter(data)
> computeData(sqrSum, dataIter)
> reactor.run()

Note that both computeRow and computeData return immediately.  (Well,
computeRow does a tiny amount of work, but no matter how big your data
matrix is, computeRow will always do a small but constant amount per call,
which is the main thing.)
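The same "small, constant amount of work per call" idea can be written as a
generator that yields after every multiply-add.  What follows is a minimal
sketch in modern Python 3, not Nicola's code: sum_of_squares_steps, ticker,
run_round_robin and the shared trace list are hypothetical names, and the
round-robin driver only stands in for a reactor to show that two tasks both
keep making progress.

```python
def sum_of_squares_steps(data, trace):
    """Sum of squares, one element per resumption (constant work per step)."""
    total = 0
    for row in data:
        for num in row:
            total += num * num
            trace.append(("sum", total))
            yield  # hand control back to whoever is driving this generator
    trace.append(("done", total))


def ticker(trace, n):
    """Hypothetical stand-in for 'other work' that must not be starved."""
    for i in range(n):
        trace.append(("tick", i))
        yield


def run_round_robin(*tasks):
    """Toy driver: advance each generator one step in turn until all finish."""
    pending = list(tasks)
    while pending:
        for task in list(pending):
            try:
                next(task)
            except StopIteration:
                pending.remove(task)


data = ((1, 1, 1),
        (2, 2, 2),
        (3, 3, 3))
trace = []
run_round_robin(sum_of_squares_steps(data, trace), ticker(trace, 3))
print(trace)
```

The trace starts ("sum", 1), ("tick", 0), ("sum", 2), ("tick", 1), ... and ends
with ("done", 42), so the ticks are interleaved with the computation instead of
waiting for it.  Later Twisted releases ship twisted.internet.task.coiterate,
which drives an iterator like this from the reactor; the toy driver above only
mimics the idea.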

> Apparently, this allows Twisted to process any other events that may happen
> in the middle of the computation. But I wonder, since I'm not using threads,
> and Twisted runs the reactor and my code in the same process/thread, how
> could such events get inserted into the queue?

Precisely because your functions return control to the reactor repeatedly,
rather than doing it all in one hit, there is now room for the reactor to
process any other events that need processing, if any.

> I mean, if I'm appending non-delaying events from inside the computation,
> one after the other, how can anything else get a chance to get started
> *before* the computation completes?

The key here is that reactor.callLater(0, func) will *always* cause the
function to be called on the next iteration of the reactor, even though it
can be considered already due, so that other events get a chance to happen.
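Here is a toy model of why a function that keeps rescheduling itself with
callLater(0, ...) cannot monopolise the loop.  It is not Twisted's actual
code: ToyDelayedCalls, call_later_zero, run_iteration and pending are
hypothetical names.  The point it illustrates is that each iteration runs only
the zero-delay calls that were queued *before* it started.

```python
class ToyDelayedCalls:
    """Toy model of zero-delay scheduling; not Twisted's implementation."""

    def __init__(self):
        self._queue = []

    def call_later_zero(self, func, *args):
        # Like reactor.callLater(0, func, *args): queue the call, don't run it.
        self._queue.append((func, args))

    def run_iteration(self):
        # Snapshot first: calls queued while this batch runs are left for
        # the *next* iteration, even though they are already due.
        batch, self._queue = self._queue, []
        for func, args in batch:
            func(*args)

    def pending(self):
        return len(self._queue)


calls = ToyDelayedCalls()
ran = []


def forever(n):
    # Reschedules itself every time it runs, just like computeRow does.
    ran.append(n)
    calls.call_later_zero(forever, n + 1)


calls.call_later_zero(forever, 0)
calls.run_iteration()
print(ran, calls.pending())
calls.run_iteration()
print(ran, calls.pending())
```

Each run_iteration call runs forever exactly once and returns, leaving one call
pending for next time.  If run_iteration instead kept draining the queue until
it was empty, forever would never give control back, which is exactly the
starvation the snapshot avoids.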

E.g., consider a network server that is doing the above calculation in
addition to serving some network clients.  Network events can happen at any
time, they are caused by things completely external to your process, so it's
entirely possible that some data might arrive while computeRow is doing its
subset of the calculation.  Then, when computeRow returns, the reactor
will do something like this:
    - check that there are no more scheduled events that need to be run *in
      this iteration*.  It *won't* pull anything off the delayed call queue
      that wasn't there before it started processing delayed calls.  Let's
      assume there's nothing else to do here.  [I'm guessing this is the bit
      you weren't getting, but I'll tell the complete story anyway, just in
      case]
    - then the reactor will finish that iteration, so it'll start a new one.
    - it'll peek at the first thing on the delayed call queue, to see how
      long until something needs to be called
    - it will then check for IO events using select (or poll, or kqueue,
      or ...).  It passes the time to the next event as the timeout for the
      select call, so if there's been no unprocessed network activity since
      the last iteration, it will block until that timeout, or there is some
      activity (whichever happens first).
    - If there are any IO events to be processed, it calls the relevant event
      handlers.  In my example above, perhaps a message from a client has just
      arrived; this will lead (via a few layers of abstraction) to a
      protocol's dataReceived method being called (which in turn may call a
      'messageReceived' handler and spawn a database query, or something).
    - After the IO events have been processed, it will finally turn its
      attention back to the delayed call queue, find all of them that are
      now due, and run those.
    - And then the next reactor iteration will happen.
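The iteration described above can be modelled in a few dozen lines.  This is
a simplified sketch, not Twisted's reactor: there is no real select() call, the
"network" is a hypothetical list of (arrival_iteration, message) pairs, the
clock never advances, and ToyReactor, call_later and on_message are made-up
names.  It reruns the sum-of-squares computation from the quoted script
(ported to Python 3) and records when a client message is handled.

```python
import heapq
import itertools


class ToyReactor:
    """Simplified model of the reactor iteration; not Twisted's code."""

    def __init__(self, incoming):
        self.iteration = 0
        self.incoming = list(incoming)  # hypothetical (arrival_iteration, message)
        self.delayed = []               # heap of (due_time, seq, func, args)
        self.seq = itertools.count()    # tie-breaker keeps FIFO order
        self.now = 0.0                  # fake clock; never advances in this toy
        self.running = True
        self.on_message = None

    def call_later(self, delay, func, *args):
        heapq.heappush(self.delayed, (self.now + delay, next(self.seq), func, args))

    def stop(self):
        self.running = False

    def run(self):
        while self.running:
            self.iteration += 1
            # 1. Peek at the earliest delayed call to get the poll timeout.
            #    A real reactor passes this to select(); the toy only computes it.
            timeout = max(0.0, self.delayed[0][0] - self.now) if self.delayed else None
            # 2. "Poll" for IO: messages that have arrived by this iteration.
            ready = [m for (t, m) in self.incoming if t <= self.iteration]
            self.incoming = [(t, m) for (t, m) in self.incoming if t > self.iteration]
            # 3. Dispatch IO events to their handler.
            for message in ready:
                self.on_message(message)
            # 4. Collect the delayed calls that are due now, then run them.
            #    Calls they schedule go into the heap and wait for the next pass.
            due = []
            while self.delayed and self.delayed[0][0] <= self.now:
                due.append(heapq.heappop(self.delayed))
            for _, _, func, args in due:
                func(*args)


log = []
toy = ToyReactor(incoming=[(4, "hello from a client")])
toy.on_message = lambda msg: log.append(("message", msg, toy.iteration))


def compute_row(sqr_sum, row_iter, data_iter):
    try:
        num = next(row_iter)
    except StopIteration:
        toy.call_later(0, compute_data, sqr_sum, data_iter)
    else:
        sqr_sum += num * num
        toy.call_later(0, compute_row, sqr_sum, row_iter, data_iter)


def compute_data(sqr_sum, data_iter):
    try:
        row = next(data_iter)
    except StopIteration:
        log.append(("result", sqr_sum, toy.iteration))
        toy.stop()
    else:
        toy.call_later(0, compute_row, sqr_sum, iter(row), data_iter)


data = ((1, 1, 1),
        (2, 2, 2),
        (3, 3, 3))
compute_data(0, iter(data))
toy.run()
for entry in log:
    print(entry)
```

Running it logs the client message at iteration 4 and the result (42) at
iteration 15: the message was handled in the middle of the computation, because
every compute_row call returned to the loop before the next one was run.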

So, every time you call reactor.callLater(0, func) and return back to the
reactor, rather than doing the work immediately, you're giving the reactor a
chance to breathe and catch up on any other events that have been happening
-- but it will get back to your work as soon as it has.  If there's nothing
else happening, then your task will be getting 100% of the CPU time, minus the
overhead of jumping back-and-forth between the reactor and your functions
periodically to make sure nothing else needs Twisted's attention.

I hope I've made things clearer, rather than just confusing them :)

-Andrew.




