[Twisted-Python] freeing the reactor to do other jobs

Fri Nov 7 10:59:52 MST 2008

On Fri, Nov 7, 2008 at 10:26 AM, David Ripton <dripton at ripton.net> wrote:
> On 2008.11.07 08:29:50 -0500, Jeff Dyke wrote:
>> I'm using the XMLRPC server in twisted and a few methods call other,
>> sometimes long running, functions/methods.  I'm trying to get my brain
>> around how to free the reactor to respond to other requests while this
>> is happening.
>
> There are two ways:
>
> 1. Write functions that don't block for long, but instead do a
> little bit of work, schedule a call to do the rest of the work, and then
> return, so the reactor can have the CPU back.
>
> 2. Farm out big chunks of work that you can't or don't want to split up
> to a subprocess or thread.
>
>> A scenario.  A call is made to the server, which selects say 10K rows
>> from a db and needs to check each row against a table and if they do
>> not exist, insert them.
>>
>> """ Oversimplified version of the process """
>> def getData(self,user_id):
>>     rows = self.getUserData(user_id)
>>     for row in rows:
>>         if self.existsInQueue(row['some_id']):
>>             continue
>>         else:
>>             self.insertQueue(row)
>
> If the long-running work is in a blocking database call, and the
> database does not support a less-blocking version and you can't change
> the database, then you probably want to use deferToThread for that part.
>
> And then move the rest of getData into a separate function, that gets
> called in a callback after getUserData finishes.
>
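> from twisted.internet import threads  # deferToThread lives here
>
> def getData(self,user_id):
>    deferred1 = threads.deferToThread(self.getUserData, user_id)
>    deferred1.addCallback(self._addRowsToQueue)
>    deferred1.addErrback(self._getUserDataFailed)
>    return deferred1  # so the caller can attach to the result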
>
> If adding the rows to the queue is fast, then you're done.  Just move
> everything after getUserData into _addRowsToQueue.
>
> def _addRowsToQueue(self, rows):
>    for row in rows:
>        if not self.existsInQueue(row['some_id']):
>            self.insertQueue(row)
>
> But if adding all the rows to the queue in one function call is too
> slow, then you need to split it up.  It's a loop, so splitting it up is
> easy.  Here's the simple scheduling-only version:
>
> def _addSomeRowsToQueue(self, rows):
>    if rows:
>        row = rows.pop(0)
>        if not self.existsInQueue(row['some_id']):
>            self.insertQueue(row)
>        reactor.callLater(0, self._addSomeRowsToQueue, rows)
>
> Adding deferreds to the mix so that a callback function is called when
> all the rows are added to the queue is the next step, after you
> understand how this much works.
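
A minimal sketch of that next step, under some assumptions: the rows are processed a chunk at a time, and a completion object is fired once the last chunk is done. The names add_rows_in_chunks, exists, insert, call_later, done and chunk_size are hypothetical and not from the thread. call_later is assumed to behave like reactor.callLater(delay, func, *args), and done is assumed to offer callback(result) and errback(exception), as a twisted.internet.defer.Deferred does. Passing both in keeps the sketch free of Twisted imports.

```python
def add_rows_in_chunks(rows, exists, insert, call_later, done, chunk_size=100):
    """Insert missing rows chunk by chunk, giving control back between chunks.

    Hypothetical helper. call_later is assumed to look like Twisted's
    reactor.callLater; done is assumed to look like a Deferred.
    """
    index = 0      # position of the next unprocessed row
    inserted = 0   # number of rows actually inserted so far

    def step():
        nonlocal index, inserted
        end = min(index + chunk_size, len(rows))
        try:
            for row in rows[index:end]:
                if not exists(row["some_id"]):
                    insert(row)
                    inserted += 1
        except Exception as exc:
            # Report the failure instead of silently stopping the chain.
            done.errback(exc)
            return
        index = end
        if index < len(rows):
            # More work left: schedule the next chunk and return,
            # so other events can be handled in between.
            call_later(0, step)
        else:
            # All rows handled: fire the completion object with the count.
            done.callback(inserted)

    step()
```

With Twisted this might be wired up by creating a Deferred, passing reactor.callLater as call_later and the Deferred as done, and returning that Deferred from getData so callbacks can be attached to the final count. Slicing by index also avoids the repeated rows.pop(0), which has to shift the whole list each time.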
>
>> I want the caller to wait on a result from this process, but I also
>> want the reactor to be able to handle other requests as they come in.
>
> I hope you can live with "I want something to happen using the results
> from this process", rather than "I want the caller to wait on a result
> from this process."
>
> You can simulate blocking flow somewhat with deferredGenerator or
> inlineCallbacks, but I recommend sticking to the old way at first.
> It's simpler and less magical.
>
> --
> David Ripton    dripton at ripton.net


Thanks all for the input... this will be a great help.