[Twisted-Python] adbapi, transactions and threading

Fri Apr 4 12:09:15 EDT 2008

On Fri, 4 Apr 2008 17:01:40 +0200, Atilla <theatilla at gmail.com> wrote:
>I hope it won't be too confusing. I'm basically trying to make sure
>I've grasped the concept - do correct me on anything I am wrong about
>or my way of thinking confuses you, please.
>
>I'm not used to running blocking calls and deferToThread a lot, so I
>wanted to make sure I understand that correctly.
>
>Basically, the way I understand it - adbapi is just a deferToThread
>wrapper around the normal python API, correct? As in - if I used
>something different to access my database, for example - sqlalchemy, I
>would only need to appropriately wrap that functionality with
>deferToThread, just as adbapi does?

Yes.

>
>That's one thing. Now - what I'm trying to do in essence is to load
>some big chunk of data out of the DB, process it, and save it back.
>I'd use a nice chain of deferred calls - one runQuery, one for the
>processing, and one runOperation. However, I need transaction
>functionality, so unless I'm mistaken, my only choice is
>runInteraction.

Right.

>
>Since that's automatically run in a separate thread, I see it as a
>monolithic piece of code -> query, processing call, saving query -> no
>deferreds can take place there. Am I wrong on that one? Even if so -
>it won't be that bad for me, since a good part of the processing will be
>handled by an external library (GIL released - says so, more cores
>used, makes me happy).

Correct.

>
>That leads me to the next question - when I've got long code to run
>and it happens to use an external, thread-safe, C library, releasing
>the GIL, I should probably always take care to defer it to thread, if
>I wanted to take advantage of multiple cores, correct? Otherwise, I
>wouldn't have any parallelism gains, which I can get, because of the
>GIL release. And let's say that my processing code can take a while
>sometimes.

Yep.

>
>Which leads me to the other question - what should I do in the case
>where I need to occasionally run big chunks of code. No blocking
>calls, just crunch something down. Is deferToThread the only solution
>for that? Is the idea to compose that as big deferred chains so other
>processing might run normally, instead of waiting for the big function to
>exit? Because deferToThread will only get me anything, if there's a
>blocking call inside, or if I manage to get parallelism out of it, if
>it's something handled by GIL-released code.

deferToThread is one solution (you can use processes instead of threads,
but that's roughly the same idea).  Deferreds aren't sensible for CPU-bound
tasks.  They just make the implementation slower and more complex, and they
probably _don't_ allow other tasks to run: a Deferred is just a way to
track results, and it doesn't imply any special scheduling.  This means the
different chunks of your computation will still run all at once and block
other tasks from running unless you explicitly insert scheduling logic.  If
you want to go that route, take a look at twisted.internet.task.coiterate.
However, having a thread-safe CPU-bound task (preferably one which is all
self-contained and doesn't need to talk to other APIs, certainly not Twisted
APIs) and running it in a thread with deferToThread is sensible.

>
>Sorry if I sound too confusing, I'm trying to wrap it all in my head
>before I dive in handling the service.
>

Not very confusing at all. :)

Jean-Paul