[Twisted-Python] Why deferToThread is so slow?

Fri Jun 3 05:27:21 MDT 2016

On 06/03/16 10:24, Glyph wrote:
>
>> On Jun 3, 2016, at 01:06, Nagy, Attila <bra at fsn.hu> wrote:
>>
>> Hi,
>>
>> I have a thread safe synchronous library, which I would like to use 
>> in a threadpool using deferToThread.
>>
>> Without using (deferTo)threads I get consistent 1-3 ms response 
>> times, with deferring to threadpool, I get 30-300, varying wildly.
>
> Why do you think this is bad performance?
>
> With a direct call, you are doing almost nothing.  Just pushing a 
> stack frame.
>
> With a deferToThread call, you are:
[...]

Sure, this is not a perfect example; I just wanted to measure the 
bare latency that this solution introduces.
The whole picture is this:
I have an application which runs in uwsgi in multithreaded mode. It uses 
the (blocking) elasticsearch client.
That app can serve queries with some tens of concurrent requests in 
around 3 ms.

For various reasons I would like to rewrite this app in Twisted. If I use 
the txes2 lib (which is nonblocking), I can achieve around the same 
performance (although it varies a lot more). This is async, no threads 
are involved.

My problem is that this library lacks several features, so I would like 
to use the blocking one, which needs to run in threads.
When I do the requests in threads (with deferToThread, or by running 
the whole handler with callInThread), the response time is around 10-20 
times higher than with either uwsgi's threaded, blocking setup or 
Twisted's async one, and it becomes highly unpredictable.

I haven't looked into the details of Twisted's threadpools, but what I 
would expect here is the same behaviour as a simple Python threadpool 
(something like what uwsgi does, or the ones in the standard library), 
which according to the results works much faster and more predictably 
than Twisted's.

BTW, I use queues in non-Twisted programs and they come nowhere near 
causing several milliseconds(!) of latency.

OK, here's a more realistic example:
https://gist.github.com/bra-fsn/08734197601e5a63d6a2b56d7b048119

This does what is described above: it calls an ES query in a Twisted 
threadpool, and also calls it directly in the thread that runs the whole 
loop.

With one thread the overhead is somewhat acceptable:
deferToThread: avg 2051.00 us, sync: avg 1554.70 us, 1.32x increase
The direct call responds in about 1.5 ms, while deferToThread returns in 
about 2 ms.

Things get worse with concurrency.
With 16 threads the response time is 18 times that of the direct call 
(51 ms vs 2.8 ms!):
deferToThread: avg 51515.36 us, sync: avg 2798.19 us, 18.41x increase

With 32 threads:
deferToThread: avg 108222.73 us, sync: avg 2922.28 us, 37.03x increase

I also use normal (stdlib) threadpools, and I haven't seen this kind of 
performance degradation with them.
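
As a rough baseline for that claim, the following sketch times the 
submit-to-result round trip through a stdlib 
concurrent.futures.ThreadPoolExecutor at a given concurrency. It is an 
illustration under assumptions, not a measurement from this thread: 
fake_query() is a hypothetical stand-in that sleeps for about 2 ms, and 
average_round_trip_us is a made-up helper name.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def fake_query():
    # Hypothetical stand-in for the blocking ES query: about 2 ms of waiting.
    time.sleep(0.002)
    return "ok"


def average_round_trip_us(concurrency, calls_per_client=20):
    """Average submit-to-result time in microseconds with `concurrency` clients."""
    samples = []

    def client(pool):
        for _ in range(calls_per_client):
            start = time.perf_counter()
            pool.submit(fake_query).result()  # block until a worker has finished
            # list.append is atomic in CPython, so no extra lock is used here.
            samples.append(time.perf_counter() - start)

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        clients = [
            threading.Thread(target=client, args=(pool,))
            for _ in range(concurrency)
        ]
        for t in clients:
            t.start()
        for t in clients:
            t.join()
    return sum(samples) / len(samples) * 1e6
```

Calling average_round_trip_us(1), average_round_trip_us(16) and 
average_round_trip_us(32) gives figures to set beside the deferToThread 
averages. A real client that holds the GIL while parsing responses may 
behave differently from this sleep-based stand-in.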

100 ms is a lot of time...