[Twisted-Python] profiling twisted

Markus Schiltknecht markus at bluegap.ch
Mon Jul 2 05:04:56 EDT 2007


Hi,

glyph at divmod.com wrote:
> What is 'real load'?  Are you talking about things in process with, but 
> not related to, the web server?

I should have been more specific, but OTOH it didn't really matter. 
Anyway, what I meant was: 1 concurrent request => very small load, 8 
concurrent requests => real load.

> Maybe your server is slow? :)

;-)  Probably not that slow...

>> Can I somehow get the reactor's state, i.e. how many deferreds are 
>> waiting in the queue, how many threads are running concurrently, etc?
> 
> There is no queue of Deferreds.  They don't actually have anything to do 
> with the reactor.

Hm.. good to know.

>> How good is the idea of deferring File I/O to threads, i.e. 
>> threads.deferToThread(self.filehandle.write, data)?
> 
> If you do indeed discover that you are waiting a long time to write your 
> particular stuff to files, then that might help.  It might also randomly 
> interleave the data and corrupt your files, if 'data' is big enough.

Uh... I'm taking care that only one thread writes to a given file at 
any time.
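
How that guarantee is enforced isn't shown in this thread. One possible approach, sketched here with only the standard library rather than Twisted's deferToThread, is to funnel every write for a given file through a single worker thread, so chunks can never interleave. The class name SerializedWriter and the in-memory buffer are hypothetical stand-ins, not anything from the original setup.

```python
import io
from concurrent.futures import ThreadPoolExecutor


class SerializedWriter:
    """Funnel all writes for one file through a single worker thread.

    Hypothetical helper for illustration; not part of Twisted.
    """

    def __init__(self, filehandle):
        self._fh = filehandle
        # One worker only: writes run one at a time, in submission order.
        self._executor = ThreadPoolExecutor(max_workers=1)

    def write(self, data):
        # Returns a Future that completes once this chunk has been written.
        return self._executor.submit(self._fh.write, data)

    def close(self):
        # Wait for queued writes to finish before releasing the thread.
        self._executor.shutdown(wait=True)


buf = io.BytesIO()  # stand-in for a real file object
writer = SerializedWriter(buf)
futures = [writer.write(chunk) for chunk in (b"aaa", b"bbb", b"ccc")]
writer.close()
result = buf.getvalue()
```

With one writer object per file, callers never touch the file handle directly, which avoids the interleaving risk mentioned in the quoted reply.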

>> Another possibly blocking module might be the database api, but I'm 
>> using twisted's enterprise adbapi, which should be async, AFAICT.
> 
> It does the operations in threads, yes.  However, the threadpool will 
> eventually fill up; the concurrency is fairly limited.  (The default is 
> 10 or so workers, I think).

Yeah, I've enlarged that to 50 threads, which should be enough.

>> Maybe copying data around takes time. I'm sending around chunks of 64k 
>> size (streaming from the database to an external program). Reducing 
>> chunk size to 1k helps somewhat (i.e. response time is seldom over 
>> 150ms, but can still climb up to > 0.5 seconds).
> 
> That's a possibility that the "--profile" option to twistd which JP 
> suggested might help you with.  You'll see the functions copying data 
> taking a lot of CPU time in that case.

Didn't help me much, this time...

>> Hum... external program.... maybe it's the self.transport.write() call 
>> which blocks for several hundred ms? Is it safe to write:
>>
>>   d = threads.deferToThread(self.transport.write, dataChunk)
>>
>> (i.e. call transport.write from a thread?)
> 
> No.  _All_ Twisted APIs are not thread safe.  This call does not block 
> though, and it is extremely, vanishingly unlikely that it is causing 
> your problems.  It just sticks some data into the outgoing queue and 
> returns immediately.

Okay, I've removed that deferToThread(). Didn't solve my problem anyway.
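
For the case where a worker thread really does need to send data, the usual pattern is the reverse of the deferToThread() call above: the thread asks the reactor to perform the write via reactor.callFromThread(), which exists precisely for calling into Twisted from another thread. The sketch below takes the reactor as a parameter and uses hypothetical FakeReactor and FakeTransport stand-ins so that it runs without a live reactor or connection; in a real server the arguments would be twisted.internet.reactor and the protocol's transport.

```python
import threading


def send_from_thread(reactor, transport, chunk):
    """Runs in a worker thread: hand the write over to the reactor thread.

    `reactor` is expected to behave like twisted.internet.reactor.
    """
    reactor.callFromThread(transport.write, chunk)


class FakeTransport:
    """Stand-in transport that just records what was written."""

    def __init__(self):
        self.written = []

    def write(self, data):
        self.written.append(data)


class FakeReactor:
    """Stand-in that queues calls to be run later by 'its' own thread."""

    def __init__(self):
        self.pending = []
        self._lock = threading.Lock()

    def callFromThread(self, f, *args, **kwargs):
        with self._lock:
            self.pending.append((f, args, kwargs))

    def run_pending(self):
        with self._lock:
            calls, self.pending = self.pending, []
        for f, args, kwargs in calls:
            f(*args, **kwargs)


reactor = FakeReactor()
transport = FakeTransport()
worker = threading.Thread(
    target=send_from_thread, args=(reactor, transport, b"chunk")
)
worker.start()
worker.join()
written_before = list(transport.written)  # the worker itself wrote nothing
reactor.run_pending()  # the "reactor thread" performs the actual write
```

The point of the split is that transport.write() only ever executes in the reactor thread, no matter which thread produced the data.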

> One quick measurement you can do to determine what might be causing this 
> performance loss is to examine the server in 'top' while it is allegedly 
> under load.  Is it taking up 100% CPU?  If not, then it's probably 
> blocked on some kind of I/O in your application, or perhaps writing the 
> log.  If so, then there's some inefficient application code (or Twisted 
> code) that you need to profile and optimize.  The output of "strace -T" 
> on your Twisted server *might* be useful if you discover that you're 
> blocking on some kind of I/O.

Yup, I did that, but it's hard to tell what's wrong from that output.


Anyway, with some very simple timing measurements inside the Twisted 
server itself, I've figured out what was causing the delays: 
reactor.spawnProcess() takes more than a second.  I knew that fork() was 
expensive, but that expensive?
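
The timing code itself isn't included in this message. A minimal sketch of that kind of measurement, using only the standard library, could look like this; the names timed, timings, and slow_call are hypothetical, and slow_call merely stands in for the suspect call (here, reactor.spawnProcess()).

```python
import time
from contextlib import contextmanager

# Hypothetical helper names; nothing here comes from the original server code.
timings = {}


@contextmanager
def timed(label):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings.setdefault(label, []).append(time.perf_counter() - start)


def slow_call():
    # Stand-in for the call under suspicion.
    time.sleep(0.05)


with timed("slow_call"):
    slow_call()

worst = max(timings["slow_call"])
print(f"slow_call: worst {worst * 1000:.1f} ms "
      f"over {len(timings['slow_call'])} call(s)")
```

Wrapping each suspect call this way shows directly which one blocks the reactor thread, which is harder to read out of profiler or strace output.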


What I'm doing now feels very dirty: I'm calling reactor.spawnProcess() 
from a thread. (Yes, I'm taking care that only one thread can spawn a 
process at any time.) At least on my Linux Dev-Box, that seems to work - 
and resolves my issue. But... calling fork() from a thread???


Are there other ways to start and control external processes? Preferably 
ones that are also compatible with Windoze?


Thanks to everybody for your help...


Regards

Markus




