[Twisted-Python] Scaling problem

Uwe C. Schroeder uwe at oss4u.com
Wed Nov 5 05:48:07 EST 2003

On Tuesday 04 November 2003 12:20 pm, Glyph Lefkowitz wrote:
> Uwe C. Schroeder wrote:
> > Is there a way to have twisted do something like apache or postgres and
> > split the incoming connection to several processes ? It's not a web
> > application, I'm solely using pb.
> This is possible, although not as easy as I'd like it to be.  In
> principle, on most UNIX-based OSes, it's possible to listen() and
> accept() in one process, pass the returned file descriptor to a
> subprocess and do communication there.  I'd like to make this automatic
> at some point, but won't have the time to do it for at least another few
> months.
> The fact that it's not as graceful or automatic as I like doesn't mean
> it's impossible though.  If you want to just have one "control" process
> that shuttles around I/O and N worker processes that do the actual work,
> you can still use spawnProcess and communicate over stdin/stdout to the
> subprocesses about new connections using a simple protocol which
> multiplexes multiple socket connections down to the one pipe connection
> and back.
> Also, if database latency is your main problem, or you can move a little
> bit of performance-sensitive code into C and relinquish the GIL while
> they're happening, you can just use callInThread to run your
> transactions in threads instead of having the reactor waiting on them.

I found that callInThread has a much bigger impact on the rest of the
system than I'd like. I'm already using deferToThread a lot, but there are
limits to the usefulness of threads. For example, I use it for page
formatting (printjobs). Each printjob is about 120 pages to format.
If I start 20 to 30 such formatting jobs with deferToThread, twisted becomes
virtually unresponsive. If I limit it to a max of 3 or so, it works. I admit I
don't understand this, since IMHO deferring to a thread should make the job
independent of the parent. I didn't dig into this problem, since it's not a
big deal to limit it to a few such jobs at any given time and use a
spooler-type interface to queue the remaining requests for later.
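
A minimal sketch of the spooler idea, using only the Python standard
library rather than Twisted itself. The names format_printjob, spool and
MAX_JOBS are hypothetical, and the sleep is a placeholder for the real
formatting work. One plausible reason for the unresponsiveness is the GIL
mentioned in the quoted reply: CPU-bound Python threads compete with the
reactor thread for it, so capping the number of running jobs leaves the
reactor more time.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_JOBS = 3          # assumed cap, taken from the "max of 3 or so" above

_lock = threading.Lock()
_running = 0          # jobs currently executing
peak_running = 0      # highest concurrency observed, to check the cap holds


def format_printjob(job_id, pages=120):
    """Hypothetical stand-in for formatting one printjob."""
    global _running, peak_running
    with _lock:
        _running += 1
        peak_running = max(peak_running, _running)
    try:
        time.sleep(0.01)              # placeholder for the real formatting work
        return (job_id, pages)
    finally:
        with _lock:
            _running -= 1


def spool(job_ids):
    """Run all jobs, at most MAX_JOBS at a time; the rest wait in the queue."""
    with ThreadPoolExecutor(max_workers=MAX_JOBS) as pool:
        return list(pool.map(format_printjob, job_ids))


results = spool(range(25))
```

Inside Twisted, the same cap could probably be had by wrapping
deferToThread calls in a twisted.internet.defer.DeferredSemaphore, or by
shrinking the reactor's thread pool; this sketch does not show either.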

I think I suffer from database latency in general, and with the expensive
function in particular. The expensive function could be moved to a thread, and
maybe I'll give it a try. However, if I do this for every method that writes
more than 10 or 15 records to the database, I'll most likely end up with
thousands of threads. Currently the system is used by up to 15 people, but
the projection is 400+ people at any given time. I don't have a problem with
having people wait for a result; I have a problem with the other people who
don't get a result because the framework stops responding while a couple of
functions take more than a millisecond to execute.
I'd rather have 500 processes, like with apache, and have the OS do the
scheduling, which is way more efficient than having the framework schedule.
The most expensive function I have (besides the printing mentioned above)
executes in about 3 seconds. This is not "blocking" in my understanding,
since the function never waits for something to happen; it just calculates
and stores stuff. Twisted is unresponsive for those 3 seconds. Multiply that
by 10 users, and one user might end up with a delay of more than 30 seconds
before his "3 seconds" are evaluated. With a lot of processes, the OS
would schedule that more efficiently: the function might then take 5
seconds to complete, since the OS scheduler would interrupt it to hand
processing time to the other processes.
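
The arithmetic above can be checked with a toy model. It assumes a single
CPU, ten 3-second jobs plus one short 10 ms request all arriving at once,
and a 10 ms time slice; those numbers and the function names are
illustrative only. It compares running each job to completion (what one
busy event loop does) with preemptive round-robin slicing (roughly what an
OS scheduler does across processes).

```python
from collections import deque


def fifo_finish_times(jobs):
    """Run each job to completion in arrival order (one busy event loop)."""
    clock, done = 0, {}
    for name, cost_ms in jobs:
        clock += cost_ms
        done[name] = clock
    return done


def round_robin_finish_times(jobs, quantum_ms=10):
    """Preemptive time slicing on one CPU, roughly like an OS scheduler."""
    clock, done = 0, {}
    queue = deque(jobs)
    while queue:
        name, remaining = queue.popleft()
        ran = min(quantum_ms, remaining)
        clock += ran
        if remaining > ran:
            queue.append((name, remaining - ran))   # not finished, requeue
        else:
            done[name] = clock                      # finish time in ms
    return done


jobs = [("long%d" % i, 3000) for i in range(10)] + [("short", 10)]
fifo = fifo_finish_times(jobs)
rr = round_robin_finish_times(jobs)
print(fifo["short"], rr["short"])   # 30010 110
print(fifo["long9"], rr["long9"])   # 30000 30010
```

In this model the short request drops from about 30 seconds to about 0.1
seconds, while the long jobs still need roughly 30 seconds in total. On
one CPU, time slicing improves responsiveness, not throughput; the long
jobs only finish sooner if there are several CPUs to spread them over.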

Maybe I'll look into splitting the work across n worker processes. Since this
would involve going into twisted code, where would I start looking?
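
As a starting point, here is a rough sketch of the control-process layout
described in the quoted reply: one parent, N workers, and a line-based
protocol over stdin/stdout where each line carries a connection id. It
uses the standard library's subprocess module rather than Twisted, and
the worker body, the "id payload" line format and all function names are
assumptions made for illustration. Payloads must be single lines.

```python
import subprocess
import sys

# Hypothetical worker: reads "conn_id payload" lines, writes "conn_id result".
WORKER_SOURCE = r"""
import sys
for line in sys.stdin:
    conn_id, payload = line.rstrip("\n").split(" ", 1)
    sys.stdout.write("%s %s\n" % (conn_id, payload.upper()))
    sys.stdout.flush()
"""


def start_workers(count):
    """Spawn the worker processes with line-buffered text pipes."""
    return [
        subprocess.Popen(
            [sys.executable, "-c", WORKER_SOURCE],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE,
            text=True, bufsize=1,
        )
        for _ in range(count)
    ]


def dispatch(workers, conn_id, payload):
    """Send one request to the worker picked by connection id; wait for it."""
    worker = workers[conn_id % len(workers)]
    worker.stdin.write("%d %s\n" % (conn_id, payload))
    worker.stdin.flush()
    reply_id, result = worker.stdout.readline().rstrip("\n").split(" ", 1)
    return int(reply_id), result


def stop_workers(workers):
    """Close the pipes so each worker sees EOF and exits."""
    for worker in workers:
        worker.stdin.close()
        worker.wait()
        worker.stdout.close()
```

The blocking readline() here is only for brevity. In a real control
process the pipes would have to be serviced without blocking, which in
Twisted would mean reactor.spawnProcess with a ProcessProtocol, matching
replies to connections by the id at the start of each line.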


--
Open Source Solutions 4U, LLC	2570 Fleetwood Drive
Phone:  +1 650 872 2425		San Bruno, CA 94066
Cell:   +1 650 302 2405		United States
Fax:    +1 650 872 2417