[Twisted-Python] Scaling problem
Uwe C. Schroeder
uwe at oss4u.com
Wed Nov 5 05:48:07 EST 2003
On Tuesday 04 November 2003 12:20 pm, Glyph Lefkowitz wrote:
> Uwe C. Schroeder wrote:
> > Is there a way to have twisted do something like apache or postgres and
> > split the incoming connection to several processes ? It's not a web
> > application, I'm solely using pb.
> This is possible, although not as easy as I'd like it to be. In
> principle, on most UNIX-based OSes, it's possible to listen() and
> accept() in one process, pass the returned file descriptor to a
> subprocess and do communication there. I'd like to make this automatic
> at some point, but won't have the time to do it for at least another few
> months.
> The fact that it's not as graceful or automatic as I like doesn't mean
> it's impossible though. If you want to just have one "control" process
> that shuttles around I/O and N worker processes that do the actual work,
> you can still use spawnProcess and communicate over stdin/stdout to the
> subprocesses about new connections using a simple protocol which
> multiplexes multiple socket connections down to the one pipe connection
> and back.
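(For what it's worth, the descriptor-passing part of this can be sketched roughly as below, assuming a Unix-domain socket as the control channel between the processes; the `send_fd`/`recv_fd` helper names are hypothetical, and `socket.send_fds`/`socket.recv_fds` are the modern stdlib wrappers around the underlying SCM_RIGHTS ancillary-data mechanism.)

```python
import os
import socket

def send_fd(channel, fd):
    # Ship an open descriptor to a worker over a Unix-domain socket.
    # The kernel duplicates the descriptor into the receiving process.
    socket.send_fds(channel, [b"x"], [fd])

def recv_fd(channel):
    # Pick up the descriptor in the worker; the returned fd is valid
    # in this process even though the socket was accepted elsewhere.
    _data, fds, _flags, _addr = socket.recv_fds(channel, 16, 1)
    return fds[0]
```

(The control process would accept() on the listening socket and hand each accepted connection's fileno() to a worker over such a channel.)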
> Also, if database latency is your main problem, or you can move a little
> bit of performance-sensitive code into C and relinquish the GIL while
> they're happening, you can just use callInThread to run your
> transactions in threads instead of having the reactor waiting on them.
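(The shape of that suggestion, sketched here with the stdlib rather than Twisted itself so it stands alone; `run_query` is a hypothetical blocking database call — in Twisted it is what you would hand to deferToThread or reactor.callInThread.)

```python
from concurrent.futures import ThreadPoolExecutor

def run_query(sql):
    # Hypothetical blocking database call; it may take seconds,
    # but only the worker thread waits on it.
    return ("rows for", sql)

pool = ThreadPoolExecutor(max_workers=4)

def query_async(sql):
    # Returns a Future immediately, so the event-loop thread
    # (the reactor analogue) never blocks on the database.
    return pool.submit(run_query, sql)
```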
I figured that the impact of callInThread on the rest of the system is much
higher than I'd like. I'm already using deferToThread a lot, but there are
limits to the usefulness of threads. For example, I use it to do page
formatting (print jobs); each print job is about 120 pages to format. If I
start 20 to 30 such formatting jobs with deferToThread, Twisted becomes
virtually unresponsive. If I limit it to a maximum of 3 or so, it works. I
admit I don't understand this, since IMHO deferring to a thread should make
it independent of the parent. I haven't dug into the problem, since it's no
big deal to limit the number of such jobs at any given time and use a
spooler-type interface to suspend requests for later.
I think I suffer from database latency in general, and with the expensive
function in particular. The expensive function could be moved to a thread,
and maybe I'll give that a try. However, if I do this for every method that
writes more than 10 or 15 records to the database, I'll most likely end up
with thousands of threads. Currently the system is used by up to 15 people,
but projections go to 400+ people at any given time. I don't have a problem
with people waiting for a result; I have a problem with the other people who
get no result at all, because the framework stops responding for a couple of
functions that take more than a millisecond to execute.
I'd rather have 500 processes, as with Apache, and let the OS do the
scheduling, which is far more efficient than having the framework schedule.
The most expensive function I have (besides the printing mentioned above)
executes in about 3 seconds. This is not "blocking" in my understanding,
since the function never waits for anything to happen; it just calculates
and stores stuff. But Twisted is unresponsive for those 3 seconds. Multiply
that by 10 users, and one user might end up with a delay of more than 30
seconds before his "3 seconds" are evaluated. If I had a lot of processes,
the OS would schedule this more efficiently: each function might then take 5
seconds to complete, but it would be interrupted by the OS scheduler to hand
processing time to the other processes.
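(A process-based version of this would look roughly like the sketch below, using the stdlib multiprocessing pool; `expensive` is a hypothetical CPU-bound stand-in for the calculate-and-store function.)

```python
from multiprocessing import Pool

def expensive(n):
    # CPU-bound stand-in for the ~3-second calculate-and-store function.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    # Each call runs in its own process, so the OS scheduler interleaves
    # them and no single job freezes the others.
    with Pool(processes=4) as pool:
        results = pool.map(expensive, [50_000] * 8)
```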
Maybe I'll look into splitting the work into n worker processes. Since this
would involve going into the Twisted code, where would I start looking?
Open Source Solutions 4U, LLC
2570 Fleetwood Drive
San Bruno, CA 94066
United States
Phone: +1 650 872 2425
Cell:  +1 650 302 2405
Fax:   +1 650 872 2417