[Twisted-Python] Scaling problem

Uwe C. Schroeder uwe at oss4u.com
Wed Nov 5 12:32:23 EST 2003

On Wednesday 05 November 2003 07:31 am, Itamar Shtull-Trauring wrote:
> On Wed, 5 Nov 2003 02:48:07 -0800
> "Uwe C. Schroeder" <uwe at oss4u.com> wrote:
> > I think I suffer from database latency in general and with the
> > expensive function in particular. The expensive function could be
> > moved to a thread and maybe I'll give it a try. However if I do this
> > for every method that writes more than 10 or 15 records to the
> > database I'll most likely end up with thousands of threads.
> No, Twisted has a thread pool, so if you are using Twisted APIs you
> won't get more than the max number of threads the pool is configured to
> use.

I know, and I increased the threadpool limit a bit. Not much, though, because in 
the end it all runs on one CPU. I assume that if I start, say, 50 jobs via 
deferToThread and the pool limit is 25, the extra jobs are queued and run as 
pool threads become available?
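That queueing behavior can be illustrated with a stdlib analogue (concurrent.futures here rather than Twisted's threadpool, and the pool size and job count are made up for the sketch): jobs beyond the worker limit simply wait in the pool's internal queue.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# A pool capped at 3 workers, fed 10 jobs: at most 3 run at once,
# the other 7 wait in the pool's internal queue until a worker frees up.
peak = 0
running = 0
lock = threading.Lock()

def job(i):
    global peak, running
    with lock:
        running += 1
        peak = max(peak, running)
    time.sleep(0.01)  # stands in for the expensive work
    with lock:
        running -= 1
    return i

with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(job, i) for i in range(10)]
    results = [f.result() for f in futures]

print(sorted(results))  # all 10 jobs eventually complete
print(peak)             # never exceeds the pool limit of 3
```

Twisted's pool behaves the same way: deferToThread requests past the limit don't spawn new threads, they wait for a slot.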

> > The most expensive function I have (besides the printing mentioned
> > above) executes in about 3 seconds. This is not "blocking" in my
> > understanding, since the function never waits for something to happen,
> > it just calculates and stores stuff. Twisted is unresponsive for those
> > 3 seconds.
> If Twisted is unresponsive, that means it blocks. Any function that
> takes more than 10ms should probably count as blocking, really. Try
> running it in a thread. Because of the global interpreter lock, this
> won't reduce runtime, but it'll enhance responsiveness to other clients.
Responsiveness is what I need. Users are happy to wait a couple of seconds if you 
can notify them and their screen still shows some progress.
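A minimal stdlib sketch of that pattern (Twisted's deferToThread wraps the same idea: push the blocking work onto a pool thread and collect the result via a callback; the function names here are invented for illustration):

```python
import queue
import threading

def expensive_calculation(n):
    # stands in for the ~3 second calculate-and-store function
    return sum(i * i for i in range(n))

results = queue.Queue()

def run_in_thread(func, *args):
    # run func off the main thread; deliver its result via a queue
    def worker():
        results.put(func(*args))
    threading.Thread(target=worker).start()

run_in_thread(expensive_calculation, 1000)

# ... the main loop stays free to service other clients here,
# e.g. to push a "still working" notification to the user ...

value = results.get()  # collect the result when it is ready
print(value)
```

In Twisted the queue-polling step disappears: deferToThread returns a Deferred and the reactor fires your callback on the main thread when the worker finishes.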

> In general, it sounds like you are doing a lot of database queries. Some
> ways to optimize this:
> 1. Do less queries - cache, merge identical requests, etc..

Yep - I already started moving some of the things into the database as stored 
procedures. This reduces the execution time tremendously since it gets rid 
of the DB API overhead. I know I still have some optimization potential by 
handing data blocks between methods, so a particular method doesn't have to 
requery the database for something another method already fetched.
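The "hand the data along" optimization can be sketched with sqlite3 as a stdlib stand-in for the real database (the table and column names are invented): query once, then let later methods work on the rows already in memory.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO records VALUES (?, ?)",
                 [(i, float(i)) for i in range(100)])

def load_records(conn):
    # hit the database once ...
    return conn.execute("SELECT id, amount FROM records").fetchall()

def total_amount(rows):
    # ... and pass the fetched rows to later methods,
    # instead of re-running the same SELECT in each of them
    return sum(amount for _, amount in rows)

def record_count(rows):
    return len(rows)

rows = load_records(conn)
print(total_amount(rows), record_count(rows))
```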

> 2. Do a number of queries in a single transaction in a single thread,
> rather than deferToThread, get result, run another query with
> deferToThread again, etc..

I already do this, at least in some parts. You're right, the thread-handling 
overhead is way too high to run every single simple query in its own thread. 
It doesn't make sense to pay 1ms of thread setup for something that only runs 
for 1ms.
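The batching idea can be illustrated with sqlite3 (again a stdlib stand-in; with Twisted's adbapi this would be a single runInteraction call running all the statements on one pool thread): group the statements into one transaction instead of dispatching each to its own thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (id INTEGER, msg TEXT)")

def write_batch(conn, entries):
    # one transaction for the whole batch: a single dispatch and commit,
    # instead of one thread hand-off plus commit per statement
    with conn:
        conn.executemany("INSERT INTO log VALUES (?, ?)", entries)

write_batch(conn, [(i, "msg %d" % i) for i in range(50)])
count = conn.execute("SELECT COUNT(*) FROM log").fetchone()[0]
print(count)
```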

> (3. Don't use RDBMS. Use an in-process database like ZODB or Quotient's
> atop when it's done.)

:-) I'd probably do that with a web application, but even there I'd always 
prefer a "real" database system. In-process databases may be faster, but they 
are less reliable and correcting errors is virtually impossible.
Also, the sheer size of the data I handle would make something like ZODB blow up 
to a couple of gigabytes in one month alone. 
Since my customers generally can afford it, I'd rather spend a couple thousand 
$$ on big iron to make the database faster. Especially in the finance and 
insurance industry it also makes the management happier if they see some big 
machines that look impressive in the rack. That's what they're used to, and it 
gives them confidence in their investment (although one probably only uses 
10% of the machine :-) )

--
Open Source Solutions 4U, LLC	2570 Fleetwood Drive
Phone:  +1 650 872 2425		San Bruno, CA 94066
Cell:   +1 650 302 2405		United States
Fax:    +1 650 872 2417


More information about the Twisted-Python mailing list