[Twisted-Python] Scaling problem
Uwe C. Schroeder
uwe at oss4u.com
Wed Nov 5 12:32:23 EST 2003
On Wednesday 05 November 2003 07:31 am, Itamar Shtull-Trauring wrote:
> On Wed, 5 Nov 2003 02:48:07 -0800
> "Uwe C. Schroeder" <uwe at oss4u.com> wrote:
> > I think I suffer from database latency in general and with the
> > expensive function in particular. The expensive function could be
> > moved to a thread and maybe I'll give it a try. However if I do this
> > for every method that writes more than 10 or 15 records to the
> > database I'll most likely end up with thousands of threads.
> No, Twisted has a thread pool, so if you are using Twisted APIs you
> won't get more than the max number of threads the pool is configured to use.
I know, and I increased the threadpool limit a bit. Not by much, though,
because in the end it all runs on one CPU. I assume that if I start, say, 50
tasks via deferToThread and the pool limit is 25, the remaining tasks are
queued and run as pool threads become available?
> > The most expensive function I have (besides the printing mentioned
> > above) executes in about 3 seconds. This is not "blocking" in my
> > understanding, since the function never waits for something to happen,
> > it just calculates and stores stuff. Twisted is unresponsive for those
> > 3 seconds.
> If Twisted is unresponsive, that means it blocks. Any function that
> takes more than 10ms should probably count as blocking, really. Try
> running it in a thread. Because of the global interpreter lock, this
> won't reduce runtime, but it'll enhance responsiveness to other clients.
Responsiveness is what I need. Users are happy to wait a couple of seconds if
you can notify them and their screen still shows some "progress".
> In general, it sounds like you are doing a lot of database queries. Some
> ways to optimize this:
> 1. Do less queries - cache, merge identical requests, etc..
Yep - I already started moving some things into the database as stored
procedures. This reduces the execution time tremendously since it gets rid
of the DB API overhead. I know I still have some optimization potential in
handing data blocks between methods, so a particular method doesn't have to
re-query the database for something another method already fetched.
> 2. Do a number of queries in a single transaction in a single thread,
> rather than deferToThread, get result, run another query with
> deferToThread again, etc..
I already do this, at least in some parts. You're right, the thread-handling
overhead is way too high to run single simple queries each in its own thread.
It doesn't make sense to use a thread that costs you 1 ms to set up for
something that only runs for 1 ms.
> (3. Don't use RDBMS. Use an in-process database like ZODB or Quotient's
> atop when it's done.)
:-) I'd probably do that for a web application, but even there I'll always
prefer a "real" database system. In-process databases may be faster, but they
are less reliable, and correcting errors in them is virtually impossible.
Also, the sheer size of the data I handle would make something like ZODB blow
up to a couple of gigabytes in one month alone.
Since my customers generally can afford it, I'd rather use big iron for a
couple thousand $$ to make the database faster. Especially in the finance and
insurance industries, it also makes the management happier if they see some
big machines that look impressive in the rack. That's what they're used to,
and it gives them confidence in their investment (although one probably only
uses 10% of the machine :-) )
Open Source Solutions 4U, LLC 2570 Fleetwood Drive
Phone: +1 650 872 2425 San Bruno, CA 94066
Cell: +1 650 302 2405 United States
Fax: +1 650 872 2417