[Twisted-Python] Scaling problem

Wed Nov 5 13:00:23 EST 2003

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wednesday 05 November 2003 07:14 am, Jp Calderone wrote:
> On Wed, Nov 05, 2003 at 02:48:07AM -0800, Uwe C. Schroeder wrote:
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> >
> > On Tuesday 04 November 2003 12:20 pm, Glyph Lefkowitz wrote:
> > > Uwe C. Schroeder wrote:
> > > > Is there a way to have twisted do something like apache or postgres
> > > > and split the incoming connection to several processes ? It's not a
> > > > web application, I'm solely using pb.
> > >
> > > This is possible, although not as easy as I'd like it to be.  In
> > > principle, on most UNIX-based OSes, it's possible to listen() and
> > > accept() in one process, pass the returned file descriptor to a
> > > subprocess and do communication there.  I'd like to make this automatic
> > > at some point, but won't have the time to do it for at least another
> > > few months.
> > >
> > > The fact that it's not as graceful or automatic as I like doesn't mean
> > > it's impossible though.  If you want to just have one "control" process
> > > that shuttles around I/O and N worker processes that do the actual
> > > work, you can still use spawnProcess and communicate over stdin/stdout
> > > to the subprocesses about new connections using a simple protocol which
> > > multiplexes multiple socket connections down to the one pipe connection
> > > and back.
> > >
> > > Also, if database latency is your main problem, or you can move a
> > > little bit of performance-sensitive code into C and relinquish the GIL
> > > while they're happening, you can just use callInThread to run your
> > > transactions in threads instead of having the reactor waiting on them.
> >
> > I figured that the impact of callInThread is much higher to the rest of
> > the system than I like. I'm already using deferToThread a lot, however
> > there are limits to the usefulness of threads.  For example I use it to
> > do page formatting (printjobs). Each of the printjobs is about 120 pages
> > to format. If I start 20 to 30 such formatting jobs with deferToThread
> > twisted virtually becomes unresponsive. If I limit to a max of 3 or so it
> > works. I admit I don't understand this, since IMHO deferring to a thread
> > should make it independent of the parent.
>
>   But it doesn't, not in Python.  Each thread adds new contention for the
> GIL, which makes every thread less responsive.  If the print jobs were
> managed by a C extension which could release the GIL before doing its work,
> then they would be mostly independent.

That explains it. Maybe I'll go ahead and create a separate "print" server 
process that just produces paper. I'm using reportlab and I even had to turn
off the C extensions used in reportlab, because they are not thread-safe.
Printing isn't a big deal at this time, simply because I gave them a "print 
later" checkbox which will suspend formatting and printing for the night. At 
night I then just process all the printjobs sequentially.
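
A rough sketch of the "limit to a max of 3 or so" spooler idea, using only the
standard library (queue and threading) rather than Twisted's deferToThread or
ThreadPool. The names format_job and run_spooled are made up for illustration,
and the arithmetic is only a stand-in for real page formatting:

```python
import queue
import threading


def format_job(job_id, pages):
    # Hypothetical stand-in for CPU-bound page formatting of one print job.
    return job_id, sum(i * i for i in range(pages))


def run_spooled(jobs, max_workers=3):
    """Run all jobs on at most max_workers threads; return {job_id: result}."""
    pending = queue.Queue()
    for job in jobs:
        pending.put(job)

    results = {}
    lock = threading.Lock()

    def worker():
        # Each worker pulls jobs until the spool is empty, then exits.
        while True:
            try:
                job_id, pages = pending.get_nowait()
            except queue.Empty:
                return
            _, value = format_job(job_id, pages)
            with lock:
                results[job_id] = value

    threads = [threading.Thread(target=worker) for _ in range(max_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    done = run_spooled([(n, 120) for n in range(30)])
    print(len(done), "jobs formatted")
```

Thirty jobs are queued, but no more than three threads ever compete for the
GIL at once; the rest simply wait in the queue.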

> > I didn't dig into this problem since it's not a big deal to limit to a
> > few such jobs at any given time and use a spooler type interface to
> > suspend requests for later.
>
>   Yea.  If you only have 4 CPUs, it doesn't make sense to be running 400
> CPU-bound threads :)

Exactly :-)

>
> > I think I suffer from database latency in general and with the expensive
> > function in particular. The expensive function could be moved to a thread
> > and maybe I'll give it a try. However if I do this for every method that
> > writes more than 10 or 15 records to the database I'll most likely end up
> > with thousands of threads. Currently the system is used by up to 15
> > people, but projection goes to 400+ people at any given time. I don't
> > have a problem with having people wait for a result, I have a problem
> > with the other people that don't get a result because the framework stops
> > responding for a couple of functions that just take more than a
> > millisecond to execute.
> > I'd rather have 500 processes like with apache and have the OS do the
> > scheduling - which is way more efficient than having the framework
> > schedule. The most expensive function I have (besides the printing
> > mentioned above) executes in about 3 seconds. This is not "blocking" in
> > my understanding, since the function never waits for something to happen,
> > it just calculates and stores stuff.
>
>   It is blocking.  While the term usually refers to a function that waits
> for some I/O to finish, it applies equally well to any functions that run
> for a non-trivial amount of time and, while doing so, prevent the rest of
> the application from proceeding.
>
> > Twisted is unresponsive for those 3 seconds. Multiply that by 10 users
> > and one user might end up with a delay of more than 30 seconds before his
> > "3 seconds" are evaluated. If I had a lot of processes the OS would
> > schedule that more efficiently since then the function might take 5
> > seconds to complete, but it would be interrupted by the OS scheduler to
> > hand processing time to one of the other processes.
> >
> > Maybe I'll look into splitting into n worker processes. Since this would
> > involve going into twisted code, where would I start looking ?
>
>   Before doing that, you might consider just spawning a handful of worker
> threads and giving them queues.  This keeps the application responsive by
> not forcing an insane number of context switches but also still allowing
> all the expensive functions to run independently of the I/O thread.
>
>   There is even a ThreadPool class in Twisted already (though I have never
> used it myself ;), so giving this approach a try should be pretty quick,
> and if it is still not good enough, you haven't lost much time.
>
>   This is not to say that multiple processes would not also be a good
> solution, just that it might be a harder one to implement.  If you do end
> up looking at multiple processes, reactor.spawnProcess() (as glyph alluded
> to earlier) is one way to go about it.  This is basically the equivalent of
> popen2() (though with a lot more flexibility).

I think a mixture of both might be the best approach. Having several processes 
would be great for MP machines, so one has the ability to scale on a hardware 
basis. Making more intelligent use of threads could increase responsiveness. 
One user waiting for something to happen is fine, but all the others
shouldn't have to wait for that one user too.
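
To make the worker-process idea concrete, here is a minimal sketch of a
control process talking to N workers over stdin/stdout with a one-line-per-job
protocol. It uses the standard library's subprocess module as a stand-in for
reactor.spawnProcess, so it blocks while waiting for replies and is not how a
Twisted application would be wired up. The protocol ("job_id n" in,
"job_id result" out) and the names start_workers, run_jobs and stop_workers
are invented for this example:

```python
import subprocess
import sys

# Source of the worker process: read "job_id n" lines, answer "job_id result".
WORKER_SOURCE = r"""
import sys
for line in sys.stdin:
    job_id, n = line.split()
    total = sum(i * i for i in range(int(n)))  # stand-in for expensive work
    sys.stdout.write("%s %d\n" % (job_id, total))
    sys.stdout.flush()
"""


def start_workers(count):
    # One child interpreter per worker, connected through text-mode pipes.
    return [
        subprocess.Popen(
            [sys.executable, "-c", WORKER_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,
        )
        for _ in range(count)
    ]


def run_jobs(workers, jobs):
    # Hand jobs out round-robin, then collect one reply line per job.
    # Assumes the batch is small enough to fit in the pipe buffers.
    assigned = []
    for index, (job_id, n) in enumerate(jobs):
        worker = workers[index % len(workers)]
        worker.stdin.write("%s %d\n" % (job_id, n))
        worker.stdin.flush()
        assigned.append(worker)
    results = {}
    for worker in assigned:
        job_id, total = worker.stdout.readline().split()
        results[job_id] = int(total)
    return results


def stop_workers(workers):
    # Closing stdin ends the worker's read loop, so it exits on its own.
    for worker in workers:
        worker.stdin.close()
        worker.wait()
        worker.stdout.close()


if __name__ == "__main__":
    pool = start_workers(2)
    print(run_jobs(pool, [("a", 10), ("b", 100), ("c", 5)]))
    stop_workers(pool)
```

With this shape the OS schedules the workers across CPUs, and a slow job only
delays the jobs queued behind it on the same worker.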

>   On a related, but not necessarily useful note, I've recently begun
> investigating ways of moving the state associated with a user request,
> including the associated socket object, between processes.  My use case is
> not load-balancing, but the code should apply equally well to this as to
> what I intend to use it for.  It requires C extensions (which are currently
> -just- good enough to play around with higher-level stuff in Python with).
> The code is very short, and lives mostly in
> sandbox/exarkun/copyover/fdpass.py in CVS HEAD.

I'll check that out. I wouldn't name it load-balancing. Would be interesting 
though to check if one could have a single "socket server" that distributes 
to several processes on several machines :-) For an application like mine 
load balancing might be much easier, since I could make the clients reconnect 
to a different server/port.

	UC

- --
Open Source Solutions 4U, LLC	2570 Fleetwood Drive
Phone:  +1 650 872 2425		San Bruno, CA 94066
Cell:   +1 650 302 2405		United States
Fax:    +1 650 872 2417
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE/qTq3jqGXBvRToM4RAsd8AJ9Xme37KfYCxT+ls4SNOAQrzZTi8gCeK42C
Lwf/hmo+TRp/ulHAA49I3Ck=
=W66T
-----END PGP SIGNATURE-----