[Twisted-web] Parallelism is twisted

Andreas Kostyrka andreas at kostyrka.org
Wed Jan 6 22:16:53 EST 2010


On Wednesday, 06.01.2010, at 20:04 -0600, arun chhetri wrote:
> First of all, I would like to thank the developers of Twisted for making
> such a great platform for network applications. I just love it.
> But from a theoretical point of view, I want to clear up some of my doubts.
> 
> 1) What is the underlying technology?
> I believe that Twisted is a TCP server which uses select()-based
> calls to handle multiple requests. Is that true, or is there
> something else that I am not aware of?

Basically yes. It uses select() or a more or less equivalent mechanism to
run an event-handling loop. That loop is called a reactor in Twisted ;)
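
To make the idea concrete, here is a minimal sketch of a select-style loop
that serves several connections from a single thread. It uses only Python's
standard-library selectors module and is not Twisted's reactor code; the
function name echo_until_idle and the idle timeout are made up for this
illustration.

```python
import selectors
import socket


def echo_until_idle(connections, idle_timeout=0.2):
    """Echo data on many sockets from one thread; return the chunk count.

    The loop stops when no socket becomes readable within `idle_timeout`
    seconds, or when every connection has been closed by its peer.
    """
    sel = selectors.DefaultSelector()  # select/poll/epoll/kqueue, per platform
    for conn in connections:
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ)

    handled = 0
    try:
        while sel.get_map():  # any sockets still registered?
            events = sel.select(timeout=idle_timeout)
            if not events:
                break  # nothing happened for a while: stop the sketch
            for key, _mask in events:
                conn = key.fileobj
                data = conn.recv(4096)
                if data:
                    # Simplification: assumes the small reply fits in the
                    # socket buffer. A real loop would also wait for
                    # writability before sending.
                    conn.send(data)
                    handled += 1
                else:
                    # Peer closed the connection.
                    sel.unregister(conn)
                    conn.close()
    finally:
        sel.close()
    return handled
```

Calling it with the server ends of a few socket.socketpair() pairs shows one
thread answering all of them in turn, which is the essence of what a reactor
does.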

> 
> 2) So, if Twisted is a single-process network application
> framework (without threads and forks), I believe it cannot take
> advantage of multiple processors residing on one machine. E.g., if the
> OS schedules the Twisted process on one processor, that's it: it can only
> run on that processor and does not take advantage of the others. I am
> kind of confused about this question. Can anyone shed some light on this?

Correct. If you want to use multiple processors, you should consider
running multiple processes, e.g. behind a load balancer.

> 
> 3) So, suppose I have one Twisted reactor-based process running. Can I
> use deferToThread as one way of getting some kind of
> multiprocessing?

Yes and no. Yes, it runs in a thread. Yes, in some cases you can use many
cores this way. But no, Python with its GIL is not the perfect language
for writing programs that use threads for performance.
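
The "yes and no" can be illustrated with a small standard-library sketch.
This is not Twisted code: in Twisted, deferToThread hands the call to the
reactor's thread pool and gives you a Deferred, while the sketch below uses
concurrent.futures to show the same idea. The name blocking_lookup and the
0.2 second delay are invented for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_lookup(key, delay=0.2):
    """Stand-in for a blocking call (database query, file read, ...)."""
    time.sleep(delay)  # sleeping, like most blocking I/O, releases the GIL
    return key.upper()


def run_lookups(keys, workers=4):
    """Run the blocking calls in a thread pool; return (results, seconds)."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(blocking_lookup, keys))
    return results, time.monotonic() - start
```

Four lookups on four workers finish in roughly the time of one, because the
threads spend their time waiting rather than executing Python bytecode. If
blocking_lookup did CPU-bound work in pure Python instead, the threads would
mostly take turns holding the GIL and the speedup would largely disappear.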

> 
> 4) So, one technique to achieve parallelism is to run
> multiple Twisted reactors on different processors and use some
> scheduler which takes the web request and forwards it to
> a Twisted reactor instance on a different processor.
> This is close to load balancing, as we can also use the above
> architecture and configure the scheduler to schedule requests to
> different computers.

Exactly. Or you can split the work asymmetrically. I once had a server
that did the work (data lookups) but used multiple backends for
different languages (one for French, a couple of English lookup
processes).
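
A rough sketch of such an asymmetric split, assuming a front end that only
has to decide which backend process gets each request. The class name
LanguageRouter, the language codes, and the addresses are hypothetical; the
actual setup described above is not documented in this post.

```python
import itertools


class LanguageRouter:
    """Pick a backend for each request, round-robin within a language."""

    def __init__(self, backends_by_language):
        # One independent round-robin cycle per language.
        self._cycles = {
            language: itertools.cycle(backends)
            for language, backends in backends_by_language.items()
        }

    def pick(self, language):
        """Return the next backend address for `language`."""
        return next(self._cycles[language])


# Hypothetical layout: one French backend, two English ones.
router = LanguageRouter({
    "fr": ["127.0.0.1:9001"],
    "en": ["127.0.0.1:9002", "127.0.0.1:9003"],
})
```

Each backend would be its own process (and so can sit on its own core or
machine); the front end stays a single, cheap event loop that just forwards
requests to whatever router.pick() returns.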

One thing to consider is that Twisted, when used correctly, is rather
efficient.

Two experiences from my work come to mind:

-) I once wrote a web crawler in Twisted without giving any thought to how
many connections it created. The website, which was not a first-tier
operation (like Google, Yahoo, MSN, ...) but still a commercial
offering, was DoSed in seconds. Twisted has no problem opening 20000
HTTP connections (if you give it the file descriptors to manage that),
but typical sites don't take that very well. Lesson: always
consider your server and client capabilities, and rate limit as
needed.

-) Some years ago I also built a delivery network that patched files
on the fly (by substituting a couple of bytes, nothing elaborate, but it
did complicate the reading and sending of the data). Even early versions
of that HTTP server (it was twisted.web2 back then) were capable of
filling a Gigabit pipe without issues.
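
The crawler lesson above (rate limit as needed) can be sketched as a cap on
the number of requests in flight. The example uses the standard library's
asyncio rather than Twisted, purely to stay self-contained; Twisted offers
twisted.internet.defer.DeferredSemaphore for the same purpose. The names
crawl and max_connections are invented here, and fetch is whatever coroutine
function actually downloads a URL.

```python
import asyncio


async def crawl(urls, fetch, max_connections=10):
    """Fetch every URL, but never run more than `max_connections` at once.

    `fetch` is an async callable taking a URL; results keep the order of
    `urls`.
    """
    limit = asyncio.Semaphore(max_connections)

    async def fetch_one(url):
        async with limit:  # wait here while the cap is reached
            return await fetch(url)

    return await asyncio.gather(*(fetch_one(url) for url in urls))
```

With the cap in place, queuing 20000 URLs still means at most
max_connections simultaneous connections to the remote site, no matter how
many sockets the event loop itself could handle.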

So the single-threaded nature of Twisted (and in some ways of Python) is
not that big an issue. Other moving parts will probably limit you first;
e.g. SQL databases are notorious for being a choke point for scaling.

Andreas
> 
> Many of the questions might sound weird, so please feel free to
> write any comment.
> Thanks
> Arun
> 



