[Twisted-Python] Twisted's ability to handle many open connections.

Matt Goodall matt at pollenation.net
Wed Apr 26 19:18:44 EDT 2006


Terry Jones wrote:
> I'm completely new to Twisted. As I understand it, a Twisted server runs as
> a single process - no forking, no threads.

Correct. Twisted is event driven. It doesn't need to fork processes or 
spawn threads to handle a lot of network connections at once.

However, Twisted is perfectly capable of managing any processes and 
threads an application *needs* to create. In fact, Twisted's core 
includes APIs specifically created to handle background processes and 
threads.


> To me that implies that select
> (or poll or pselect, etc) is being heavily used to feed data to Deferreds.

select/poll/whatever is at the core of Twisted's event loop, known as 
the 'reactor', but Deferreds are completely unrelated to data delivery: 
the reactor hands incoming data straight to your protocol object (its 
dataReceived() method).
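For illustration only, here is a minimal sketch of the *idea* behind a select-style event loop, written with Python's standard selectors module rather than Twisted's actual reactor code. One thread waits for any socket to become ready, then dispatches to a handler; no forking, no threads. The names make_server and handle_ready are hypothetical.

```python
import selectors
import socket

def make_server(sel):
    """Create a non-blocking loopback listener and register it with the selector."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen()
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ, data="listener")
    return listener

def handle_ready(sel, timeout=1.0):
    """Run one turn of the loop: dispatch every ready socket, return how many."""
    events = sel.select(timeout)
    for key, _mask in events:
        if key.data == "listener":
            # A new client is waiting: accept it and watch it too.
            conn, _addr = key.fileobj.accept()
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ, data="client")
        else:
            conn = key.fileobj
            chunk = conn.recv(4096)
            if chunk:
                conn.sendall(chunk)  # echo; fine for small payloads in a sketch
            else:
                # Empty read means the peer closed the connection.
                sel.unregister(conn)
                conn.close()
    return len(events)
```

A real reactor adds timers, write buffering and error handling, but the shape is the same: all connections are multiplexed by one select()/poll() call.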

A Deferred is a callback mechanism, nothing more. A function typically 
returns a Deferred to say, "hey, you asked me to do something and I will 
give my response ... just not right now".
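To make "a callback mechanism, nothing more" concrete, here is a heavily simplified, hypothetical stand-in for Twisted's Deferred. It keeps only the core idea: a result that arrives later plus a chain of callbacks that each receive the previous callback's return value. The real class also has errbacks and much more; MiniDeferred is not Twisted's code.

```python
class MiniDeferred:
    """Hypothetical, simplified Deferred: no errbacks, no cancellation."""

    _PENDING = object()  # sentinel meaning "no result yet"

    def __init__(self):
        self._callbacks = []
        self._result = self._PENDING

    def addCallback(self, fn):
        if self._result is self._PENDING:
            self._callbacks.append(fn)  # result not here yet: queue it
        else:
            self._result = fn(self._result)  # already fired: run immediately
        return self  # allow d.addCallback(...).addCallback(...)

    def callback(self, result):
        if self._result is not self._PENDING:
            raise RuntimeError("already fired")  # Twisted raises its own error type
        self._result = result
        for fn in self._callbacks:
            self._result = fn(self._result)  # each callback feeds the next
        self._callbacks.clear()
```

Nothing here touches sockets or the event loop; the object just remembers who to call once somebody supplies the answer.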


> 
> If that's the case, how does Twisted (portably) allow for a large number of
> file descriptors?

As you recognised, Twisted uses a select loop, so I'm not quite sure what 
your question means. However ...

Twisted chooses the best available reactor implementation for the 
platform. "Best" here probably means the most widely available (i.e. 
the default) rather than the fastest.

I'm no expert here, but on Linux the reactor really does use a select() 
loop, on win32 it uses the win32 equivalent of a select loop 
(WSAEventSelect and MsgWaitForMultipleObjects, or something), and I 
believe it uses poll on the BSDs.

If the "best" reactor is not actually the best (if you see what I mean!) 
then you can manually install a non-default reactor. There are plenty to 
choose from.

For instance, on Linux if you want to handle a huge number of 
simultaneous connections then installing a poll-based reactor is 
probably a good idea. If your application uses GTK+ then you would 
install a GTK+ reactor.
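In Twisted itself you pick a reactor by calling install() from a module such as twisted.internet.pollreactor before anything imports twisted.internet.reactor. The sketch below is only an analogy using Python's standard selectors module, showing the same trade-off: prefer a poll-style backend when available, because select() works on a fixed-size fd_set (FD_SETSIZE, commonly 1024 on Linux). The function name pick_selector is hypothetical.

```python
import selectors

def pick_selector(prefer_scalable=True):
    """Return a selector instance, preferring poll-style backends over select()."""
    if prefer_scalable:
        # Try the backends that do not share select()'s FD_SETSIZE ceiling,
        # skipping any this platform does not provide.
        for name in ("EpollSelector", "KqueueSelector", "PollSelector"):
            cls = getattr(selectors, name, None)
            if cls is not None:
                return cls()
    # Portable fallback, roughly what a plain select()-based loop uses.
    return selectors.SelectSelector()
```

As with reactors, the choice has to be made before the loop starts; everything registered afterwards goes through whichever backend was picked.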


> I'm considering using Twisted for a project. Twisted would front requests
> from the web, but also from (e.g.) remote command line applications via an
> API.

This is what Twisted excels at - multi-protocol network applications. 
I've done it many times now and it just works.

Twisted already includes support (of varying quality) for most common 
protocols but it's very easy to add new protocols if necessary.
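Much of the work in adding a protocol is turning a byte stream, which can arrive in arbitrary chunks, into whole messages. Twisted ships twisted.protocols.basic.LineReceiver for line-based protocols; the following is just a stdlib-only sketch of the same buffering idea, with a hypothetical LineBuffer class, not Twisted's implementation.

```python
class LineBuffer:
    """Accumulate raw bytes and emit each complete delimiter-terminated line."""

    def __init__(self, on_line, delimiter=b"\r\n"):
        self._buf = b""
        self._on_line = on_line  # called once per complete line
        self._delim = delimiter

    def data_received(self, data):
        self._buf += data
        while True:
            line, sep, rest = self._buf.partition(self._delim)
            if not sep:
                break  # incomplete line: keep it until more data arrives
            self._buf = rest
            self._on_line(line)
```

Because data can be split anywhere, the handler must never assume one read equals one message.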


> These latter connections could potentially be numerous, and long
> lived.  Should I be thinking about how Twisted deals with this, or does it
> already scale to allow more simultaneous connections / file descriptors
> than a naive server program would be able to maintain?

Twisted is event driven, and most people believe that event-driven 
designs scale better, especially for network-related work. I therefore 
believe Twisted will *help* you write code that scales to a high number 
of simultaneous connections. Twisted does not perform any magic, though.

You will need to understand event-driven programming to scale up. 
Otherwise you're likely to block the application from doing more than 
one thing at a time (in the cooperative multitasking sense), making 
your application utterly unscalable.

Also, note that Twisted does not magic away other common problems 
associated with scalability. The obvious one that springs to mind 
(because it bit me recently :-/) is the limit on the number of file 
descriptors a non-root Linux process can have open.
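One way to inspect and raise that limit from inside the process is the standard resource module (Unix only); from the shell, ulimit -n shows the same soft limit. A non-root process can raise its soft limit only up to the hard limit. The helper name raise_fd_limit and the target value are illustrative assumptions, not something Twisted provides.

```python
import resource  # Unix-only standard library module

def raise_fd_limit(target):
    """Try to raise the soft RLIMIT_NOFILE to `target`, capped at the hard limit.

    An unprivileged process may move its soft limit anywhere up to the hard
    limit; going beyond the hard limit needs root or a system configuration
    change. Returns the (soft, hard) pair in effect afterwards.
    """
    inf = resource.RLIM_INFINITY
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == inf:
        return soft, hard  # already unlimited, nothing to raise
    wanted = target if hard == inf else min(target, hard)
    if wanted > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

Call it once at startup, before the server starts accepting connections.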

> 
> Given what looks like widespread use of Deferreds in Twisted, I suppose
> that there is not a 1-1 relationship between connections and file
> descriptors? If so this makes the issue more pressing.

There is absolutely a 1-1 relationship between connections and file 
descriptors, but Deferreds have nothing to do with it.
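A quick stdlib-only way to see the 1-1 relationship: every accepted connection comes back as a new socket with its own descriptor number, which is exactly why the per-process descriptor limit caps how many clients one server process can hold. The helper accept_n is hypothetical.

```python
import socket

def accept_n(n):
    """Open n loopback connections and return the server-side fileno() of each."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(n)
    port = listener.getsockname()[1]
    clients = [socket.create_connection(("127.0.0.1", port), timeout=5) for _ in range(n)]
    accepted = [listener.accept()[0] for _ in range(n)]
    fds = [conn.fileno() for conn in accepted]  # one descriptor per connection
    for s in clients + accepted + [listener]:
        s.close()
    return fds
```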

It's quite possible to write a Twisted application without ever using a 
Deferred. If your application can respond immediately (well, very 
quickly) to requests that arrive from the network then you don't need 
Deferreds.

You need a Deferred when you cannot respond immediately. In that case 
you create a Deferred and store it somewhere, start something 
*non-blocking* to handle the request, and return the Deferred to the 
caller. When the non-blocking work completes, you fire the Deferred 
you stored (by calling its callback() method) to pass the result back 
to the caller.
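The store-then-fire pattern described above can be sketched with a simplified, hypothetical stand-in for Deferred. Here LookupService pretends its answers come back later over some other connection: the outgoing list only simulates the non-blocking request, and answer_received is what a protocol would call when the reply arrives. None of these names are Twisted APIs, and this MiniDeferred does not chain return values the way Twisted's does.

```python
class MiniDeferred:
    """Hypothetical stand-in for Deferred: every callback gets the same result."""

    def __init__(self):
        self._callbacks = []

    def addCallback(self, fn):
        self._callbacks.append(fn)
        return self

    def callback(self, result):
        for fn in self._callbacks:
            fn(result)


class LookupService:
    """Hypothetical service whose answers arrive later, e.g. from a backend."""

    def __init__(self):
        self._pending = {}  # request id -> MiniDeferred awaiting its answer
        self._next_id = 0
        self.outgoing = []  # simulated non-blocking requests: (request id, key)

    def lookup(self, key):
        self._next_id += 1
        d = MiniDeferred()
        self._pending[self._next_id] = d            # 1. store the Deferred
        self.outgoing.append((self._next_id, key))  # 2. start the work (simulated)
        return d                                    # 3. return without waiting

    def answer_received(self, request_id, value):
        d = self._pending.pop(request_id)
        d.callback(value)                           # 4. fire it when the result arrives
```

The caller gets control back immediately, so the event loop stays free to serve other connections while the answer is outstanding.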

> 
> Thanks for any help,
> Terry

Hope it did help!

Cheers, Matt

-- 
      __
     /  \__     Matt Goodall, Pollenation Internet Ltd
     \__/  \    w: http://www.pollenation.net
   __/  \__/    e: matt at pollenation.net
  /  \__/  \    t: +44 (0)113 2252500
  \__/  \__/
  /  \	       Any views expressed are my own and do not necessarily
  \__/          reflect the views of my employer.



