[Twisted-Python] epoll and other questions

Tommi Virtanen tv at twistedmatrix.com
Wed Oct 6 16:02:27 EDT 2004

Andrea Arcangeli wrote:
> Is there any plan to use epoll instead of poll to make twisted
> scalable with a hundred thousand simultaneous sockets connected?

There has been some work (I personally wrote a partial epoll Python
library back when epoll was very new). I think progress stopped
then because of epoll API instability; now that epoll is no longer a
moving target, someone should get back on the case.

> twisted use epoll. I assume my application would require no change,
> so I can start developing with current twisted, I can test it with
> poll, and then later fix the internals when the slowdown becomes
> noticeable.  Right?

Yes. All the different reactors implement the same interface.
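
To illustrate the idea of one interface over several readiness
mechanisms, here is a minimal sketch using the standard-library
selectors module from Python 3. It is not Twisted code, and echo_once
is a made-up helper; it only shows that the calling code stays the
same whichever backend (select, poll, epoll) sits underneath.

```python
import selectors
import socket


def echo_once(selector_cls):
    """Send one message over a socket pair and read it back once the
    given selector backend reports the receiving end as readable."""
    sel = selector_cls()
    a, b = socket.socketpair()
    try:
        b.setblocking(False)
        sel.register(b, selectors.EVENT_READ)
        a.sendall(b"ping")
        received = b""
        # The loop body is identical for every backend.
        for key, _events in sel.select(timeout=1.0):
            received += key.fileobj.recv(1024)
        return received
    finally:
        sel.close()
        a.close()
        b.close()


# select is always present; poll and epoll depend on the platform.
backends = [selectors.SelectSelector]
for name in ("PollSelector", "EpollSelector"):
    if hasattr(selectors, name):
        backends.append(getattr(selectors, name))

results = {cls.__name__: echo_once(cls) for cls in backends}
for backend_name, data in results.items():
    print(backend_name, data)
```

Only the class passed to echo_once changes between runs, which is the
same property that lets an application swap reactors later.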

Also, notice that the default reactor most likely uses select, not poll:

$ python -c 'from twisted.internet import reactor; print reactor'
<twisted.internet.default.SelectReactor instance at 0x401fb16c>
$ python -c 'from twisted.internet import pollreactor; \
  pollreactor.install(); from twisted.internet import reactor; \
  print reactor'
<twisted.internet.pollreactor.PollReactor instance at 0x401f33ec>

> I understand there's no limitation on the number of sockets
> simultaneously open, I just need to use ulimit to boost the limit of
> fds.

My gut feeling is you'll either hit an OS limit or sys.maxint,
and the latter is pretty huge. Haven't looked at the details.
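
For checking the OS side of that from inside the process, a small
sketch with the standard-library resource module (Unix only) follows.
raise_fd_limit is a hypothetical helper name, and the example only
ever raises the soft limit up to the existing hard limit, which is
what an unprivileged process is allowed to do.

```python
import resource


def describe(limit):
    """Render an rlimit value, which may be the 'unlimited' marker."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


def raise_fd_limit(wanted):
    """Raise the soft open-file limit to `wanted`, capped at the hard
    limit. Returns the soft limit in effect afterwards."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)
    # Nothing to do if the current soft limit is already high enough.
    if soft == resource.RLIM_INFINITY or wanted <= soft:
        return soft
    resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    return wanted


soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft limit:", describe(soft))
print("hard limit:", describe(hard))
```

Going beyond the hard limit needs root or a change in the system
configuration, same as with ulimit in the shell.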

> A slightly separated issue: I assume it's best for me not to do any 
> blocking I/O in the main network server handling the 100k connections
> and to create a secondary internal server communicating again through
> tcp/ip (loopback device) with the primary server to do the real
> blocking I/O. Is this correct? Best would be to use asynchronous I/O
> for the IO, but I think using a second process will be a lot simpler
> in practice since I don't need bulk I/O performance (I only need to
> avoid blocking). I only want to keep the network pipeline full even
> when some disk-read is happening. Best would be to use threading (or
> shared memory with MAP_SHARED in tmpfs), but it seems twisted is not
> mature enough for threading and shared memory communication using
> futex, right?  If I would write it in C I could probably get various
> performance bits faster but I doubt the time spent on those bits
> would payoff significantly, opinions?

Well, there's nothing Twisted- or even Python-specific in that.
The solution probably depends heavily on your dataset size, access
patterns, and available RAM. Some people advocate heavy RAM caching.
Sendfile might be the solution, but I don't think there's any
integration of sendfile with Python, much less with Twisted.

Your plan of isolating disk IO to separate process(es) sounds quite
sane. Your master process could receive the file data from the IO
workers in blocks via a shared mmap, to avoid passing it through a
socket (even if the socket were a local TCP connection or UNIX domain).
I don't know if that optimization is worth it; I would delay writing
any extra code until the problem actually shows up.
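
A rough sketch of the shared-mmap handoff, in Python 3 with only the
standard library. Everything named here (BLOCK_SIZE, WORKER,
read_block_via_worker) is made up for illustration, and the worker
writes a fixed string where a real one would do the blocking disk
read.

```python
import mmap
import os
import subprocess
import sys
import tempfile

BLOCK_SIZE = 4096  # assumed size of one shared block

# Source of the IO worker: map the shared file and fill in the block.
WORKER = """
import mmap, sys
path, size = sys.argv[1], int(sys.argv[2])
with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), size) as m:
        data = b"file data from the IO worker"
        m[:len(data)] = data
        m.flush()
"""


def read_block_via_worker():
    """Map a scratch file, let a separate process fill it, and return
    what the master sees through its own mapping."""
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, BLOCK_SIZE)
        with mmap.mmap(fd, BLOCK_SIZE) as shared:
            # Waiting here keeps the sketch short; see the note below.
            subprocess.run(
                [sys.executable, "-c", WORKER, path, str(BLOCK_SIZE)],
                check=True,
            )
            return shared[:64].rstrip(b"\0")
    finally:
        os.close(fd)
        os.unlink(path)


print(read_block_via_worker())
```

In a real server the master would not wait on the worker like this.
The worker would stay running and announce each finished block over a
pipe or socket that the reactor watches, so only a short notification
crosses the socket and the data itself stays in the mapping.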

Note that Python threading is very likely _not_ what you want;
the threads synchronize at the interpreter level (on the global
interpreter lock) quite a lot.

Sadly, not even http://www.kegel.com/c10k.html (which is normally
_the_ resource for things like this) talks that much about disk IO.

> Another thing I plan to do is ship the public key (matching the
> private key stored only on the server) in the client source tarball;
> this way, as long as people download the right tarball, they will be
> able to securely connect to the server since they will be able to
> check the signature. Is there any example of this idea (public key
> stored in a file in the client package) available somewhere?

SFS (secure file system) does something like that. The info page
included has this:

	SFS clients require no configuration.  Simply run the program
	`sfscd', and a directory `/sfs' should appear on your system.
	To test your client, access our SFS test server.  Type the
	following commands:

	  % cd /sfs/@sfs.fs.net,uzwadtctbjb3dg596waiyru8cx5kb4an
	  You have set up a working SFS client.

	Note that the `/sfs/@sfs.fs.net,...' directory does not need to
	exist before you run the `cd' command.  SFS transparently mounts
	new servers as you access them.

The part after the comma is a hash of the public key the server at
sfs.fs.net must present, in order to be accepted.
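
The check itself can be sketched in a few lines of Python 3. This is
not the SFS algorithm: SHA-256 with hex output is an assumption, the
key bytes are a placeholder, and key_fingerprint and
server_key_is_trusted are hypothetical names.

```python
import hashlib
import hmac


def key_fingerprint(public_key_bytes):
    """Hash the serialized public key (SHA-256 is assumed here)."""
    return hashlib.sha256(public_key_bytes).hexdigest()


def server_key_is_trusted(presented_key_bytes, pinned_fingerprint):
    """Accept the server only if the key it presents hashes to the
    fingerprint that shipped with the client."""
    presented = key_fingerprint(presented_key_bytes)
    return hmac.compare_digest(presented, pinned_fingerprint)


# Placeholder standing in for the key file shipped in the tarball.
shipped_key = (
    b"-----BEGIN PUBLIC KEY-----\n"
    b"(placeholder)\n"
    b"-----END PUBLIC KEY-----\n"
)
pinned = key_fingerprint(shipped_key)

print(server_key_is_trusted(shipped_key, pinned))
print(server_key_is_trusted(b"some other key", pinned))
```

Matching the hash only tells the client which key to expect. The
server still has to prove it holds the private half, for example by
signing a challenge, which is the part the handshake of the chosen
protocol has to cover.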
