[Twisted-Python] epoll and other questions

Andrea Arcangeli andrea at cpushare.com
Thu Oct 7 17:53:34 EDT 2004


On Wed, Oct 06, 2004 at 11:02:27PM +0300, Tommi Virtanen wrote:
> then because of epoll API instability; now that epoll is no longer a
> moving target, someone should get back on the case.

agreed ;).

With a truly huge number of sockets open, the time spent in poll would
at some point exceed the time spent in the Python interpreter (compared
to a C implementation). It would be interesting to measure the
break-even point, i.e. the number of sockets at which the poll cost
becomes higher than the interpreter overhead.
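
A rough way to get a first number for the poll side could be something
like the sketch below. It only times poll() itself with mostly idle
sockets registered; it says nothing about the interpreter cost of
dispatching events. The function name time_poll and the socketpair setup
are my own assumptions for illustration, not anything from Twisted.

```python
import select
import socket
import time


def time_poll(num_idle, rounds=200):
    """Return the average seconds per poll() call with num_idle idle
    sockets registered plus one socket that is always readable."""
    poller = select.poll()
    pairs = [socket.socketpair() for _ in range(num_idle + 1)]
    try:
        for a, _b in pairs:
            poller.register(a.fileno(), select.POLLIN)
        # Make exactly one socket readable; the byte is never consumed,
        # so every poll() call reports the same single event.
        pairs[0][1].send(b"x")
        start = time.perf_counter()
        for _ in range(rounds):
            events = poller.poll(0)
        elapsed = time.perf_counter() - start
        assert len(events) == 1
        return elapsed / rounds
    finally:
        for a, b in pairs:
            a.close()
            b.close()


if __name__ == "__main__":
    # Each pair uses two file descriptors, so stay below the fd limit.
    for n in (10, 100, 400):
        print(n, time_poll(n))
```

Going to much larger socket counts needs a raised file descriptor limit,
and the numbers will obviously depend on the kernel and the machine.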

> >twisted use epoll. I assume my application would require no change,
> >so I can start developing with current twisted, I can test it with
> >poll, and then later fix the internals when the slowdown becomes
> >noticeable.  Right?
> 
> Yes. All the different reactors implement the same interface.
> 
> Also, notice that the default reactor most likely uses select, not poll:
> 
> $ python -c 'from twisted.internet import reactor; print reactor'
> <twisted.internet.default.SelectReactor instance at 0x401fb16c>
> $ python -c 'from twisted.internet import pollreactor; \
>  pollreactor.install(); from twisted.internet import reactor; \
>  print reactor'
> <twisted.internet.pollreactor.PollReactor instance at 0x401f33ec>

good point. I'll use pollreactor for now. Apparently, I still have to
use the normal "select" reactor for interfacing with pyqt, but that's ok
since I don't (yet) need scalability on the client side...

> My gut feeling is you'll either hit an OS limit or sys.maxint,
> and the latter is pretty huge. Haven't looked at the details.

ok fine ;).

> Well, there's nothing Twisted- or even Python-specific in that.
> The solution probably depends heavily on your dataset size, access
> patterns, and available RAM. Some people advocate heavy RAM caching.

yes, heavy ram caching is fine for reads, but writes may still require
O_SYNC.

> Sendfile might be the solution, but I don't think there's any
> integration of sendfile with python, far less with twisted.

sendfile is synchronous too, so I don't think it'd solve the problem.
Plus sendfile only works from the filesystem to the network, while for
me it's almost the other way around and I have to parse the data anyway.
(I'm even thinking of using pickled objects as storage for each user,
but I'm a bit worried about versioning and about pickle/unpickle
performance: if I upgrade the user class, every unpickle could break,
because I'd have no on-disk format separate from the in-memory format.)
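
One way the versioning worry might be handled is to give the class an
explicit pickled state with a version number, so old records can be
upgraded on load. This is only a sketch: the User class, its fields and
the "version 1 had no credits field" history are hypothetical.

```python
import pickle


class User:
    FORMAT_VERSION = 2  # bump whenever the stored fields change

    def __init__(self, name, credits=0):
        self.name = name
        self.credits = credits

    def __getstate__(self):
        # Store an explicit, versioned dict instead of the raw __dict__,
        # so the on-disk format is decoupled from the in-memory layout.
        return {"version": self.FORMAT_VERSION,
                "name": self.name,
                "credits": self.credits}

    def __setstate__(self, state):
        version = state.get("version", 1)
        if version == 1:
            # Hypothetical version 1 records had no "credits" field.
            state = dict(state, credits=0)
        self.name = state["name"]
        self.credits = state["credits"]


if __name__ == "__main__":
    blob = pickle.dumps(User("alice", 5))
    restored = pickle.loads(blob)
    print(restored.name, restored.credits)
```

This doesn't address the pickle/unpickle performance question, only the
risk of a class upgrade breaking existing records.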

> Your plan on isolating disk IO to separate process(es) sounds quite
> sane. Your master process could receive the file data from the IO
> workers in blocks via a shared mmap, to avoid passing it through a

so you're saying I could already use a shared mmap. But how do I
serialize access then? I'd need a pthread_mutex for that. Otherwise, if
I have to serialize through a pipe, I may as well send the data through
the pipe too (it's not going to be high-bandwidth communication where an
additional memcpy matters; I'd prefer shared memory only for low latency
and full-userspace locking for the data producer).
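
For reference, the "shared mmap plus a pipe for serialization" variant
could look roughly like the sketch below, assuming a Unix system with
fork(). The data goes through an anonymous shared mapping and only a
4-byte length goes through the pipe. The function name read_via_worker
and the 4096-byte buffer size are made up for the example.

```python
import mmap
import os
import struct


def read_via_worker(payload, size=4096):
    """Fork a worker that puts payload into a shared anonymous mmap and
    announces its length over a pipe; return the bytes the parent sees."""
    shared = mmap.mmap(-1, size)  # anonymous; MAP_SHARED by default on Unix
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Worker: write the data, then signal completion through the pipe.
        try:
            os.close(read_fd)
            shared[:len(payload)] = payload
            os.write(write_fd, struct.pack("!I", len(payload)))
        finally:
            os._exit(0)
    os.close(write_fd)
    # Parent: the blocking read on the pipe is the only synchronization.
    (length,) = struct.unpack("!I", os.read(read_fd, 4))
    os.close(read_fd)
    os.waitpid(pid, 0)
    data = shared[:length]
    shared.close()
    return data


if __name__ == "__main__":
    print(read_via_worker(b"block of file data"))
```

In an event-driven master the read end of the pipe would be watched by
the reactor instead of being read with a blocking call, and the kernel
is still involved for every notification, so this is not the
full-userspace locking I had in mind.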
 
> socket (even if the socket was a local TCP connection or UNIX domain).
> Don't know if that optimization is worth it; I would delay writing
> any extra code until the problem actually shows up.

Agreed ;)

> Note that python threading is very likely _not_ what you want;
> the threads synchronize in the interpreter level quite a lot.

agreed, it doesn't really scale. This is also why I doubt the
serialization through shmem would work well, unless I write a module
from scratch for futex-driven pthread_mutex locking.

> >Another thing I plan doing is to ship the public key (matching the 
> >private key stored only on the server) on the client source tarball, 
> >this way as far as people downloaded the right tarball, they will be 
> >able to securely connect to the server since they will be able to
> >check the signature. Is there any example of this idea (public key
> >stored in a file in the client package) available somewhere?
> 
> SFS (secure file system) does something like that. The info page
> included has this:
> 
> 	SFS clients require no configuration.  Simply run the program
> 	`sfscd', and a directory `/sfs' should appear on your system.
> 	To test your client, access our SFS test server.  Type the
> 	following commands:
> 
> 	  % cd /sfs/@sfs.fs.net,uzwadtctbjb3dg596waiyru8cx5kb4an
> 	  % cat CONGRATULATIONS
> 	  You have set up a working SFS client.
> 	  %
> 
> 	Note that the `/sfs/@sfs.fs.net,...' directory does not need to
> 	exist before you run the `cd' command.  SFS transparently mounts
> 	new servers as you access them.
> 
> The part after the comma is a hash of the public key the server at
> sfs.fs.net must present, in order to be accepted.

I found the sfscd program, but it's not a Python program and it seems a
bit different from what I wanted to do. My goal was to create a
private/public key pair, and to use an SSL library to load that file
automatically and use it as the public/private key. My point is that if
Twisted supports the native ssh protocol with id_rsa* keys, then it'll
be trivial to implement my public/private key in a file too. I was just
trying to reuse whatever is available right now, be it
SSH/SSL/sshtunnel/whatever, as the encrypted transport. So if you have a
suggestion for which encrypted transport to use, that's welcome.
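
Whatever the transport ends up being, the check on the client side
should boil down to comparing the key the server presents against what
was shipped in the tarball. A minimal sketch of that comparison,
assuming a SHA-256 fingerprint and hypothetical helper names (how the
presented key bytes are obtained depends on the SSL/SSH library and is
not shown):

```python
import hashlib
import hmac


def fingerprint(public_key_bytes):
    """Return a hex SHA-256 fingerprint of the raw public key bytes."""
    return hashlib.sha256(public_key_bytes).hexdigest()


def server_key_is_trusted(presented_key_bytes, pinned_fingerprint):
    """Compare the presented key against the fingerprint shipped with
    the client, using a constant-time comparison."""
    return hmac.compare_digest(fingerprint(presented_key_bytes),
                               pinned_fingerprint)


if __name__ == "__main__":
    shipped_key = b"placeholder public key bytes"  # stand-in, not a real key
    pinned = fingerprint(shipped_key)
    print(server_key_is_trusted(shipped_key, pinned))
    print(server_key_is_trusted(b"some other key", pinned))
```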

Thank you very much for the help!



