[Twisted-Python] newbie confusion - puzzling reactor response

Glyph Lefkowitz glyph at twistedmatrix.com
Fri Feb 12 20:58:46 EST 2010


On Feb 12, 2010, at 3:11 PM, exarkun at twistedmatrix.com wrote:

> On 07:35 pm, rich at noir.com wrote:
>> 

>> Er... on second thought... isn't there still a utility in asynchronous
>> file io which yields to the reactor?
>> 
>> It may be always readable/writable, but if I simply read/write, I'll
>> block the process for as long as that takes, block on read, block on
>> write.  Whereas if I use async io on the descriptor and go through the
>> reactor, I'm effectively yielding to the reactor and any other
>> actionable descriptors on each loop as well as allowing my reads and
>> writes to happen simultaneously.
>> 
>> Or am I missing something?
> 
> There could be utility in such, but Twisted has no support for it, 
> largely because actual support on various platforms is still pretty 
> ragged.
> 
> On Linux, you can get the aio_* family of functions, but they're pretty 
> crap.  They have tons of limitations (only block-aligned reads allowed, 
> only a certain number of outstanding operations (system wide) at a time, 
> etc.), and the failure mode for not complying with these limitations is 
> that the APIs block.
> 
> It's a bit better on Windows, so someone could probably fashion an 
> extension to iocpreactor for this.  There isn't a lot of developer 
> attention focused on implementing Windows-only extensions right now, 
> though.

In my opinion, the right way to go about something like this would be to come up with an API for asynchronous file I/O in Twisted, implement that API using subprocesses or maybe the reactor threadpool, and then attempt to optimize and simplify it using platform-specific APIs later.  (Important note: do not _expose_ the threaded nature of the code to application code at any point: just deliver the data to something in the reactor thread, to dispatch as it sees fit.)

My impression is that OS-level asynchronous file I/O APIs are fairly raw because, unlike network connectivity, file I/O won't give you thousands of concurrent operations to manage.  If you only have one disk, you can only really get a benefit from two, maybe three file I/O worker processes, and that's a fairly small amount of resources to manage.  Granted, it's tricky to really identify how many "disks" you've got in a system, and the performance characteristics change radically based on what kind of disk technology is involved, but generally speaking a few worker threads and a queue of I/O operations would cover the vast majority of use-cases.
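
To make the "few worker threads and a queue" idea concrete, here is a rough sketch using only the Python standard library.  Everything in it (FileIOPool, read, write, dispatch, stop) is a hypothetical name made up for illustration; none of it is an existing Twisted API.  The dispatch() method stands in for the reactor thread.  In real Twisted code the worker would presumably hand its result back with something like reactor.callFromThread instead of a second queue.

```python
import queue
import threading


def _read_bytes(path):
    with open(path, "rb") as f:
        return f.read()


def _write_bytes(path, data):
    with open(path, "wb") as f:
        return f.write(data)  # number of bytes written


class FileIOPool:
    """A few worker threads pulling file operations off a queue.

    Hypothetical API for illustration only; not part of Twisted.
    """

    def __init__(self, workers=2):
        self._ops = queue.Queue()    # pending I/O operations
        self._done = queue.Queue()   # finished results awaiting dispatch
        self._threads = [
            threading.Thread(target=self._work, daemon=True)
            for _ in range(workers)
        ]
        for t in self._threads:
            t.start()

    def _work(self):
        # Runs in a worker thread, so blocking on the disk is fine here.
        while True:
            op = self._ops.get()
            if op is None:           # shutdown sentinel
                return
            func, args, callback = op
            try:
                result, error = func(*args), None
            except OSError as exc:
                result, error = None, exc
            # Never call application code from the worker; just hand over.
            self._done.put((callback, result, error))

    def read(self, path, callback):
        """Queue a whole-file read; callback(data, error) runs in dispatch()."""
        self._ops.put((_read_bytes, (path,), callback))

    def write(self, path, data, callback):
        """Queue a whole-file write; callback(count, error) runs in dispatch()."""
        self._ops.put((_write_bytes, (path, data), callback))

    def dispatch(self, timeout=None):
        """Deliver one finished result; call this from the event-loop thread."""
        callback, result, error = self._done.get(timeout=timeout)
        callback(result, error)

    def stop(self):
        """Ask every worker to exit and wait for them."""
        for _ in self._threads:
            self._ops.put(None)
        for t in self._threads:
            t.join()
```

Application code only ever sees read(), write() and a callback.  The callback always fires in whichever thread calls dispatch(), so the threads stay an implementation detail that could later be swapped for subprocesses or a platform-specific API without changing callers.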

