[Twisted-Python] newbie confusion - puzzling reactor response
Glyph Lefkowitz
glyph at twistedmatrix.com
Fri Feb 12 20:58:46 EST 2010
On Feb 12, 2010, at 3:11 PM, exarkun at twistedmatrix.com wrote:
> On 07:35 pm, rich at noir.com wrote:
>>
>> Er... on second thought... isn't there still a utility in asynchronous
>> file io which yields to the reactor?
>>
>> It may be always readable/writable, but if I simply read/write, I'll
>> block the process for as long as that takes, block on read, block on
>> write. Whereas if I use async io on the descriptor and go through the
>> reactor, I'm effectively yielding to the reactor and any other
>> actionable descriptors on each loop as well as allowing my reads and
>> writes to happen simultaneously.
>>
>> Or am I missing something?
>
> There could be utility in such, but Twisted has no support for it,
> largely because actual support on various platforms is still pretty
> ragged.
>
> On Linux, you can get the aio_* family of functions, but they're pretty
> crap. They have tons of limitations (only block-aligned reads allowed,
> only a certain number of outstanding operations (system wide) at a time,
> etc, and the failure mode for not complying with these limitations is
> that the APIs block).
>
> It's a bit better on Windows, so someone could probably fashion an
> extension to iocpreactor for this. There isn't a lot of developer
> attention focused on implementing Windows-only extensions right now,
> though.
In my opinion, the right way to go about something like this would be to come up with an API for asynchronous file I/O in Twisted, implement that API using subprocesses or maybe the reactor threadpool, and then attempt to optimize and simplify it using special platform-specific APIs later. (Important note: do not _expose_ the threaded nature of the code to application code at any point: just deliver the data to something in the reactor thread, to dispatch as it sees fit.)
My impression is that OS-level asynchronous file I/O APIs are fairly raw because, unlike network connectivity, you won't have thousands of concurrent operations at once. If you only have one disk, you can really only get a benefit from two, maybe three file I/O slave processes, and that's a fairly small amount of resources to manage. Granted, it's tricky to identify how many "disks" a system really has, and the performance characteristics change radically based on the kind of disk technology involved, but generally speaking a few worker threads and a queue of I/O operations would cover the vast majority of use-cases.