[Twisted-Python] [Twisted-web] render_GET and memory consumption

Thu Dec 23 23:13:40 MST 2010

Cross-posting to the Twisted core list because this is starting to get into general, non-web stuff.  Future replies about this fork of the thread should go there.

On Dec 23, 2010, at 11:22 AM, exarkun at twistedmatrix.com wrote:

> On 22 Dec, 06:51 am, glyph at twistedmatrix.com wrote:
>> 
>> On Dec 21, 2010, at 2:24 PM, exarkun at twistedmatrix.com wrote:
>>> Instead, you have to go all the way to producers/consumers, and only 
>>> write more data to the transport buffer when it has finished dealing 
>>> with what you previously gave it.
>> 
>> While everybody should of course use producers and consumers, I feel 
>> like there should be a twisted core ticket for this behavior of 
>> transport buffering, and a twisted web ticket for this behavior of the 
>> request buffering.  The naive implementation _could_ be much cheaper 
>> memory-wise; at the very least, twisted.web.static.Data ought to do the 
>> smart thing.
> 
> Fixing Data sounds like a good idea.  I don't know what improvement to 
> the transport buffering you're thinking of, though.  It doesn't seem 
> like there is an obvious, generally correct fix.

Right now, FileDescriptor.write appends its input data directly to _tempDataBuffer.  So far, so good: no string mangling.

So let's say we do fd.write(header); fd.write(veryBigBody).

Then we start writing.

FileDescriptor.doWrite comes along and notices that the dataBuffer is empty.  The first thing it does in this case is ''.join(_tempDataBuffer), which copies the entire veryBigBody.
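To make the cost concrete, here is a toy model of that sequence.  It is a hypothetical sketch, not Twisted's FileDescriptor: it only borrows the attribute names mentioned above (dataBuffer, offset, _tempDataBuffer), and `prepare` and `bytes_copied` are made up for illustration.  It shows that write() is free, but the coalescing step leaves a second full-size copy of the body alive next to the original.

```python
class BufferModel:
    """Hypothetical, simplified model of the buffering described above."""

    def __init__(self):
        self.dataBuffer = b""
        self.offset = 0
        self._tempDataBuffer = []
        self.bytes_copied = 0  # illustration only: bytes copied by coalescing

    def write(self, data):
        # Appending a reference copies nothing.
        self._tempDataBuffer.append(data)

    def prepare(self):
        # Roughly the doWrite step: once dataBuffer is used up, join all
        # pending chunks into one new string (this is the big copy).
        if self.offset >= len(self.dataBuffer) and self._tempDataBuffer:
            self.dataBuffer = b"".join(self._tempDataBuffer)
            self.bytes_copied += len(self.dataBuffer)
            self._tempDataBuffer = []
            self.offset = 0
        return self.dataBuffer


header = b"HTTP/1.1 200 OK\r\n\r\n"
veryBigBody = b"x" * (1024 * 1024)

m = BufferModel()
m.write(header)
m.write(veryBigBody)
buf = m.prepare()
# The caller still holds veryBigBody, and the transport now holds a second,
# equally large string containing header + veryBigBody.
print(m.bytes_copied, buf is veryBigBody)
```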

FileDescriptor is kinda sorta trying to avoid this problem by maintaining an 'offset' so that it doesn't need to re-copy dataBuffer.  It could use a similar tactic for the pending chunks and issue write()s directly out of individual chunks that are larger than SEND_LIMIT, rather than coalescing them together.
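A rough sketch of that tactic, again as a standalone toy rather than a patch against Twisted: big chunks stay in the queue untouched and are sent through a zero-copy memoryview at an offset, while only runs of small chunks get joined.  The class, its methods, and the SEND_LIMIT value here are all assumptions for illustration.

```python
from collections import deque

SEND_LIMIT = 128 * 1024  # assumed value, for illustration only


class ChunkedBuffer:
    """Hypothetical buffer: send large chunks in place, coalesce only small ones."""

    def __init__(self, send_limit=SEND_LIMIT):
        self.send_limit = send_limit
        self.chunks = deque()
        self.offset = 0            # position inside self.chunks[0]
        self.bytes_coalesced = 0   # bytes that went through a join

    def write(self, data):
        self.chunks.append(data)

    def next_slice(self):
        """Return at most send_limit bytes to hand to send(), as a memoryview."""
        if not self.chunks:
            return memoryview(b"")
        if len(self.chunks[0]) - self.offset < self.send_limit:
            self._coalesce_small()
        head = self.chunks[0]
        return memoryview(head)[self.offset:self.offset + self.send_limit]

    def _coalesce_small(self):
        # Merge the small remainder with following small chunks, stopping
        # before any chunk that is itself >= send_limit (those stay as-is).
        parts = [self.chunks.popleft()[self.offset:]]
        size = len(parts[0])
        while (self.chunks and size < self.send_limit
               and len(self.chunks[0]) < self.send_limit):
            nxt = self.chunks.popleft()
            parts.append(nxt)
            size += len(nxt)
        merged = b"".join(parts)
        self.bytes_coalesced += len(merged)
        self.chunks.appendleft(merged)
        self.offset = 0

    def consume(self, n):
        """Record that send() accepted n bytes from the last slice."""
        self.offset += n
        while self.chunks and self.offset >= len(self.chunks[0]):
            self.offset -= len(self.chunks[0])
            self.chunks.popleft()


header = b"HTTP/1.1 200 OK\r\n\r\n"
veryBigBody = b"x" * (1024 * 1024)

b = ChunkedBuffer()
b.write(header)
b.write(veryBigBody)
out = bytearray()
while b.chunks:
    s = b.next_slice()
    out += s            # stand-in for a real send() that accepts everything
    b.consume(len(s))
```

With these sizes only the small header passes through a join; the 1 MiB body is sent in SEND_LIMIT-sized views without ever being copied into a new string.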

Or maybe we could wrap writev() instead of copying stuff at all?
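For what a writev()-based path might look like, here is a minimal sketch using Python's os.writev (Unix-only, Python 3.3+) on a plain blocking file descriptor.  The helper names and the IOV_MAX value are assumptions; a real reactor would stop after a partial write and wait for writability instead of looping.

```python
import os

IOV_MAX = 1024  # assumed batch limit; the real limit is platform-specific


def drop_written(buffers, n):
    """Return the buffers left after writev() accepted n bytes.
    A partially written buffer is re-sliced via memoryview, so nothing is copied."""
    remaining = []
    for buf in buffers:
        view = memoryview(buf)
        if n >= len(view):
            n -= len(view)
            continue
        remaining.append(view[n:])
        n = 0
    return remaining


def write_all_vectored(fd, buffers):
    """Hypothetical helper: write every buffer without joining them first.
    Assumes a blocking fd."""
    pending = [memoryview(b) for b in buffers]
    while pending:
        n = os.writev(fd, pending[:IOV_MAX])
        pending = drop_written(pending, n)


r, w = os.pipe()
write_all_vectored(w, [b"HTTP/1.1 200 OK\r\n\r\n", b"body" * 1000])
os.close(w)
received = b""
while True:
    data = os.read(r, 65536)
    if not data:
        break
    received += data
os.close(r)
```

The point of drop_written is that even a short write only changes which views are passed on the next call; the header and body are never concatenated.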