[Twisted-Python] dataReceived() buffer best practice?

Glyph glyph at twistedmatrix.com
Fri Oct 7 09:11:35 EDT 2011


On Oct 7, 2011, at 7:47 AM, Phil Mayers wrote:

> On 10/06/2011 10:06 PM, exarkun at twistedmatrix.com wrote:
> 
>>> I also ran a tcpdump to confirm - The opposite server is obviously
>>> pushing
>>> content to the socket in arbitrary frequencies, ending up in my
>>> dataReceived() method to get called arbitrarily as well.
>> 
>> This is not so obvious. Any hop along the route may fragment the data.
> 
> It is quite unusual (though not unheard of) for something to re-segment 
> the TCP stream. IP level fragmentation might occur, but it's relatively 
> uncommon in today's IP networks, and is anyway irrelevant to TCP - the 
> TCP stack will only see a reassembled IP packet.

I am skeptical; my recollection (the last time I worked at this layer of the network) is that this happens all the time over the public internet between diverse endpoints.

However, it's also mostly irrelevant. The thing at issue here is the distinction between the call to send() and the call to recv(), or in Twisted terms, between the call to transport.write() and the argument to dataReceived(). The data will be smashed up into quasi-arbitrary lengths at the very least by your kernel and your router. You can send as much as you want at once via .write(), but dataReceived() will tend to get called with chunks around 1-2x your path MTU.

But the sizes are also not important.  The point is that TCP is about streams, not packets, and you have to deal with arbitrary chunking if you want your code to work right.  What layer of the network this happens at is not important to your code :).
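
As a rough illustration of dealing with arbitrary chunking, here is a minimal sketch in plain Python (no Twisted import, so it runs anywhere). The names LineAssembler and feed_in_chunks are hypothetical, and the CRLF delimiter is an assumption made only for this example; the idea is just to append every chunk to a buffer and hand on only the complete messages.

```python
class LineAssembler:
    """Hypothetical helper: collects byte chunks and emits complete lines."""

    delimiter = b"\r\n"  # assumed message framing, for illustration only

    def __init__(self):
        self._buffer = b""
        self.lines = []

    def dataReceived(self, data):
        # A chunk may end mid-message (or mid-delimiter), so append first
        # and parse the accumulated buffer afterwards.
        self._buffer += data
        # Everything before the last delimiter is complete; the trailing
        # piece stays in the buffer until more data arrives.
        *complete, self._buffer = self._buffer.split(self.delimiter)
        self.lines.extend(complete)


def feed_in_chunks(payload, size):
    """Deliver payload in slices of `size` bytes, like a stream might."""
    assembler = LineAssembler()
    for start in range(0, len(payload), size):
        assembler.dataReceived(payload[start:start + size])
    return assembler.lines
```

Feeding the same payload in 1-byte, 7-byte, or whole-payload chunks should give the same list of lines, which is the property your protocol code needs. In Twisted itself, twisted.protocols.basic.LineReceiver already does this kind of buffering for line-delimited protocols.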

-glyph

