[Twisted-Python] HTTP PUT a GET's streaming response with treq

Cory Benfield cory at lukasa.co.uk
Sat May 6 13:09:13 MDT 2017


> On 5 May 2017, at 12:38, Phil Mayers <p.mayers at imperial.ac.uk> wrote:
> 
>> As a third note, your code does not handle the possibility that
>> original._transport may not implement IPushProducer. While *in
> 
> IIUC IResponse.deliverBody *does* guarantee (currently) to provide a transport implementing IPushProducer to the supplied IProtocol, at least as far as the docstring says?
> 
> Obviously that doesn't make accessing the private _transport safe - an implementation could wrap the transport to implement the IPushProducer semantics.

Yeah, it does. Sadly, there’s some complexity hidden there.

In this case, the “transport” object being provided to the IProtocol is a proxy to the TCP transport. However, that object does not write directly to the provided IProtocol, it writes to the t.w._newclient.HTTPClientParser and that chucks all kinds of data through the system. One of these layers is buffering, and that layer is *not* being told about calls to the IPushProducer methods. Thus, if you call pauseProducing from the IProtocol you provide, you may still get calls to your dataReceived method even though the producer is paused.

Mostly this seems to happen when the first body bytes are delivered, which seems to call makeConnection and then immediately dispatch any body bytes that have been received, even if the makeConnection method called pauseProducing on its transport. Not so helpful!

This also makes it impossible to avoid buffering. There seems to be no way to prevent the Response object from delivering the first bytes of the body before startProducing is called by treq for the upload. That’s pretty frustrating. Worse, the Response will *also* fire the “body complete” callback. All of this can and does happen before treq.put ever gets a chance to call startProducing, which means that you just have to buffer all of these events. This is really quite painful.

This is a design pattern that is in a few different places in twisted.web: situations where the “transport” provided to a higher-level abstraction is a proxy to the TCP connection, but where intermediate layers are actually responsible for calling dataReceived on those higher level abstractions. This causes a real problem if there is any chance that intermediary layers will do any buffering at all, because those layers will not see calls to pauseProducing or anything similar, and so will not respect those calls. The Twisted HTTP/2 server tries very hard to avoid this by ensuring that its classes actually masquerade as transports, rather than delegating to the lower-level TCP transport (though of course, they have to do this because HTTP/2 is framed!).

Fixing this is potentially a bit non-trivial. It involves changing t.w._newclient.Response to be able to act as a transport proxy (probably via some kind of mixin for the sake of code composition) and then to respect the producing state of the protocol receiving the body inside the _bodyDataReceived method, instead of just YOLOing data into that protocol assuming that it hasn’t immediately paused the production.

I consider this a bug in Twisted: I’ll open a ticket.

Cory




More information about the Twisted-Python mailing list