[Twisted-Python] Streaming HTTP

Fri Nov 13 05:36:26 MST 2015

Folks,

# Problem Statement

Thanks for your feedback on my HTTP/2 questions. I’ve started work implementing a spike of a HTTP/2 protocol for twisted.web. I’m aiming to have something that works in at least some cases by the end of the day.

As part of my dive into twisted.web, I noticed something that surprised me: it seems to have no support for ‘streaming’ request bodies. By this I mean that the Request.requestReceived() method is not actually called until the complete request body has been received. This is a somewhat unexpected limitation for Twisted: why should I have to wait until the entire body has been uploaded to start doing things with it?

This problem is thrown into sharp relief with HTTP/2, which essentially always chunks the body, even if a content-length is provided. This means that it is now very easy to receive data in delimited chunks, which an implementation may want to treat as semantically meaningful. However, the request is unable to access the data this way. It also makes it impossible to use a HTTP/2 request/response pair as a long-running communication channel, as we cannot safely call requestReceived until the request body is complete (which also closes the client’s side of the HTTP/2 stream).

Adi pointed me at a related issue, #6928[0], which itself points at what appears to be the issue tracking exactly this request: #288[1], which is 12 years old(!). This has clearly been a pain point for quite some time.

In issue #6928, glyph suggests that we come to the mailing list to discuss this, but the last time it was raised no responses were received[2]. I believe that with HTTP/2 on the horizon, this issue is more acute than it was before, and needs solving if Twisted is going to remain relevant for the web. It should also allow people to build more performant web applications, as they would be able to control how the data queues up in their apps.

This does not immediately block my HTTP/2 work, so we can take some time and get this right.

# Proposed Solution

To help us move forward, I’m providing a proposal for how I’d solve this problem. This is not necessarily going to be the final approach, but is instead a straw-man we can use to form the basis of a discussion about what the correct fix should be.

My proposal is to deprecate the current Request/Resource model. It currently functions and should continue to function, but as of this point we should consider it a bad way to do things, and we should push people to move to a fully asynchronous model.

We should then move to an API that is much more like the one used by Go: specifically, one where all requests/responses are streamed by default. Request objects (and, logically, any other object that handles requests/responses, such as Resource) should be extended to have a chunkReceived method that can be overridden by users. If a user chooses not to override that method, the default implementation would continue to do what is done now (save to a buffer). Once the request/response is complete (marked by receipt of a zero-length chunk, or a frame with END_STREAM set, or when the remaining content-length is 0), request/responseComplete would be called. Users that did not override chunkReceived can then safely access the content buffer; other users can do whatever they see fit. We’d also update requestReceived to ensure that it’s called when all the *headers* are received, rather than waiting for the body.
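
As a rough illustration of the shape described above, here is a minimal sketch in plain Python. It does not use or mirror the real twisted.web classes: `StreamingRequest`, `LineCounter` and the `deliver` driver are hypothetical names made up for this example, and only the method names requestReceived, chunkReceived and requestComplete come from the proposal itself.

```python
class StreamingRequest:
    """Hypothetical stand-in for a request object; NOT the twisted.web API."""

    def __init__(self):
        self.headers = {}
        self.content = bytearray()  # default buffer, as in today's behaviour
        self.complete = False

    def requestReceived(self, headers):
        # Under the proposal this fires once the headers are complete,
        # before any of the body has arrived.
        self.headers = dict(headers)

    def chunkReceived(self, data):
        # Default implementation: keep buffering the body.
        self.content.extend(data)

    def requestComplete(self):
        # Called once the body has ended (zero-length chunk, END_STREAM,
        # or remaining content-length of 0).
        self.complete = True


class LineCounter(StreamingRequest):
    """Example override that processes the body without buffering it."""

    def __init__(self):
        super().__init__()
        self.lines = 0

    def chunkReceived(self, data):
        self.lines += data.count(b"\n")


def deliver(request, headers, chunks):
    """Hypothetical driver playing the role of the HTTP protocol."""
    request.requestReceived(headers)
    for chunk in chunks:
        request.chunkReceived(chunk)
    request.requestComplete()
    return request
```

A request that keeps the default chunkReceived ends up with the whole body in `content` once requestComplete has run, while `LineCounter` sees each chunk as it arrives and never fills the buffer.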

A similar approach should be taken with sending data: we should assume that users want to chunk it if they do not provide a content-length. An extreme position to take (and I do) is that this should be sufficiently easy that most users actually *accidentally* end up chunking their data: that is, we do not provide special helpers to set content-length, instead just checking whether that’s a header users actually send, and if they don’t we chunk the data.
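
The "chunk unless a content-length was sent" rule could be sketched roughly as follows for HTTP/1.1. The helper name `frame_body` is hypothetical and is not part of twisted.web; a real implementation would also need to add the `Transfer-Encoding: chunked` header itself, which this sketch leaves out.

```python
def frame_body(headers, chunks):
    """Yield the wire bytes for a body (hypothetical helper, HTTP/1.1 only).

    If the user set a Content-Length header, the body is sent as-is;
    otherwise every write is framed with chunked transfer encoding.
    """
    names = {name.lower() for name in headers}
    if "content-length" in names:
        for chunk in chunks:
            yield chunk
        return
    for chunk in chunks:
        if chunk:  # a zero-length chunk would end the body early, so skip it
            yield b"%x\r\n%s\r\n" % (len(chunk), chunk)
    yield b"0\r\n\r\n"  # terminating zero-length chunk
```

With this shape, a user who simply writes data without thinking about framing gets a chunked body, which is the "accidental" chunking described above.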

This logic would make it much easier to work with HTTP/2 *and* with WebSockets, requiring substantially less special-case code to handle the WebSocket upgrade (when the headers are complete, we can spot the upgrade easily).
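
To illustrate why having the headers early helps, a check like the one below could run as soon as requestReceived fires. The function name `is_websocket_upgrade` is hypothetical, the headers are assumed to arrive as a plain dict of strings, and only the `Connection` and `Upgrade` headers are examined; a full handshake has to validate more than this.

```python
def is_websocket_upgrade(headers):
    """Return True if the headers ask for a WebSocket upgrade (sketch only)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    # Both headers may carry comma-separated token lists.
    connection = {t.strip().lower() for t in lowered.get("connection", "").split(",")}
    upgrade = {t.strip().lower() for t in lowered.get("upgrade", "").split(",")}
    return "upgrade" in connection and "websocket" in upgrade
```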

What do people think of this approach?

Cory

[0]: https://twistedmatrix.com/trac/ticket/6928
[1]: https://twistedmatrix.com/trac/ticket/288
[2]: https://twistedmatrix.com/pipermail/twisted-python/2014-February/028069.html