[Twisted-Python] Streaming HTTP

Mon Dec 7 02:39:09 MST 2015

After having written the following comments, I realized that my thoughts
are only about the high-level interface of Site/Resource. I think those
are the interfaces most users care about, so they are the ones for which
it makes the most sense to think hard about a painless transition. I
suspect that people using lower-level interfaces are probably willing to
make more invasive changes in order to take advantage of HTTP/2,
particularly since they likely have reason to use HTTP/2-specific
features.

> My proposal is to deprecate the current Request/Resource model. It
> currently functions and should continue to function, but as of this
> point we should consider it a bad way to do things, and we should push
> people to move to a fully asynchronous model.

It is probably possible to implement something like you suggest without
having to change the model too much. As I understand it, the big
impediment to properly handling streaming requests is `.content` (and
some related convenience things like `.args`), and the fact that both
`.getChild` and `.render` are called after `.content` is populated. It
is probably possible to address those issues without changing the shape
of those interfaces (even if we change the names). I know #288 had at
least two suggestions on how to do that. One was to have a marker
interface to indicate that a given Resource wants the new behavior,
where `.content` isn't populated; the other was to have new methods with
the new behavior, whose defaults slurp up the whole body and then call
the old methods.
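To make the marker-interface option concrete, here is a rough,
dependency-free sketch. Every name in it (`IStreamingResource`,
`FakeRequest`, `deliver`, `headersReceived`, `bodyReceived`,
`bodyFinished`) is hypothetical. In Twisted the marker would presumably
be a zope.interface `Interface`, the body would arrive asynchronously
rather than as a ready-made list, and traversal via `.getChild` is left
out entirely.

```python
import io


class IStreamingResource:
    """Hypothetical marker. In Twisted this would likely be a
    zope.interface Interface; a plain base class keeps this runnable."""


class FakeRequest:
    """Stand-in for twisted.web's Request, reduced to what we need."""

    def __init__(self, path):
        self.path = path
        self.content = None  # only populated for old-style resources


def deliver(resource, request, chunks):
    """Hypothetical Site-side dispatch between old and new behavior."""
    if isinstance(resource, IStreamingResource):
        # New behavior: .content is never populated; the resource sees
        # the body piece by piece.
        resource.headersReceived(request)
        for chunk in chunks:
            resource.bodyReceived(request, chunk)
        return resource.bodyFinished(request)
    # Old behavior: slurp the whole body first, then call render().
    request.content = io.BytesIO(b"".join(chunks))
    return resource.render(request)


class OldStyle:
    def render(self, request):
        return b"got %d bytes" % len(request.content.read())


class NewStyle(IStreamingResource):
    def headersReceived(self, request):
        self.total = 0

    def bodyReceived(self, request, chunk):
        self.total += len(chunk)

    def bodyFinished(self, request):
        return b"streamed %d bytes" % self.total
```

Existing resources keep working unchanged; only resources that opt in
by declaring the marker ever see a request without `.content`.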

On the other hand, there may be other things that need cleaning up and
that a clean break would address better; replacing `NOT_DONE_YET` with
Deferreds comes to mind.

> We should then move to an API that is much more like the one used by
> Go: specifically, that by default all requests/responses are
> streamed. Request objects (and, logically, any other object that
> handles requests/responses, such as Resource) should be extended to
> have a chunkReceived method that can be overridden by users. If a user
> chooses not to override that method, the default implementation would
> continue to do what is done now (save to a buffer). Once the
> request/response is complete (marked by receipt of a zero-length
> chunk, or a frame with END_STREAM set, or when the remaining
> content-length is 0), request/responseComplete would be called. Users
> that did not override chunkReceived can now safely access the content
> buffer: other users can do whatever they see fit. We’d also update
> requestReceived to ensure that it’s called when all the *headers* are
> received, rather than waiting for the body.
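A rough sketch of what those request-level hooks might look like.
`requestReceived`, `chunkReceived` and `requestComplete` follow the
names in the proposal; `StreamingRequest`, `dataReceived`, `endOfBody`,
and the dict with lowercase `str` header names are assumptions made to
keep it self-contained. It only shows how a default `chunkReceived`
preserves today's buffering, and how completion is detected either from
the Content-Length countdown or from an explicit end-of-body signal
(zero-length chunk or END_STREAM).

```python
import io


class StreamingRequest:
    """Hypothetical request object; none of this exists in twisted.web."""

    def __init__(self, headers):
        self.headers = headers
        self.content = io.BytesIO()
        length = headers.get("content-length")
        self._remaining = int(length) if length is not None else None
        self.finished = False
        # All headers are here, so user code can start before any body.
        self.requestReceived()
        if self._remaining == 0:
            self._finish()

    def requestReceived(self):
        """Hook: called as soon as the headers are complete."""

    def chunkReceived(self, data):
        """Hook: the default buffers the body, as Request does today."""
        self.content.write(data)

    def requestComplete(self):
        """Hook: called once the whole body has arrived."""

    def dataReceived(self, data):
        """Fed by the transport with de-framed body bytes."""
        if self.finished:
            return
        if self._remaining is not None:
            data = data[: self._remaining]
            self._remaining -= len(data)
        if data:
            self.chunkReceived(data)
        if self._remaining == 0:
            self._finish()

    def endOfBody(self):
        """Called by the transport on a zero-length chunk or END_STREAM."""
        if not self.finished:
            self._finish()

    def _finish(self):
        self.finished = True
        self.content.seek(0)
        self.requestComplete()


class Upload(StreamingRequest):
    """A user who overrides chunkReceived and never touches .content."""

    def requestReceived(self):
        self.sizes = []

    def chunkReceived(self, data):
        self.sizes.append(len(data))
```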

I haven't thought about this deeply, but my first thought is that it
would be reasonable to mirror how the client handles streaming
responses. `Agent.request` returns a Deferred that fires with a
`Response` as soon as the headers have been received. To get the body
of the response, you call `Response.deliverBody`, which takes an
`IProtocol` that will receive the body. There is also a helper,
`readBody`, that wraps that and returns a Deferred that fires with the
body once it has been received (and treq also has `collect`, which
wraps that and calls a function with each chunk of the data as it
arrives).
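Applied to the server side, that might look something like the
following sketch. It is not an existing API: `ServerRequest`,
`bodyDataReceived`, `bodyDone`, `BodyCollector` and `readRequestBody`
are all hypothetical. A plain callback stands in for the Deferred that
`readBody` returns, and `connectionLost` is passed `None` where the real
client passes a `Failure` wrapping `ResponseDone`.

```python
class BodyCollector:
    """Protocol-like object, shaped like the IProtocol handed to
    Response.deliverBody on the client side."""

    def __init__(self, onBody):
        self._chunks = []
        self._onBody = onBody

    def dataReceived(self, data):
        self._chunks.append(data)

    def connectionLost(self, reason):
        self._onBody(b"".join(self._chunks))


class ServerRequest:
    """Hypothetical request, handed to the resource as soon as the
    headers arrive; the body is delivered later via deliverBody()."""

    def __init__(self):
        self._protocol = None
        self._pending = []  # body data that arrived before deliverBody()
        self._done = False

    def deliverBody(self, protocol):
        self._protocol = protocol
        for data in self._pending:
            protocol.dataReceived(data)
        self._pending = []
        if self._done:
            protocol.connectionLost(None)

    # The two methods below would be called by the HTTP channel.
    def bodyDataReceived(self, data):
        if self._protocol is None:
            self._pending.append(data)
        else:
            self._protocol.dataReceived(data)

    def bodyDone(self):
        self._done = True
        if self._protocol is not None:
            self._protocol.connectionLost(None)


def readRequestBody(request, callback):
    """Server-side analogue of readBody, using a callback, not a Deferred."""
    request.deliverBody(BodyCollector(callback))
```

Buffering data that arrives before `deliverBody` is called means a
resource can inspect the headers first and only then decide how to
consume the body.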

> A similar approach should be taken with sending data: we should assume
> that users want to chunk it if they do not provide a content-length.
> An extreme position to take (and I do) is that this should be
> sufficiently easy that most users actually *accidentally* end up
> chunking their data: that is, we do not provide special helpers to set
> content-length, instead just checking whether that’s a header users
> actually send, and if they don’t we chunk the data.

Regarding sending data, this is already what we do: if no
`Content-Length` header is set, the response is sent with chunked
transfer encoding (at least as long as the client is speaking
HTTP/1.1).
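For reference, this is the framing the response falls back to. The
sketch below is not Twisted's code: `encode_chunked` and `should_chunk`
are hypothetical helpers, headers are assumed to be a dict with
lowercase bytes keys, and a real implementation also has to handle
cases such as HEAD requests and status codes that carry no body.

```python
def encode_chunked(chunks):
    """Frame body pieces with HTTP/1.1 chunked transfer coding."""
    out = []
    for chunk in chunks:
        if not chunk:
            continue  # an empty chunk would be read as end-of-body
        out.append(b"%x\r\n" % len(chunk))  # size in hex, then CRLF
        out.append(chunk)
        out.append(b"\r\n")
    out.append(b"0\r\n\r\n")  # last-chunk, no trailers
    return b"".join(out)


def should_chunk(version, headers):
    """Chunk only if no Content-Length was set and the client speaks
    HTTP/1.1 (HTTP/1.0 has no chunked encoding)."""
    return version == b"HTTP/1.1" and b"content-length" not in headers
```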

  Tom