[Twisted-Python] Streaming HTTP

Glyph Lefkowitz glyph at twistedmatrix.com
Wed Dec 2 15:39:08 MST 2015


> On Nov 19, 2015, at 3:50 AM, Cory Benfield <cory at lukasa.co.uk> wrote:
> 
> 
>> On 18 Nov 2015, at 12:18, Glyph Lefkowitz <glyph at twistedmatrix.com> wrote:
>> 
> 
> Sorry about the delay in responding to this, but I wanted to make sure I knew at least a bit about what I was talking about before I responded!

Clearly this is a challenging topic that requires lots of thought on the part of each interlocutor, and may require long rounds of consideration before each reply.  No need to apologize.

>>> What do people think of this approach?
>> 
>> So I think you're roughly on the right track but there are probably some Twisted-level gaps to fill in.
>> 
>> I've already gestured in the direction of Tubes (as have others) and it's something to think about.  But before we get to that, let's talk about a much more basic deficiency in the API: although there's an "IRequest", and an "IResource", there's no such thing as an "IResponse".  Instead, "IRequest" stands in for both the request and the response, because you write directly to a request (implicitly filling out its response as you do so).
> 
> So, I think in general this is interesting. One of the big difficulties I’m having right now is that I’m trying to combine this “streaming HTTP” work with the implementation of HTTP/2, which means that I need to keep the HTTP/2 work in mind whenever I talk about this *and* update the HTTP/2 design in response to decisions we make here. This means I’ve got quite a lot of balls in the air right now, and I am confident I’ll drop quite a few. One thing I’m deliberately not doing here is considering Tubes, in part because I’m extremely concerned about backward compatibility, and want the HTTP/2 work to function in the same environment.
> 
> Unfortunately, this means this conversation is blending into the HTTP/2 one, so I’m going to hijack this thread and bring in some concrete discussion of what I’m working on with the HTTP/2 stuff.

Hijack away.  I think we should be primarily concerned with getting HTTP/2 integrated for the moment.  The reason this raises so many concerns related to the streaming stuff is that the internal implementation of HTTP/2 ought to be more amenable to pulling apart to fit into an actually good interface to the HTTP protocol.

I think that twisted._threads points in a promising direction for this sort of work: let's make the old, crappy HTTP APIs work as-is, but with a new, private implementation that is better-factored but not fully documented.  We have the old interface as a proof-of-concept, so the new stuff needs to at least be good enough to be an internal implementation detail for that; we don't have to commit to a new public API to land it, and hopefully with some minor edits we can just make it public as the "good" interface (and then backport HTTP/1.1 over it, since we will probably be dealing with legacy HTTP/1.1 clients and servers until we're all dead).

> I was having a conversation about the HTTP/2 architecture on #twisted-dev yesterday, which has led towards my current working approach for HTTP/2, which will be to have two underlying objects. We’ll have H2Connection, which implements IProtocol, and H2Stream, which implements ITransport. These two objects will be *extremely* tightly coupled: H2Stream cannot meaningfully run over an arbitrary transport mechanism, and knows a great deal about how H2Connections work.

This seems good, except for the "extreme" tight coupling.  IProtocol and ITransport aren't that tightly coupled.  Why do H2Stream and H2Connection need to be?

> The reason we need to take this approach is because IConsumer doesn’t allow for us to have correlators, so even if we only had H2Connection it wouldn’t be able to identify a given producer with the stream it holds. By extension, IConsumer cannot consume multiple producers at once. For this reason, we need an interface between H2Connection and H2Stream that is similar to ITransport and IConsumer, but more featureful. Basically, H2Stream is a thin shim between a producer and H2Connection that adds a stream ID to a few function calls.

This is basically a good pattern.  It exposes a hard-to-screw-up interface to the next layer up, because you can't forget to include a (mandatory) stream ID.  I've implemented several multiplexing things that work more or less like this.
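
As a rough illustration of that shape (a sketch only: plain Python with no Twisted imports, and SketchConnection, SketchStream, writeToStream and closeStream are names invented here, not anything that exists in Twisted or the HTTP/2 work):

```python
class SketchConnection:
    """Stand-in for H2Connection: every call requires a stream ID."""

    def __init__(self):
        self.sent = []  # (streamID, data) pairs, in the order they were sent

    def writeToStream(self, streamID, data):
        self.sent.append((streamID, data))

    def closeStream(self, streamID):
        self.sent.append((streamID, None))  # None marks end of stream


class SketchStream:
    """Stand-in for H2Stream: binds one stream ID to the connection."""

    def __init__(self, connection, streamID):
        self._connection = connection
        self._streamID = streamID

    def write(self, data):
        # The caller never passes a stream ID, so it cannot forget one.
        self._connection.writeToStream(self._streamID, data)

    def loseConnection(self):
        self._connection.closeStream(self._streamID)


connection = SketchConnection()
first = SketchStream(connection, 1)
third = SketchStream(connection, 3)
first.write(b"hello")
third.write(b"world")
first.loseConnection()
```

The layer above only ever sees write() and loseConnection(); the stream ID is supplied by the shim on every call.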

>> Luckily we have an existing interface that might point the way to a better solution, both for requests and responses: specifically, the client IResponse: https://twistedmatrix.com/documents/15.4.0/api/twisted.web.iweb.IResponse.html.
>> 
>> This interface is actually pretty close to what we want for a server IResponse as well.  Perhaps even identical.  Its static data is all exposed as attributes which can be relatively simply inspected, and the way it delivers a streaming response is that it delivers its body to an IProtocol implementation (via .deliverBody(aProtocol)).  This is not quite as graceful as having a .bodyFount() method that returns an IFount from the tubes package; however, the tubes package is still not exactly mature software, so we may not want to block on depending on it.  Importantly though, this delivers all the events you need as a primitive for interfacing with such a high-level interface; it would definitely be better to add this sort of interface Real Soon Now, because then the tubes package could simply have a method, responseToFount (which it will need anyway to work with Agent) that calls deliverBody internally.
>> 
>> This works as a primitive because you have all the hooks you need for flow-control.  This protocol receives, to its 'makeConnection' method, an ITransport which can provide the IProducer https://twistedmatrix.com/documents/15.4.0/api/twisted.internet.interfaces.IProducer.html and IConsumer https://twistedmatrix.com/documents/15.4.0/api/twisted.internet.interfaces.IConsumer.html interfaces for flow-control.  It receives dataReceived to tell it a chunk has arrived and connectionLost to tell it the stream has terminated.
> 
> Just let me clarify how this is expected to work. Somewhere we have a t.w.s.Site, which builds some kind of HTTP protocol (currently HTTPChannel, in future some object that can transparently swap between HTTPChannel and H2Connection) when connections are received.

Another option could also be having a t.w.s.NewSite (with that name hopefully obviously being a straw man) so that Site can simply be deprecated in favor of the new thing.  Making Site itself be able to accommodate the new stuff would be nice but is definitely not mandatory.

> These two protocols each build an IGoodRequest, which is very similar to IRequest but has a deliverBody method. The consumer of this (whether IResource or some other thing), if it wants to consume a stream, registers a protocol via deliverBody. At this point, H2Connection (via H2Stream) provides itself as the transport to that protocol, and calls dataReceived when chunks of data are received.

This sounds great.  One thing to maybe watch out for: what if nobody calls deliverBody?  This can sometimes be a little annoying in client code, to debug why a channel is never closed.  Having a nice error in this case would be a cherry on top.
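
One possible way to get that error, sketched under heavy assumptions: SketchRequest, its finish() hook and BodyNeverDelivered are all hypothetical names, and real code might well prefer a logged warning or a timeout over an exception.

```python
class BodyNeverDelivered(Exception):
    """Raised when a request finishes without deliverBody being called."""


class SketchRequest:
    """Hypothetical IGoodRequest-like object that remembers its body protocol."""

    def __init__(self, method, uri):
        self.method = method
        self.uri = uri
        self._bodyProtocol = None

    def deliverBody(self, protocol):
        self._bodyProtocol = protocol

    def finish(self):
        # Assumed hook: the channel calls this once the response is complete.
        if self._bodyProtocol is None:
            raise BodyNeverDelivered(
                "%s %s: response finished but nobody called deliverBody"
                % (self.method, self.uri)
            )


forgotten = SketchRequest("POST", "/upload")
try:
    forgotten.finish()
except BodyNeverDelivered as error:
    complaint = str(error)
```

The point is only that the request object is in a position to notice the omission and name the offending method and URI, rather than leaving someone to wonder why the channel never closes.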

> When the object receiving the request is ready to send a response, it calls…something (sendResponse?) and provides an object implementing a server IResponse. The code in the H2Stream/H2Connection sends the headers, then calls deliverBody on the IResponse, passing H2Connection (again via H2Stream) as the protocol that gets called. In this world, H2Stream actually would need to implement IProtocol as well as ITransport.

A minor bit of critique here: the Single Responsibility Principle <https://en.wikipedia.org/wiki/Single_responsibility_principle> dictates that we ought not to have H2Stream literally implement both IProtocol and ITransport; rather, we should have an _H2StreamProtocol and an _H2StreamTransport.  The thing talking to the IProtocol implementation really ought to be wholly distinct from the thing talking to the ITransport implementation, and this kind of duality makes it very easy for users - especially programmers new to Twisted - to get confused.  As Nathaniel Manista and Augie Fackler put it in The Talk <https://www.youtube.com/watch?v=3MNVP9-hglc>, we want to express ourselves "structurally": if you only want application code to talk to the transport implementation and it's an error to talk to the protocol implementation, pass only the transport implementation.
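
A minimal sketch of that split, in plain Python with no Twisted imports.  The class names carry a "Sketch" infix to make clear they are illustrations, and buildStream, sendFrame and bodyReceiver are hypothetical names, not part of any existing API:

```python
class _SketchStreamTransport:
    """What application code is handed: it can only write and close."""

    def __init__(self, sendFrame):
        self._sendFrame = sendFrame

    def write(self, data):
        self._sendFrame(data)

    def loseConnection(self):
        self._sendFrame(None)  # None marks end of stream in this sketch


class _SketchStreamProtocol:
    """What the connection talks to: it only receives events."""

    def __init__(self, bodyReceiver):
        self._bodyReceiver = bodyReceiver

    def dataReceived(self, data):
        self._bodyReceiver.append(data)


def buildStream(sendFrame, bodyReceiver):
    """Create the two halves; each caller is given only the half it needs."""
    return _SketchStreamProtocol(bodyReceiver), _SketchStreamTransport(sendFrame)


outgoing = []
incoming = []
protocol, transport = buildStream(outgoing.append, incoming)
transport.write(b"response")
protocol.dataReceived(b"request body")
```

Because the transport object has no dataReceived and the protocol object has no write, calling the wrong side fails immediately with an AttributeError instead of silently doing something confusing.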

> Is my understanding of that correct? If so, I think this design can work: essentially, H2Stream becomes the weird intermediary layer that appears as both a transport and a protocol to the request/response layer. Underneath the covers it mostly delegates to H2Connection, which implements a slightly weirdo version of IConsumer (and in fact IProducer) that can only be consumed by H2Stream.

I don't quite get why it needs to be slightly weirdo (hopefully IPushProducer is sufficient?) but yes, this all sounds right to me.

> 
>> Unfortunately the client IRequest https://twistedmatrix.com/documents/15.4.0/api/twisted.web.iweb.IClientRequest.html isn't quite as useful (although its relative minimalism should be an inspiration to anyone designing a next-generation IRequest more than the current IRequest's sprawling kitchen-sink aesthetic).  However, IResponse.deliverBody could be applied to IGoodRequest as well.  If we have a very similar-to-IResponse shaped IRequest object, say with 'method', 'uri' and 'headers', and then a 'deliverBody' that delivers the request body in much the same way, we could get a gracefully structured streaming request that works with a lot of existing code within Twisted.
>> 
>> Then the question is: what to do with IResource?
>> 
>> Right now the flow of processing a request is, roughly:
>> 
>> -> wait for full request to arrive
>>  -> have HTTPChannel fill out IRequest object
>> -> look at request.site.resource for the root
>> *-> call getChildWithDefault repeatedly, mutating "cursor" state on the IRequest as you move (specifically: "prepath" and "postpath" attributes)
>>  -> eventually reach the leaf Resource, or one with 'isLeaf' set on it, and delegate producing the response to that resource
>> *-> call resource.render(request)
>> -> examine the return value; if it's bytes, deliver them and close the connection; if it's NOT_DONE_YET, just leave the connection open.
>> 
>> Instead, I think a good flow would be:
> 
> [snip long discussion of how to write locateChild]
> 
> Agreed that these proposed approaches would work well. I have no concrete feedback on them, they seem good to me.
> 
>> -> finally, call .responseForRequest(request) -> IResponse on the final Resource and deliver the IResponse to the network.
>> 
>> The way compatibility could be achieved here is to write a wrapper that would implement .responseForRequest to first collect the entire body, then synthesize a gross old-style-IRequest-like object out of the combination of that body and the other information about the resource, then call .getChildWithDefault on it a few times, then call the old-style .render_GET, et al.  The IResponse returned from this compatibility .responseForRequest would wrap up calls like request.write and turn them into write() calls.
> 
> This seems super-gross but vaguely do-able, and we’ll need to write it in order to get the new H2Connection/H2Stream objects working with the old paradigm anyway.

"super-gross but vaguely do-able" is what we're shooting for in the compatibility layer :).

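For what it's worth, the core of such a wrapper might look roughly like the following.  This is a sketch in plain Python with no Twisted imports; StreamedRequest, BodyCollector, OldStyleRequest and the deliver callback are all hypothetical stand-ins, and it skips the getChildWithDefault traversal and NOT_DONE_YET handling entirely:

```python
import io


class StreamedRequest:
    """Stand-in for a new-style request whose body arrives in chunks."""

    def __init__(self, method, chunks):
        self.method = method
        self._chunks = chunks

    def deliverBody(self, protocol):
        for chunk in self._chunks:
            protocol.dataReceived(chunk)
        protocol.connectionLost()


class BodyCollector:
    """Protocol-shaped object that buffers the whole streamed body."""

    def __init__(self, onComplete):
        self._chunks = []
        self._onComplete = onComplete

    def dataReceived(self, data):
        self._chunks.append(data)

    def connectionLost(self, reason=None):
        self._onComplete(b"".join(self._chunks))


class OldStyleRequest:
    """Gross old-style request: full body up front, write() for output."""

    def __init__(self, method, body):
        self.method = method
        self.content = io.BytesIO(body)
        self.written = []

    def write(self, data):
        self.written.append(data)


def responseForRequest(newRequest, legacyRender, deliver):
    """Buffer the body, run the legacy render, hand the output to deliver."""

    def bodyComplete(body):
        old = OldStyleRequest(newRequest.method, body)
        result = legacyRender(old)
        if isinstance(result, bytes):
            old.write(result)
        deliver(b"".join(old.written))

    newRequest.deliverBody(BodyCollector(bodyComplete))


def legacyRender(request):
    return b"got " + request.content.read()


responses = []
responseForRequest(StreamedRequest(b"POST", [b"ab", b"cd"]), legacyRender, responses.append)
```

Here the legacy render function never learns that the body arrived as a stream; it sees the same buffered request it always has.
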
> All of this approach sounds reasonable modulo some careful thinking about how exactly we tie this in with the old paradigm. I’m particularly concerned about HTTPChannel, which I suspect many applications may know a great deal about. Changing its interface is likely to be slightly tricky, but we’ll see how it goes.

It might be useful to think about a parent interface, IHTTPChannel with all the least-common-denominator stuff on it, and sub-interfaces IHTTP1_1Channel and IHTTP2_0Channel which each derive from that and provide additional version-specific stuff.  I don't have enough protocol-specific knowledge to hand in short-term memory to comment on what that functionality might be though.

-glyph
