Opened 3 years ago

Closed 8 months ago

#7460 enhancement closed fixed (fixed)

Add HTTP 2.0 support to twisted.web

Reported by: Alex Gaynor Owned by: Cory Benfield
Priority: normal Milestone:
Component: web Keywords:
Cc: jknight, Cory Benfield, Jason J. W. Williams Branch: branches/h2-7460-3
branch-diff, diff-cov, branch-cov, buildbot
Author: adiroiban

Attachments (3)

7460_1.patch (26.6 KB) - added by Cory Benfield 15 months ago.
First draft patch
7460_2.patch (31.9 KB) - added by Cory Benfield 15 months ago.
Second draft patch
7460_4.patch (44.6 KB) - added by Cory Benfield 13 months ago.
Complete patch

Download all attachments as: .zip

Change History (22)

comment:1 Changed 3 years ago by DefaultCC Plugin

Cc: jknight added

comment:2 Changed 3 years ago by Jean-Paul Calderone

Description: modified (diff)

comment:3 Changed 23 months ago by Cory Benfield

Cc: Cory Benfield added

Hi! I'm the implementer of hyper, a Python HTTP/2 client. I'm open to working on adding HTTP/2 to Twisted, but while I know plenty about HTTP/2 I don't know anything about Twisted's architecture or how it would be best to proceed. It would be good if I could get some input from a current Twisted core about what the ideal structure would look like.

comment:4 Changed 21 months ago by Jason J. W. Williams

Cc: Jason J. W. Williams added

comment:5 Changed 16 months ago by Cory Benfield

Just to keep this up-to-date: I have development time set aside for working on this. Once #7860 lands, I'll start.

Changed 15 months ago by Cory Benfield

Attachment: 7460_1.patch added

First draft patch

comment:6 Changed 15 months ago by Cory Benfield

Keywords: review added

Ok, after some chatting on the mailing list, I've written the first draft of a patch that implements HTTP/2 support.

This is a bit tricky, because HTTP/2 really benefits from a world with streaming HTTP, and Twisted doesn't quite have that in the standard Resource flow. For this reason, I'm trying to build the HTTP/2 support so that it fits with the old Resource-style, but with one eye towards updating it for the future to support streaming. Because of the relative difficulty of making this function, I want lots and lots of review please. We need to make sure we're all on the same page and working towards the same goal.

In the meantime, let's talk about the patch I just uploaded. It contains no unit tests, and probably breaks many of the current twisted.web ones. That's ok, it's not perfect, but it's meant to represent a blueprint for where I'm planning to take this work. Let me break it down a bit.

First, it deals with the fact that t.w.h.Request has a tendency to reach inside the HTTPChannel object to grab the transport underneath. This is a bad idea, quite aside from the Rule of Demeter violation, because it prevents us making abstractions like the one I want to make here. Part of this patch adjusts lots of the logic to call the channel, rather than the transport. In the case of the standard HTTPChannel it just passes things directly through to its underlying transport, but the new H2 classes do clever stuff.

The next thing it does is move header rendering out of the Request class. It's a mystery to me why it was ever there, but it can't stay there for HTTP/2 unless we want to un-parse them, so instead it calls a new writeHeaders method on the channel that does this work. I also had to add an endRequest method to make sure I promptly close the HTTP/2 stream when I'm done with it, but that's a no-op for HTTP/1.1 so it seems pretty safe.

The next big chunk of work is a proxy HTTP channel. This exists so that we can handle the TLS upgrade logic. It basically proxies stuff through to an underlying HTTPChannel until it gets told about a TLS upgrade, at which point it replaces the underlying channel. I worry about the maintainability of this approach, so feedback is particularly welcomed here, but this does seem to work.

Next, there's a small change in twistd. This is to let 'twistd web' work correctly with HTTP/2. Right now the way the patch is constructed everything explodes if you don't have NPN or ALPN available, so if you want to run this on your own box make sure you do! We'll obviously need to improve that...

Finally, there's the new t.w.http2 file. This contains the workings of a HTTP/2 implementation, dividing into a H2Connection and H2Stream. These together implement the interface of HTTPChannel, whilst being extremely tightly coupled together. I'd like some feedback on that part of the design too.

If you have questions, please ask away!

comment:7 Changed 15 months ago by Adi Roiban

Author: adiroiban
Branch: branches/h2-7460

(In [46329]) Branching to h2-7460.

comment:8 Changed 15 months ago by Adi Roiban

Here is my initial, very hight level feedback

I will not comment on style guide related changes for now and focus on the design.

I think that for this review is important to make sure all methods that should or can be private are marked as such. I would like to see as few public things as possible :)


I think that the changes in the Request to not work directly with the transport but with an HTTPChannel are great.

I am worried about the self.transport = self.channel.transport change as I think that this might break end user code, especially since self.transport is not documented so people might use it in any possible way :(

But as a first step, maybe we should start documenting it to describe why and how to use it.

Maybe GenericHTTPChannel can also try to look at self._obj.transport in the getattr setattr and raise warnings if it is called with methods which are not part of the HTTPChannel interface.


The signature of the new writeHeaders methods is different... I think they should have the same signature.

Also, on HTTP1 writeHeaders is also writing the first repsonse line.. not just the headers

Maybe we should have an explicit writeResponseLine call.


the Request.write method is a monster. Please consider extracting at least the content of the if not self.startedWriting: in a separate private method.

Maybe later we can refactor the write method to use something like a state machine.... so that we can just enter a NO_BODY state without having to overwrite the self.write in the current "brutal" way :(


so the new methods on the HTTPChannel can be writeResponseLine, writeHeaders and endRequest


GenericHTTPChannel is more like a ProtoHTTPChannel as it is place before any channel and the H2Channel evolves from it :) ... maybe call it BackwardCompatibleHTTPChannel

I assume that at some point we will have to add more API to the IHTTPChannel to accommodate the flow control functionalities... and this API can not be made backward compatible.

We might end up with 2 Channel and or Request interfaces. The old one which supports HTTP1 and HTTP2 and another one with specific HTTP2 features.


Maybe H2Connection should be renamed to H2Channel to be in sync with the HTTPChannel.

Even if H2Channel is not a great name, I think is better to have HTTPChannel and H2Channel as it should be easier for end users to observer that they are supposed to do the same thing.

Otherwise we will have HTTPChannel, H2Connection and H2Stream and some people might thing that H2Stream as a channel.


The new writeDataToStream public api is only called by the new H2Stream, can we move it to H2Stream and make it private?


In H2Connection can we make all the new requestSOMESTATE into private methods? Why do they need to be public?


For the H2Stream ... it looks it is both a IProtocol and ITranport

instead of stream.site = self.site maybe we can just have stream.setSite(self.site) and have a private site on the stream.

maybe we can rename receiveDataChunk to dataReceived

I find it hard to understand the difference between endRequest and requestDone and why we need both ... and why we need them as public API. Can we use the consumer/producer API to replace them?


All changes will require tests so this can grow quite big. Some suggestions of breaking it:

  • Add the new writeHeaders / endRequest ... and maybe writeResponseLine API.
  • Turning HTTPChannel into a real transport and producer/consumer
  • Add HTTP2 stuff
  • Changes from twisted/web/tap.py to make it a twistd plugin.

We can have the current branch as a reference so that someone will review these tickets will have a context in which to review them and can see the big picture in which those patches should fit.

Maybe you can also look at #6928 and see if we can also introduce an headersReceived API for the requests so that we can have streaming in HTTP1 :)

Many thanks for your work on this.

comment:9 Changed 15 months ago by Cory Benfield

Thanks for the review here Adi! Let me address some of your comments directly without making changes on it yet, just so we're sure we're on the same page.

Maybe we should have an explicit writeResponseLine call.

I expressly disagree on this. The status line (which is what the first line of the response is called) is an artifact of HTTP/1: it has no direct analogue in HTTP/2. In HTTP/2, that information is provided by a single so-called pseudo-header, provided in the header block. This means that calling writeStatusLine would be at best a no-op in HTTP/2, and at worse would hang some information off the H2Stream object waiting for a subsequent call to writeHeaders. This kind of subtle state manipulation is often extremely hard to follow.

Finally, even for HTTP/1.1 we would want to buffer the status line until the headers are ready rather than send it immediately, to avoid the risk of sending tinygrams that contain only the status line. With all of this, I don't think it's sensible to factor this out into a separate method that simply modifies the state of the object.

I am worried about the self.transport = self.channel.transport change as I think that this might break end user code, especially since self.transport is not documented so people might use it in any possible way :(

Yeah, this is a reasonable concern. However, in this case I think we're adjusting the channel so that it must implement ITransport, which means that while it could break it would only break code from users who were assuming that self.transport will always implement ITCPTransport or ISSLTransport. I'd argue that was never a safe assumption to make, but it's possible that we need to be defensive against it anyway.

the Request.write method is a monster. Please consider extracting at least the content of the if not self.startedWriting: in a separate private method.

Yeah, it is. I was trying to resist scope creep by just leaving it alone. If you think it's worth us refactoring as part of this ticket then fair enough.

I assume that at some point we will have to add more API to the IHTTPChannel to accommodate the flow control functionalities... and this API can not be made backward compatible.

I don't think so, actually. I think flow control is not really a user concern for the most part, so we won't expose it. Flow control will be implemented via IConsumer and IProducer: I don't see any reason to add new flow-control APIs when Twisted already has perfectly suitable ones.

Otherwise we will have HTTPChannel, H2Connection and H2Stream and some people might thing that H2Stream as a channel.

Well, this is the problem, you see: H2Stream isn't a channel, but neither is H2Connection. Only the two *together* are a channel. Calling either H2Channel is misleading. Frankly, it's *more* accurate to call H2Stream H2Channel than H2Connection, because at least it has the methods that look like the ones on HTTPChannel (writeHeaders etc.)

The new writeDataToStream public api is only called by the new H2Stream, can we move it to H2Stream and make it private?

No. The purpose of writeDataToStream is to be the hook on which buffering, flow control, and priority are hung. To move those into H2Stream defeats the point of having the two separate objects, which is separation of concerns. H2Stream should be something that implements the HTTPChannel interface. H2Connection is essentially a special, weird, transport that only works for H2Streams, and deliberately has APIs that are only called by H2Stream. That's the intended design here.

In H2Connection can we make all the new requestSOMESTATE into private methods? Why do they need to be public?

These, on the other hand, do not need to be public and can definitely be made private.

For the H2Stream ... it looks it is both a IProtocol and ITranport

H2Stream is very deliberately not an IProtocol. To make H2Stream an IProtocol we'd need to make H2Connection an ITransport, and we can't do that because H2Connection needs to know about stream IDs, which the ITransport interface does not allow.

instead of stream.site = self.site maybe we can just have stream.setSite(self.site) and have a private site on the stream.

That would be nice, but site is a public part of the HTTPChannel API. This is why I have to copy it across at all: Resource objects expect to be able to reach in and grab that information. Presumably, therefore, user code assumes that as well. We can't make the site private.

I find it hard to understand the difference between endRequest and requestDone and why we need both

Yeah, this needs to be made clearer. The reason they're two separate calls is that one, endRequest, is a signal that the response has finished sending. This is a no-op for HTTP/1.1, but for HTTP/2 it requires sending a frame that closes the stream. requestDone, on the other hand, is a legacy from the HTTPChannel API and is a signal to the channel to clean up any state associated with that request. They have unfortunately similar names, it'd be good to fix that up.

The rest of this feedback is really good, and I'll aim to apply it.

Changed 15 months ago by Cory Benfield

Attachment: 7460_2.patch added

Second draft patch

comment:10 Changed 15 months ago by Cory Benfield

Ok, I've uploaded a new patch. This one includes changes to ensure that the tests in twisted.web.test now pass. These changes are pretty much just cosmetic: some areas of t.w.t knew a little too much about how some of the members in the HTTPChannel were structured, so the test fixes that.

I also applied some of Adi's review feedback.

comment:11 Changed 15 months ago by Adi Roiban

Branch: branches/h2-7460branches/h2-7460-2

(In [46346]) Branching to h2-7460-2.

comment:12 Changed 15 months ago by Adi Roiban

I have applied your patch and sent it to buildbots... you can check the results :)

comment:13 Changed 15 months ago by Cory Benfield

Oh of course, it would rather help if I either added the h2 dependency or switched to make it conditional.

comment:14 Changed 15 months ago by Adi Roiban

Branch: branches/h2-7460-2branches/h2-7460-3

(In [46452]) Branching to h2-7460-3.

comment:15 Changed 15 months ago by Adi Roiban

Keywords: review removed
Owner: set to Cory Benfield

I have installed hyper on our builders... but tests still fail.

Please check latest branch.

I have fixed the conflict with master.

There was not much activity on this ticket.

Please submit a new patch on top of latest branch with the API which you would like to see in Twisted, based on latest feedback.

Thanks!

comment:16 Changed 15 months ago by Adi Roiban

Here is another review.

Some tests are green ... so this is good progress :)


_queuedHeaders needs docstring and is this ok ?

self.queuedHeaders = None

stream_id should be streamID :)


Minor comments.

maybe writeHeaders can be renamed to sendHeaders ... the request should only care about the high level operationg of sending the headers to the connection and should not care about low level operations like write to a transport :)

maybe endRequest can be finalizeRequest() or requestDone()... i prefer finalizeRequest()

But just take them as minor comments. Since this is new public API I just want to see if we can find better named.

---

For H2Stream some of the public API just take a stream_id and then use to to send some data.

Can we initialize each stream with a separate H2Connection which will contain the stream_id so that we don't have to pass self.stream_id each time we call something on the H2Connection ?


For the initial merge I would like to have as few public API as possible ... but I think that we can also just make sure the API affecting HTTP1 is sane and for HTTP2 we can announce it as experimental ... and at least until the next release we can update the API

The released in planed in 2 or 3 months so we should have some time to experiment the API and ask for feedback.


Can we have the TAP in a separate ticket for advertising the protocols using acceptableProtocols? as those changes also need tests and this ticket is already big

Can we have just the new API which affect HTTP1 merged and tested in this patch (or in a new one) and merge the HTTP2 support in a separate ticket?

This patch is already quite big (without tests) so I think that we need to find a way to break it.

Thanks!

comment:17 Changed 13 months ago by Cory Benfield

Ok, so per adiroiban's request I'm going to try to split this into multiple patches. However, I'm also going to upload the *complete* patch (minus testing changes etc.) to this issue so that anyone who is interested can grab it and test it out, and so reviewers can see the context for small changes.

With that in mind, I'm uploading draft patch 4, which includes much better support including complete support for HTTP/2 priorities as sent by clients!

Changed 13 months ago by Cory Benfield

Attachment: 7460_4.patch added

Complete patch

comment:18 Changed 13 months ago by Cory Benfield

Alright, I've split the patch apart. For now I believe that the following set of patches covers the full set of work: #8188, #8191, #8193, and #8194. Please follow the individual patches for this work.

comment:19 Changed 8 months ago by Cory Benfield

Resolution: fixed
Status: newclosed

All the split patches have been merged. HTTP/2 server support will land in 16.3.

Thank you all for your work! It's been a long time coming. =)

Note: See TracTickets for help on using tickets.