Opened 10 years ago

Closed 5 years ago

Last modified 4 years ago

#886 enhancement closed fixed (fixed)

Add an HTTP 1.1 client protocol implementation to twisted.web

Reported by: itamarst Owned by:
Priority: high Milestone:
Component: web Keywords: httpclient
Cc: exarkun, jknight, ivank, truekonrads Branch: branches/expressive-http-client-886-4
(diff, github, buildbot, log)
Author: itamar, exarkun Launchpad Bug:

Description (last modified by exarkun)

Twisted Web only includes an HTTP 1.0 client protocol implementation. HTTP 1.1 includes numerous useful features which clients benefit from.

As a first step towards good HTTP 1.1 client support, add a low-level API for issuing HTTP 1.1 requests and parsing HTTP 1.1 responses (this excludes things like timeouts, following redirects, etc). Once this is done, higher-level APIs and more HTTP features can be added based on it.

Attachments (2)

httpclient.py.https_and_host.patch (1.3 KB) - added by ivank 6 years ago.
https support for httpclient.py example. Also, send non-default ports as part of Host header.
t.web._newclient_support_bare_LF.patch (5.8 KB) - added by ivank 6 years ago.
support for bare LF newlines sent by broken servers

Download all attachments as: .zip

Change History (36)

comment:1 Changed 10 years ago by exarkun

getPage should remain as-is.  It should be re-implemented in terms of a more
expressive API which allows streaming and non-body retrieval.

pop3client.py is an example of a reasonable way to do this, I think.

comment:2 Changed 6 years ago by exarkun

  • Author set to exarkun
  • Branch set to branches/expressive-http-client-886

(In [25208]) Branching to 'expressive-http-client-886'

comment:3 Changed 6 years ago by exarkun

  • Priority changed from low to high

A lot of stuff has happened in the branch. There are still a lot of minor things to take care of (mostly boring coding standard stuff) and I'd like to add some higher-level API to the branch before merging it (nothing too hard, just something that won't scare away newbies), but it'd be great to get some feedback about the low-level stuff before proceeding to anything high-level.

The intent is to implement persistent connection support in a subsequent ticket. The goal for this code is just to not make it impossible to do that while still supporting the APIs added in this branch.

James, your feedback in particular would be appreciated. :) At the very least,connection control header detection is wrong. Hints on how to make it (or anything else) right would be helpful.

comment:4 Changed 6 years ago by radix

Couple of pre-review comments:

  1. The example doesn't log errors
  2. The second callback in the example doesn't need to take 'proto', and so the previous callback doesn't need to munge the callback result into (proto, response)
  3. The branch is getting huge (3300+ lines), and it'd be really good if it could be split up.
  4. I don't like 'client' as a method name in high-level-http-client.py (though I expect you were planning to find a better one anyway), but I do like the way these things are composable
  5. I think the fourth argument to Request should be optional. I really don't like seeing random "None"s in code using the API.

comment:5 Changed 6 years ago by exarkun

  • Keywords httpclient added

comment:6 Changed 6 years ago by exarkun

  • Branch changed from branches/expressive-http-client-886 to branches/expressive-http-client-886-2

(In [25912]) Branching to 'expressive-http-client-886-2'

comment:7 Changed 6 years ago by exarkun

  • Owner changed from jknight to exarkun
  • Status changed from new to assigned

comment:8 Changed 6 years ago by exarkun

(In [25948]) Merge identity-transfer-encoding-3605

Author: exarkun, itamar
Reviewer: therve
Fixes: #3605
Refs: #886

Add an identity transfer-encoding decoder, similar to the chunked transfer-encoding
decoder, to twisted.web.http and use it to simplify HTTPChannel slightly by
removing the knowledge of any particular transfer-encoding from its request body
handling code.

Also expand the transfer-encoding decoder interface so that it will be useful for
an HTTP client and rename the chunked decoder so that its name more closely reflects
what it does.

comment:9 Changed 6 years ago by exarkun

  • Branch changed from branches/expressive-http-client-886-2 to branches/expressive-http-client-886-3

(In [25973]) Branching to 'expressive-http-client-886-3'

comment:10 Changed 6 years ago by exarkun

  • Description modified (diff)
  • Summary changed from Rewrite twisted.web.client.getPage to support streaming of result data and returning headers etc. to Add an HTTP 1.1 client protocol implementation to twisted.web

As the first comment says, getPage is not actually going to change. Updating the description (empty) and summary of this ticket to describe what change is actually being made.

comment:11 Changed 6 years ago by exarkun

  • Keywords review added
  • Owner exarkun deleted
  • Status changed from assigned to new

Okay. Low-level protocol implementation done (ha ha).

It's all private still. The plan is to implement a higher-level API based on this as soon as this is OKed, then make whichever low-level APIs are suitable for end users public. This is so that we prove that the low-level APIs are good enough for the high-level APIs we want to provide before offering them for public consumption.

comment:12 follow-up: Changed 6 years ago by ivank

doc/web/examples/http_client.py parses the port, but does not send it as part of the Host header (required when port # does not match the default)

comment:13 in reply to: ↑ 12 Changed 6 years ago by ivank

It would be nice if doc/web/examples/http_client.py supported https. The easy information on how to do this was lost when the high-level-http-client.py sketch was deleted.

Changed 6 years ago by ivank

https support for httpclient.py example. Also, send non-default ports as part of Host header.

comment:14 Changed 6 years ago by glyph

  • Owner set to glyph
  • Status changed from new to assigned

comment:15 follow-up: Changed 6 years ago by glyph

  • Keywords review removed
  • Owner changed from glyph to exarkun
  • Status changed from assigned to new

OK. Awesome. Really glad to see this is finally in review!

Please note that the implementation itself really looks great, I like the idiom of create a request / issue the request / build a response. I think it's a solid low level interface. Most of my concerns are about process, documentation, and naming. So this might look like a lot of text, but I don't think it will actually be a lot of work.

Feedback actually relevant to this branch:

  1. I know that the ticket description says "low level", but I like to think that when a ticket is filed against Twisted for some library functionality, it's implicit that some level of that functionality needs to be public before the ticket is closed. While I like new private implementation guts, I don't like that this change is entirely a new private API. The fact that lots of attributes and classes in here are double-private and triple-private suggests that you've given some thought to what should be exposed and what shouldn't.
    1. I mean, twisted.web._newclient._LenthEnforcingConsumer._length? Sure you've got enough underscores on that? Every double-private top-level should drop its underscore. (I can understand the desire for private attributes of private classes though, so they can implement an interface and not "leak" anything else into the public API indirectly.)
    2. As a more concrete suggestion, I think the following things should be public (although I may have missed some necessary piece):
      1. All the exceptions should be in twisted.web.error.
      2. Request and Response should be moved to twisted.web.client.
      3. HTTP11ClientProtocol should be moved to twisted.web.client. I'm inclined to suggest that it should have a better name, but I don't think that's strictly necessary, given that it is really supposed to be the lower level of something else.
    3. Once these are public the example shouldn't be using any private APIs. Which is good, because we should really never have any examples that use private APIs. They are, almost by definintion, bad examples.
  2. The changes in twisted.internet.abstract shouldn't be merged, for obvious reasons.
  3. _makeFunction will create functions that don't appear in the API documentation. Also: worst name ever. May I suggest something more along the lines of (spelled python2.3-compatibly, of course):
        @stateful
        def connectionLost(self, reason):
            """
            The underlying transport went away.  If appropriate, notify the parser
            object.
            """
    
    I believe this preserves the docstrings?
  4. IStreamingProducer is a name that seems designed to be confusing. I don't know why IPushProducer wasn't named IStreamingProducer in the first place, given that the flag's name to indicate it is "streaming". Not that I think that is a great name either, but as usual consistency is more important than meaningfulness. A suggestion for this new interface: IEntityProducer. Maybe even IHTTPEntityBodyProducer since the length is kind of an HTTP feature?
  5. Why bother having UNKNOWN_LENGTH as a distinct value? Wouldn't None serve this purpose adequately? Or having the attribute be optional? You still have to explicitly check, after all. It seems to bloat the code without providing any real benefits. If we keep it, UNKNOWN_LENGTH should have a @var in the module docstring.
  6. The last line of newclient.py, i.e. Request._connectionLost_CONNECTION_LOST, appears to be uncovered by any test. Seems like possibly it should just be deleted, since that error condition is wacky enough that getting a "no such method" exception would be okay.
  7. This code is a bit more comment-heavy than I'm used to, which I was surprised to discover is pretty good. HTTP sucks a lot, and it's nice to have a guide to some of the more absurd pieces of the implementation inline. However:
    1. Proxy-Connection should be documented a bit more thoroughly than "undocumented HTTP 1.0 abomination". At least put in a link to someone else who implements it, or explain briefly what they use it for.
    2. HTTPParser.isConnectionControlHeader should just include the comment in its docstring.
    3. The giant enumerated outline in the middle of Request._writeToContentLength should be broken up to put each bullet point closer to the responsible code, and converted into full sentences, or possibly just removed.
  8. The tests sure do say Response(None, None, None, None, transport) a lot. If that's a valid thing to say, perhaps it should be more convenient? Response(transport=transport)? If it's a test-only thing, a function in the test module would be good (with a docstring explaining why it's OK to leave all those values as None, since I'm not sure I understand that bit).
  9. None of the _blah_BLAH state-machine methods are documented. Which I actually think I could be OK with, except... none of the BLAH states are actually documented anywhere either. For now, let's just stick to our existing standard and document all these methods. Notwithstanding the difficulty with constants mentioned below, I'd really like it if TRANSMITTING, for example, had a docstring of its own.
  10. HTTP11ClientProtocol._finish is also undocumented, and the antecedent in its "XXX" comment is ambiguous. (Thanks to the link for #3420 though.)

Idle musings / reminders of persistent problems:

  1. New examples are great. But every time I see a new example that uses reactor.run(), I feel it necessary to point out the thing that I pointed out on my comment to #1490. Nothing this branch can do, I guess, just a reminder.
  2. If we're keeping UNKNOWN_LENGTH, object() is an opaque way to make a constant value. I really wish we had some convenient idiom for making constants that would repr() (and ideally qual() as well) to their FQPN. Inspecting UNKNOWN_LENGTH in a debugger, or a failed test, like inspecting any other Python constant, is going to be mysterious and unpleasant. How would you feel about filing a ticket for a good NamedConstant class? It seems the more protocols the implement, the more constants like this that we have. Anyway, as a more practical solution I suggest just making it a string ("twisted.web.iweb.UNKNOWN_LENGTH") for the time being.
  3. The only pyflakes warning from any changed file is IUsernameDigestHash. Not your fault, I know, but could you perhaps just add a blank line that says "IUsernameDigestHash # export in __all__ below" and take it to zero?
  4. We should really have one good, public implementation of a state-machine helper somewhere. How many more times are we going to re-implement vertex.statemachine and epsilon.modal before we settle down and decide on one that works?

comment:16 in reply to: ↑ 15 ; follow-up: Changed 6 years ago by exarkun

  • Author changed from exarkun to exarkun, itamar
  • Keywords review added
  • Owner changed from exarkun to glyph

Replying to glyph:

OK. Awesome. Really glad to see this is finally in review!

Please note that the implementation itself really looks great, I like the idiom of create a request / issue the request / build a response. I think it's a solid low level interface. Most of my concerns are about process, documentation, and naming. So this might look like a lot of text, but I don't think it will actually be a lot of work.

Feedback actually relevant to this branch:

  1. I know that the ticket description says "low level", but I like to think that when a ticket is filed against Twisted for some library functionality, it's implicit that some level of that functionality needs to be public before the ticket is closed. While I like new private implementation guts, I don't like that this change is entirely a new private API. The fact that lots of attributes and classes in here are double-private and triple-private suggests that you've given some thought to what should be exposed and what shouldn't.

I agree with you in general. Some functionality needs to be public. I mentioned the specific motivation behind what is happening in this branch in the comment where I requested a review, though. Can you argue against that plan in particular?

  1. I mean, twisted.web._newclient._LenthEnforcingConsumer._length? Sure you've got enough underscores on that? Every double-private top-level should drop its underscore. (I can understand the desire for private attributes of private classes though, so they can implement an interface and not "leak" anything else into the public API indirectly.)

You're right. The middle underscore isn't adding anything. I fixed most (not _WrappedException (yet?) since I may end up moving its definition to another module, depending on what happens for some other points) instances of this pattern in r25999, r26000, and r26001.

  1. As a more concrete suggestion, I think the following things should be public (although I may have missed some necessary piece):
    1. All the exceptions should be in twisted.web.error.
    2. Request and Response should be moved to twisted.web.client.
    3. HTTP11ClientProtocol should be moved to twisted.web.client. I'm inclined to suggest that it should have a better name, but I don't think that's strictly necessary, given that it is really supposed to be the lower level of something else.
  2. Once these are public the example shouldn't be using any private APIs. Which is good, because we should really never have any examples that use private APIs. They are, almost by definintion, bad examples.

Deferring dealing with any of these points until you respond to my question about the first point.

  1. The changes in twisted.internet.abstract shouldn't be merged, for obvious reasons.

Oh, indeed. Fixed in r26002.

  1. _makeFunction will create functions that don't appear in the API documentation. Also: worst name ever. May I suggest something more along the lines of (spelled python2.3-compatibly, of course):
        @stateful
        def connectionLost(self, reason):
            """
            The underlying transport went away.  If appropriate, notify the parser
            object.
            """
    
    I believe this preserves the docstrings?

Fixed in r26003.

  1. IStreamingProducer is a name that seems designed to be confusing. I don't know why IPushProducer wasn't named IStreamingProducer in the first place, given that the flag's name to indicate it is "streaming". Not that I think that is a great name either, but as usual consistency is more important than meaningfulness. A suggestion for this new interface: IEntityProducer. Maybe even IHTTPEntityBodyProducer since the length is kind of an HTTP feature?

I renamed it to IEntityBodyProducer in r26004. I think HTTP is already implied by its location (the other interfaces in iweb.py don't have HTTP in their names).

  1. Why bother having UNKNOWN_LENGTH as a distinct value? Wouldn't None serve this purpose adequately? Or having the attribute be optional? You still have to explicitly check, after all. It seems to bloat the code without providing any real benefits. If we keep it, UNKNOWN_LENGTH should have a @var in the module docstring.

Added a @var in r26005. We chose to use UNKNOWN_LENGTH rather than None in order to attempt to prevent bugs in applications of the form if response.length: ... (note that UNKNOWN_LENGTH is used both as a possible value for Response.length and for IEntityBodyProducer.length). A response with length of 0 is of course significantly different from a response with an unknown length.

This rather directly raises a question which I've been pondering all morning though - why is a request body produced with an IEntityBodyProducer but a response body is made available through the Response.deliverBody mechanism; it seems as though Response could have an attribute which is an IEntityBodyProducer provider. Maybe I'll remember the answer by the time I finish responding to review feedback.1

  1. The last line of newclient.py, i.e. Request._connectionLost_CONNECTION_LOST, appears to be uncovered by any test. Seems like possibly it should just be deleted, since that error condition is wacky enough that getting a "no such method" exception would be okay.

I didn't want to just have an AttributeError (particularly one exposing an irrelevant implementation detail like how connectionLost is made to do the right thing depending on what state the protocol is in), but your suggestion made me realize I could do better error handling in makeStateDispatcher (previously _makeFunction). So I added that and deleted the untested _connectionLost_CONNECTION_LOST in r26006.

  1. This code is a bit more comment-heavy than I'm used to, which I was surprised to discover is pretty good. HTTP sucks a lot, and it's nice to have a guide to some of the more absurd pieces of the implementation inline. However:
    1. Proxy-Connection should be documented a bit more thoroughly than "undocumented HTTP 1.0 abomination". At least put in a link to someone else who implements it, or explain briefly what they use it for.
    2. HTTPParser.isConnectionControlHeader should just include the comment in its docstring.
    3. The giant enumerated outline in the middle of Request._writeToContentLength should be broken up to put each bullet point closer to the responsible code, and converted into full sentences, or possibly just removed.

Fixed these issues in r26007.

  1. The tests sure do say Response(None, None, None, None, transport) a lot. If that's a valid thing to say, perhaps it should be more convenient? Response(transport=transport)? If it's a test-only thing, a function in the test module would be good (with a docstring explaining why it's OK to leave all those values as None, since I'm not sure I understand that bit).

Did something in r26008, let me know if that's sufficient.

  1. None of the _blah_BLAH state-machine methods are documented. Which I actually think I could be OK with, except... none of the BLAH states are actually documented anywhere either. For now, let's just stick to our existing standard and document all these methods. Notwithstanding the difficulty with constants mentioned below, I'd really like it if TRANSMITTING, for example, had a docstring of its own.

I documented the states that Response and HTTP11ClientProtocol can be in and I added docstrings for all the state methods in r26009. In doing this, I also noticed that none of HTTP11ClientProtocol's attributes were documented, so I documented them in r26010.

  1. HTTP11ClientProtocol._finish is also undocumented, and the antecedent in its "XXX" comment is ambiguous. (Thanks to the link for #3420 though.)

Done in r26011 and r26012.

Idle musings / reminders of persistent problems:

  1. New examples are great. But every time I see a new example that uses reactor.run(), I feel it necessary to point out the thing that I pointed out on my comment to #1490. Nothing this branch can do, I guess, just a reminder.

Yea. #3270 is also somewhat related. Do you want to have a fight with Chris about it? react.py is as far as I could push that issue. Hm, actually, I see that Chris didn't voice his objections on the ticket, he must have done it in meatspace, which means I can pretend he didn't, and maybe push the react solution forward.

  1. If we're keeping UNKNOWN_LENGTH, object() is an opaque way to make a constant value. I really wish we had some convenient idiom for making constants that would repr() (and ideally qual() as well) to their FQPN. Inspecting UNKNOWN_LENGTH in a debugger, or a failed test, like inspecting any other Python constant, is going to be mysterious and unpleasant. How would you feel about filing a ticket for a good NamedConstant class? It seems the more protocols the implement, the more constants like this that we have. Anyway, as a more practical solution I suggest just making it a string ("twisted.web.iweb.UNKNOWN_LENGTH") for the time being.

I don't feel very good about adding this kind of thing to Twisted. Perhaps it will come to that (we probably should have hashed this out long ago, oh well), but it's such a simple thing, and has basically nothing to do with Twisted... Why isn't it in the stdlib, or why isn't there a widespread third-party package that provides the functionality?

Switching to a string (well, I'm going to use unicode actually, ha ha) for now seems like a good idea (r26013).

  1. The only pyflakes warning from any changed file is IUsernameDigestHash. Not your fault, I know, but could you perhaps just add a blank line that says "IUsernameDigestHash # export in __all__ below" and take it to zero?

I really dislike hacking around pyflakes shortcomings like that. I'd rather add __all__ support to pyflakes.

  1. We should really have one good, public implementation of a state-machine helper somewhere. How many more times are we going to re-implement vertex.statemachine and epsilon.modal before we settle down and decide on one that works?

Good question. I'm getting close to the point where I want to write one and put it in Twisted. I'm not quite sure what exact features it'll have yet, though. Some stuff from statemachine is nice, some stuff from modal is nice, there's probably useful features that neither of them provides...

I'm re-assigning this to you so you can respond to my response to your point one. After that's dealt with, I wouldn't mind if it went back to the general review pool.

1: No, I didn't remember.

comment:17 in reply to: ↑ 16 Changed 6 years ago by glyph

  • Owner glyph deleted

Replying to exarkun:

I agree with you in general. Some functionality needs to be public. I mentioned the specific motivation behind what is happening in this branch in the comment where I requested a review, though. Can you argue against that plan in particular?

So, I have a bunch of reasons why it's a bad idea. I'll list them here, but I've considered it and I think that (aside from removing the example from doc/examples, and putting it somewhere private itself) I am willing to get this merged as all private. If you find yourself persuaded by my arguments, then we can change the plan, but if not, I also think it would be valuable to try experimenting with this one-layer-at-a-time merging strategy. I'm not entirely convinced by my own arguments.

Trunk should always be in a releasable state. So I'm assuming that a release may happen at any time; specifically, after this branch is merged but before the high-level API is merged.

If we release private functionality which is better than the public functionality, (especially if it is replete with examples of how to use the private functionality), it strikes me as likely that users will disregard the underscore and start using it. And who could blame them? When python-core releases a non-functional public API with some juicy private bits that we want to use, we use them anyway and are upset when they break them. (Although granted, it would be nicer if they had a better definition of "private", so we could tell in advance.) I think that having a compatibility policy that works depends on having the kind of credibility with users that comes from not prevaricating too much.

In other words, let's not just replace making too much stuff public with making too much stuff private.

Your motivation for the low-level-then-high-level plan seems to be keeping the branches (and therefore the reviews) smaller, by breaking them into layers. All things being equal, that would certainly be better. But, all things are rarely equal: by splitting the feature up, the nature of the review changes. Large as this branch was, I would personally have preferred it be even larger but included a top-level starting point more polished than the example. When I do reviews of new features I like to start reading at the "top".

My perspective on the structure and necessity of the implementation details is directly influenced by the way in which they are used by the high-level API. For example, in its current form the behavior of HTTP11ClientProtocol.request when not in the QUIESCENT state makes sense. But my opinion might be influenced by seeing what the request-queuing implementation implemented above that might look like.

This might also be an argument for keeping the present structure private (rather than making HTTP11ClientProtocol public) when merging this branch, but it seems to make the shorter reviews a false economy. If I'm going to have to re-review these implementation details again later once I've seen them in more serious use, why bother having a separate branch?

Clearly this branch needed a high-level design to go with it, hence the structure in the example, but the high-level design is only partially included. The example is cheating. It has no tests, it has no documentation, and it's not clear how features that it leaves out (like handling redirects, or translating "error" status codes into errors) will be implemented.

Again, all that said, this is just a suggestion.

I think I might have some time later for a re-review, but I'll put this back into the general pool to avoid blocking it further.

comment:18 Changed 6 years ago by glyph

Well, one of my points seems to have been invalidated already. As heard on IRC:

<ivan> it's already deployed to production

so perhaps keeping it out of trunk isn't going to help :).

comment:19 Changed 6 years ago by glyph

  • Owner set to glyph
  • Status changed from new to assigned

Reviewing again.

comment:20 follow-up: Changed 6 years ago by glyph

  • Keywords review removed
  • Owner changed from glyph to exarkun
  • Status changed from assigned to new

Replying to exarkun:

Since this is a pretty long comment, which is turning into a meandering discussion, I'm putting all the mandatory review stuff up here at the front. If you get bored you are free to ignore the rest.

These are all docstring or minor issues, so you can merge once they're dealt with.

  1. twisted.web._newclient.HTTPParser.statusReceived isn't valid epytext. I believe this is because of the "\r\n"; I've run into this problem before, I'd recommend spelling it "CRLF" in docstrings.
  2. the following reference (link) names couldn't be resolved by pydoctor:
    1. in _newclient:
      1. IProtocol
      2. IPushProducer (I think you mean IPushProducer)
      3. IResponse
      4. There also seems to be a bug for mwh, not you: pydoctor can't find "twisted.web.iweb.UNKNOWN_LENGTH", despite the ivar declaration.
    2. in iweb:
      1. IConsumer
      2. Deferred
      3. IProducer
      4. Failure
  3. _WrapperException needs a docstring, whether or not it remains double-private.
  4. makeStateDispatcher should really have a docstring.
  5. undocumented in test_newclient:
    1. AccumulatingProtocol (isn't there one of these somewhere that you can use? let's not make #3604 harder to fix.)
    2. SlowRequest
    3. SimpleRequest
    4. StringProducer
    5. RequestTests
    6. "mumble". (Perhaps "lol" is not a useful comment, either.)
    7. Assertion methods. (It would be more consistent for these to be on a class, but I don't think that's too important.)
      1. assertResponseFailed
      2. assertRequestGenerationFailed
      3. assertRequestTransmissionFailed
      4. HTTPClientParserTests._noBodyTest

That's it for the real meat of the review. Now for the optional stuff and further rumination on loosely-related ideas:

Replying to glyph:

OK. Awesome. Really glad to see this is finally in review!

I agree with you in general. Some functionality needs to be public. I mentioned the specific motivation behind what is happening in this branch in the comment where I requested a review, though. Can you argue against that plan in particular?

The abovementioned comment on IRC was enough to convince me, at least, that the best course of action at this point is to complete the review cycle as quickly as possible, in whatever form.

I still stand by my earlier suggestion, in point 1.2 above, that some elements of this should be public — but this is not out of any procedural concern. I just have a fairly high level of confidence in the API as we've discussed it, and the implementation seems high-quality. The sketch adequately demonstrates to me that this module, or at least the parts that I propose become public, are suitable for a real substrate.

So feel free to make them public, or not.

You're right. The middle underscore isn't adding anything. I fixed most (not _WrappedException (yet?) since I may end up moving its definition to another module, depending on what happens for some other points) instances of this pattern in r25999, r26000, and r26001.

This begs an interesting question, which isn't really relevant to this review: if _WrappedException is private, then there's no documentation nor any supported way to create a ResponseFailed, RequestTransmissionFailed, etc. The duplicate documentation of @ivar reasons is a nice bit of attention to detail, but somewhat redundant.

  1. _makeFunction will create functions that don't appear in the API documentation. Also: worst name ever. May I suggest something more along the lines of (spelled python2.3-compatibly, of course):

Fixed in r26003.

Better. I have some comments for using this more broadly, but these should be considered as comments on a potential public implementation of this; the ad-hoc version here can remain as-is:

  1. The "name" argument is both redundant and misleading. It should really just inspect its function argument and use that name. Why does the name sometimes differ and sometimes not? As I was writing this I realized that it is written that way to consistently make the implementations private without making the names necessarily private, but it's a bit of a mystery at first. When I saw 'def _connectionLost_QUIESCENT' my impulse was to grep for 'def _connectionLost', after having seen 'def _bodyDataReceived_INITIAL' and 'def _bodyDataReceived'.
  2. Decorators should be written with an eye to @-syntax compatibility. We might not use it, but it would be annoying for an external user to have to define a new wrapper function just so that they could. Also, it's hard enough puzzling out the equivalence of '@foo def bar' and 'def bar ... bar = foo(bar)'; adding extra arguments makes it even harder to read.
  3. I also can't help but note that epsilon.modal addresses all of these issues; the implementations don't need to be private, the names are less ugly, it has a really convenient syntax without being a decorator, etc. It bothers me that it's apparently easier to do something crappy over and over and over again than to roll in code that we've already written from somewhere else.
  1. Why bother having UNKNOWN_LENGTH as a distinct value? Wouldn't None serve this purpose adequately? Or having the attribute be optional? You still have to explicitly check, after all. It seems to bloat the code without providing any real benefits.

We chose to use UNKNOWN_LENGTH rather than None in order to attempt to prevent bugs in applications of the form if response.length: ...

I am not sure what kind of bug you mean. I mean, I can come up with a class of bugs in applications like that, but in order to prevent them you'd need a __nonzero__ method that raises an exception. (Although, actually, that's kind of a cool idea.) As it is, it just flips the polarity from silently failing your test to silently succeeding it. UNKNOWN_LENGTH can be 0-length too, after all, if all you get is one chunk.

This rather directly raises a question which I've been pondering all morning though - why is a request body produced with an IEntityBodyProducer but a response body is made available through the Response.deliverBody mechanism; it seems as though Response could have an attribute which is an IEntityBodyProducer provider.

Indeed, it would be good if these were consistent. Since you can't remember why it's this way, can you fix it?

  1. The tests sure do say Response(None, None, None, None, transport) a lot.

Did something in r26008, let me know if that's sufficient.

Better than sufficient. Reading the tests with that change I feel like I understand this code better :).

Idle musings / reminders of persistent problems:

  1. New examples are great. But every time I see a new example that uses reactor.run(), I feel it necessary to point out the thing that I pointed out on my comment to #1490. Nothing this branch can do, I guess, just a reminder.

Yea. #3270 is also somewhat related. Do you want to have a fight with Chris about it? react.py is as far as I could push that issue. Hm, actually, I see that Chris didn't voice his objections on the ticket, he must have done it in meatspace, which means I can pretend he didn't, and maybe push the react solution forward.

I'd prefer twistd run to react for examples, since react will still produce misleading behavior where newbies think they need to do something special to call listenTCP twice. I'll probably take your side in a fight against Chris over that though, react is an improvement. (react() will probably be more useful for simple command-line tools or client applications, too).

... I really wish we had some convenient idiom for making constants that would repr() (and ideally qual() as well) to their FQPN. ...

I don't feel very good about adding this kind of thing to Twisted.

Why not? Twisted has lots of utilities in twisted.python that just do stdlib-type stuff better than the stdlib, and for the most part I love them. Sure, some of them were ill-advised, but they haven't really hurt us. (i.e. how much time have you spent maintaining twisted.python.hook this year?)

Perhaps it will come to that (we probably should have hashed this out long ago, oh well), but it's such a simple thing, and has basically nothing to do with Twisted... Why isn't it in the stdlib, or why isn't there a widespread third-party package that provides the functionality?

I'd certainly rather add a dozen more little utilities like this to Twisted than try to popularize a dozen widespread third-party packages, or live with the quality-control insanity of trying to integrate a dozen versions of a dozen third-party packages into our already onerous buildbot setup. (Not to mention that then we'd need to write a working, portable package manager for Python so that users can install Twisted's two-dozen dependencies without easy_install deleting their root partition or whatever.)

Switching to a string (well, I'm going to use unicode actually, ha ha) for now seems like a good idea (r26013).

Speaking of "ha ha", sure you don't want to switch to something that raises an exception on __nonzero__?

  1. The only pyflakes warning from any changed file is IUsernameDigestHash. Not your fault, I know, but could you perhaps just add a blank line that says "IUsernameDigestHash # export in __all__ below" and take it to zero?

I really dislike hacking around pyflakes shortcomings like that. I'd rather add __all__ support to pyflakes.

Le mieux est l'ennemi du bien.

I noticed that there wasn't a Divmod ticket for this, so I filed one. Be my guest and fix pyflakes, but in the meanwhile it's a lot easier to read pyflakes' output if I don't have to manually verify the accidental exports.

Actually, I just realized that one the main reason I haven't agitated for a pyflakes fix is that I actually like the pyflakes workaround, because it's also a Python workaround. It's easy to miss the __all__ entry down below; nothing will warn you if the name stops being defined, and the API is thusly incompatibly changed. But if you delete the import and you're not using pyflakes, the name reference below will blow up in your tests, right next to a comment that tells you the name is being explicitly exported.

Of course you know how I actually wish this worked.

comment:21 Changed 6 years ago by itamar

Re why response body comes as protocol, not IEntityBodyProducer: IEntityBodyProducer is still kinda crap. Users would need to support yet another new, over-complex consumer API (see the fun code needed to deal with request producers!), which hopefully we will eventually replace anyway with better version... So why not use API that is equivalent, much simpler, and that users already know: a protocol with a transport? Bleh. And now I've used up half my daily typing allowance :)

comment:22 in reply to: ↑ 20 Changed 6 years ago by exarkun

Replying to glyph:

  1. twisted.web._newclient.HTTPParser.statusReceived isn't valid epytext. I believe this is because of the "\r\n"; I've run into this problem before, I'd recommend spelling it "CRLF" in docstrings.

Fixed in r26052.

  1. the following reference (link) names couldn't be resolved by pydoctor:
    1. in _newclient:
      1. IProtocol
      2. IPushProducer (I think you mean IPushProducer)
      3. IResponse
      4. There also seems to be a bug for mwh, not you: pydoctor can't find "twisted.web.iweb.UNKNOWN_LENGTH", despite the ivar declaration.
    2. in iweb:
      1. IConsumer
      2. Deferred
      3. IProducer
      4. Failure

Hum. Indeed. I am rather dissatisfied with the proper spelling for these links. I think it makes the unprocessed docstrings rather harder to read. I'd like to talk to mwh about making pydoctor able to resolve them as-is or with some approach other than the one that epydoc/epytext requires. I've not done anything about them yet.

  1. _WrapperException needs a docstring, whether or not it remains double-private.

Fixed in r26053

  1. makeStateDispatcher should really have a docstring.

Fixed in r26054

  1. undocumented in test_newclient:
    1. AccumulatingProtocol (isn't there one of these somewhere that you can use? let's not make #3604 harder to fix.)

Fixed in r26055

  1. SlowRequest
  2. SimpleRequest
  3. StringProducer

Fixed in r26056

  1. RequestTests

Erm? This one has a docstring (albeit a short one which reveals very little useful information...)

  1. "mumble". (Perhaps "lol" is not a useful comment, either.)
  2. Assertion methods. (It would be more consistent for these to be on a class, but I don't think that's too important.)
    1. assertResponseFailed
    2. assertRequestGenerationFailed
    3. assertRequestTransmissionFailed
    4. HTTPClientParserTests._noBodyTest

Fixed in r26057.

So this leaves only the pydoctor/epytext issues.


That's it for the real meat of the review. Now for the optional stuff and further rumination on loosely-related ideas:

Replying to glyph:

OK. Awesome. Really glad to see this is finally in review!

I agree with you in general. Some functionality needs to be public. I mentioned the specific motivation behind what is happening in this branch in the comment where I requested a review, though. Can you argue against that plan in particular?

The abovementioned comment on IRC was enough to convince me, at least, that the best course of action at this point is to complete the review cycle as quickly as possible, in whatever form.

I still stand by my earlier suggestion, in point 1.2 above, that some elements of this should be public — but this is not out of any procedural concern. I just have a fairly high level of confidence in the API as we've discussed it, and the implementation seems high-quality. The sketch adequately demonstrates to me that this module, or at least the parts that I propose become public, are suitable for a real substrate.

So feel free to make them public, or not.

Okay. I think actually I'm going to try the stacked branch thing you mentioned in a previous comment. After the higher-level stuff is done, then some of the lower-level stuff can probably become public. And then the two branches can be merged at almost the same time.

[snip - there should be a good way to define state machines]

Yep, I agree.

[snip - UNKNOWN_LENGTH.nonzero should raise or something]

Actually, even if the response body for an UNKNOWN_LENGTH response ends up being 0 bytes, you still have to treat the response object as you would for a known non-zero length response. In either case, you have to ask the response to produce the body to a protocol. So, while I wouldn't write it myself, if response.length:´ would let you detect the case where you need to call deliverBody. if response.length is UNKNOWN_LENGTH or response.length: is a nicer way to write it, but either works. So I ''could'' make UNKNOWN_LENGTH.nonzero` raise and force people to write the nicer version, but should I actually?

[snip - replace protocol usage with another IEntityBodyProducer]

I hope itamar's replied clarified this situation. Really, we should just add a non-shitty IProducer/IConsumer replacement. :/ I think I'm going to leave this alone for now, though.

[snip - add a good const/enum/whatever thing to Twisted]

Eh yea okay, I guess we'll have to.

[snip - pyflakes workaround]

Okay, added in r26058.

comment:23 Changed 6 years ago by ivank

Some servers like http://news.ycombinator.com/ use only \n to delimit status lines and headers, instead of \r\n. All not-ridiculously-obscure web browsers can parse this. _newclient.py revision 26111 is unable to parse any header from this page (why is raising AttributeError instead of BadHeaders? I would imagine that no headers are 'bad headers')

Failure: twisted.web._newclient.ResponseFailed: [<twisted.python.failure.Failure <type 'exceptions.AttributeError'>>]

My not-thoroughly-tested workaround is to have delimiter = '\n' in HTTPParser,
and this in lineReceived:

if line and len(line) > 0 and line[-1] == '\r': line = line[:-1]

comment:24 Changed 6 years ago by jknight

Are there any servers not written by paul graham that exhibit this incredible brokenness?

comment:25 Changed 6 years ago by ivank

I don't know. It could take a while to find one in the wild.

Also, I can't find a browser that doesn't allow the broken behavior, even though it violates the RFC: "This flexibility regarding line breaks applies only to text media in the entity-body; a bare CR or LF MUST NOT be substituted for CRLF within any of the HTTP control structures (such as header fields and multipart boundaries)."

Changed 6 years ago by ivank

support for bare LF newlines sent by broken servers

comment:26 Changed 6 years ago by ivank

If Twisted chooses to support LF newlines, I'd be happy to manually rewrite the tests however they should be written.

comment:27 Changed 6 years ago by exarkun

  • Cc ivank added

Any HTTP server that emits LF newlines is broken and written by someone who is either incompetent or malicious. Such people should be punished.

However, if someone (ie, ivank) wants to do the work to support this case, then that's fine with me. I do propose separating that out into a separate unit of work, though. ivank, want to file a ticket for implementing that once this code all makes it to trunk (which I am going to renew my efforts to achieve now)?

comment:28 Changed 6 years ago by ivank

Okay, #3833 is for LF newline support.

comment:29 Changed 5 years ago by exarkun

I undid r26058, a fix for an earlier review comment, because pyflakes now understands __all__, so these workarounds are no longer necessary.

comment:30 Changed 5 years ago by truekonrads

  • Cc truekonrads added

I think that a high level API should be part of this ticket. I think it is best to have a stateful client (one that tracks cookies), like HTTPClientFactory tries to do.

I typically will use such object as a page scraper or to interact with Web APIs. Therefore, the following would be useful:

  • Proxy support. Either by specifying proxy URL or by passing some sort of wrapper object. Proxy support should enable use of CONNECT method.
  • Cookie handling in a cookie-jar (addCookie/getCookie/deleteCookie methods) so that repeat requests would contain cookies.
  • A convenience function for posting application/x-www-form-urlencoded forms.
  • Some sort of fairy dust for JSON (very common use-case) - POST with JSON encoded data in message body.
  • Some convenience to enable LoggingTransport for debugging.
  • Good docs so dummies like me can start using it fully immediately.

comment:31 Changed 5 years ago by exarkun

The high-level API is not going to happen as part of this ticket. It will have documentation, and it will be such that the first three features you described will not have to be implemented in Twisted or by copying a bunch of code from Twisted and editing it. There probably will be proxy and cookie support in Twisted, though.

If you'd like to contribute other requirements or suggestions to the high-level API, please edit TwistedWebClient.

comment:32 Changed 5 years ago by exarkun

  • Author changed from exarkun, itamar to itamar, exarkun
  • Branch changed from branches/expressive-http-client-886-3 to branches/expressive-http-client-886-4

(In [27272]) Branching to 'expressive-http-client-886-4'

comment:33 Changed 5 years ago by exarkun

  • Resolution set to fixed
  • Status changed from new to closed

(In [27413]) Merge expressive-http-client-886-4

Author: exarkun, itamar
Reviewer: glyph
Fixes: #886

Add a new HTTP 1.1 protocol implementation.

comment:34 Changed 4 years ago by <automation>

  • Owner exarkun deleted
Note: See TracTickets for help on using tickets.