Opened 5 years ago

Closed 5 years ago

#3987 enhancement closed fixed (fixed)

Add a high-level HTTP client API based on the HTTP client implemented for #886

Reported by: exarkun Owned by:
Priority: normal Milestone:
Component: web Keywords: httpclient
Cc: thijs, ivank, moshez Branch: branches/high-level-web-client-3987
(diff, github, buildbot, log)
Author: exarkun Launchpad Bug:

Description

#886 adds a new HTTP client implementation. The API provided by that code is very low-level though. It is essentially a protocol with a request method. Connection setup, URL interpretation, redirects, caching, cookies, proxies, authentication, etc are all left up to someone else. There should be a higher-level API which allows for all of these things. The actual implementation of each of these higher-level features may be left up to some other ticket, and other similar features may be left out of Twisted entirely - extension by third parties should be easy and straightforward with this new API.

Here is my proposed API. There will be a single primary method, request. It will accept 3 or 4 arguments: the request method, URI, headers, and body. The method and URI will be strings (later we may want to use URLPath or something for the URI, but I don't think it's ready now); the headers will be an instance of twisted.web.http_headers.Headers; and the body will be an optional IEntityBodyProducer provider. The return value will be like HTTP11ClientProtocol.request.

This ticket will be resolved by the simplest possible implementation of this API: a class (with a parameterized reactor) which sets up a new connection for each request and mostly just passes through to HTTP11ClientProtocol (but it will insert required headers).

More advanced features will generally be implemented by layering other things on top of this. For example, there should eventually be classes such as CachingClient, ProxyEnabledClient, and CookieEnabledClient which allow an application to construct an object like this:

client = CachingClient(CookieEnabledClient(ProxyEnabledClient(BasicClient(reactor))))

And then use it just as it would have used BasicClient:

responseDeferred = client.request('GET', 'http://example.com/foo', Headers({'Accept': ['en']}))

Change History (30)

comment:1 Changed 5 years ago by exarkun

  • Author set to exarkun
  • Branch set to branches/high-level-web-client-3987

(In [27269]) Branching to 'high-level-web-client-3987'

comment:2 Changed 5 years ago by exarkun

  • Keywords review added
  • Owner exarkun deleted

Please review. Note that high-level-web-client-3987 is a branch of expressive-http-client-886-4, not of trunk. Since this isn't the usual workflow, here's what I hope for the future of this ticket.

  1. A reviewer eventually agrees that the branch mentioned here contains good code.
    • This may require that they learn some things about the #886 branch it is based on, as that code is not yet in trunk.
    • This may also require further refinements of the code in the #886 branch. Further review commentary on that ticket is welcome. However, that branch has been reviewed and I think is basically ready for trunk. That ticket has details on why the branch has not already been merged.
  2. The #886 branch is merged into trunk.
  3. This branch is merged into trunk.
  4. (Addendum) The #3811 branch is reviewed and merged into trunk.

comment:3 Changed 5 years ago by truekonrads

  • Author changed from exarkun to exarkun,truekonrads

Perhaps a complete handling of 3XX series responses according to RFC and to real world implementation should be there as well - not just afterFoundGet. For responses where according to standard, user must make a choice (not UA), offer a callback function?

This is of low priority though

comment:4 Changed 5 years ago by exarkun

  • Author changed from exarkun,truekonrads to exarkun

afterFoundGet has nothing to do with this ticket or branch. Did you mean to comment on another ticket?

comment:5 Changed 5 years ago by thijs

  • Cc thijs added
  • Keywords review removed
  • Owner set to exarkun
  1. t.w.c.Agent needs an @since
  2. httpclient.py has an outdated copyright header and incorrect shebang line

comment:6 Changed 5 years ago by exarkun

(In [27295]) Add @since to Agent and update copyright and #! on httpclient.py

refs #3987

comment:7 Changed 5 years ago by exarkun

  • Keywords review added
  • Owner exarkun deleted

Thanks!

comment:8 Changed 5 years ago by ivank

  • Cc ivank added
  • Keywords review removed
  1. I think that a 'Host' header provided in headers to Agent.request should take precedence over a computed Host header. Consider this reasonable use case: making a request to http://some.ip.address/ with your own Host header. Or the less-reasonable: making a request to http://host.name/ with your own Host header.
  1. Is it really a good idea to mutate the user's headers object? If so, shouldn't this mutation be documented?
  1. In test_webclient.AgentTests.test_request, the reason for passing in object() to body should be commented, because it's very different from what the docstring says it should be: an IEntityBodyProducer or None.
  1. client.Agent.request says it returns "A L{Deferred} which fires with the result of the request". test_webclient.AgentTests.test_request says client.Agent.request returns "a L{Deferred} which fires with a L{Response} from the server." The former is a bit vague (maybe on purpose?). In the latter, it's unclear what L{Response} is.

comment:9 Changed 5 years ago by ivank

  • Owner set to exarkun

comment:10 Changed 5 years ago by ivank

Sorry, you could ignore "In the latter, it's unclear what L{Response} is". I just realized it's only in _newclient.

comment:11 Changed 5 years ago by ivank

  1. epydoc doesn't like two @type bodyProducers in twisted.web.client.Agent.request:
| File [...]Projects/Twisted/twisted/web/client.py, line 564, in twisted.web.client.Agent.request
|   Warning: Redefinition of type for bodyProducer

comment:12 Changed 5 years ago by ivank

pydoctor doesn't like it either, all it shows for bodyProducer is "(type: IEntityBodyProducer provider )"

comment:13 Changed 5 years ago by exarkun

(In [27296]) Allow the host header to be overridden by the caller of Agent.request

refs #3987

comment:14 Changed 5 years ago by exarkun

(In [27297]) Avoid modifying the request headers object passed to Agent.request

refs #3987

comment:15 Changed 5 years ago by exarkun

(In [27298]) Add a comment about the non-producer body used in the tests, and actually add the assertion I meant to include about it

refs #3987

comment:16 Changed 5 years ago by exarkun

(In [27299]) Improve the return info in the Agent.request docstring a bit

refs #3987

comment:17 Changed 5 years ago by exarkun

(In [27300]) Fix some type/param mixups in the Agent.request docstring

refs #3987

comment:18 Changed 5 years ago by exarkun

  • Keywords review added
  • Owner exarkun deleted

Thanks for the review.

  1. Fixed in r27296
  2. Fixed in r27297
  3. Fixed in r27298
  4. Fixed in r27299
  5. Fixed in r27300

comment:19 Changed 5 years ago by lvh

Hi!

I'm mostly dangerously imcompetent with regards to reviewing new t.w.c content, so I'm not going to remove the review tag. I had a skim of the docstrings and comments to check for spelling errors and they looked okay (hopefully this saves some other reviewer time).

The only thing I noticed was that perhaps some of this fancy new API (of which this is a part -- see the big trac wiki page here: http://twistedmatrix.com/trac/wiki/TwistedWebClient) should have some form of lore docs. Perhaps this point _is_ valid, but it should be fixed in a separate ticket.

comment:20 Changed 5 years ago by exarkun

  • Keywords review removed
  • Owner set to exarkun
  • Status changed from new to assigned

I started to write some documentation and found at least one thing that I'm not happy with in the code, so I'm going to correct that and then re-submit for review.

comment:21 Changed 5 years ago by exarkun

  • Keywords review added
  • Owner exarkun deleted
  • Status changed from assigned to new

Alright, documentation added. Please review!

comment:22 follow-up: Changed 5 years ago by glyph

  • Keywords review removed
  • Owner set to exarkun
  1. There are a bunch of issues with the docs, but please feel free to defer these (except the first point), or delegate them to someone else. I'm really happy to see that there are howto docs at all for this.
    1. The one thing that can't be deferred: these docs refer to private names. Don't refer to private names in the extra-public, "here's how you're supposed to do it", not just how you're allowed to do it, documentation.
    2. The listings should really be separate python files, so that I can run them individually without copy/pasting them from the doc itself. This will hopefully facilitate testing in the future.
    3. The prerequisites should be pulled out of the "overview" and put into their own section, before the overview. Also, they should be hyperlinks.
    4. There's an error in the final example in the way it sends headers (see below for more detail).
    5. Given that you're already binding the Deferred to a name, I think callbacks should be defined after the Agent call rather than before. It's good to get to the business of actually doing whatever we're supposed to be doing as early as possible in the example, so that the reader won't get lost in a forest of mostly-irrelevant callbacks.
    6. The documentation is extremely dry, formal, and abstract. It seems to assume a deep understanding of HTTP, what it means to make a request, and how requests stream. I know we've marked some prerequisites, but I think that ultimately people aren't going to build up a formal understanding of this stuff when they just want to scrape some RSS out of a bucket and dump it on a web page. Generally speaking, I think it's a good idea for any tutorial documentation to have a sample problem. "Let's say you want to read the front page of a web site and check its modification date." Build on that and make a narrative. "Now you want to do X, so we'll modify the simple example to do that". Keep in mind that anyone reading this is going to want to solve a specific problem, and is going to be looking for something that looks like the solution to their problem.
  2. Agent.request's docstring says it returns a Deferred that fires with a "L{Response} ... but Response isn't in that namespace. In fact, Response isn't even public! It needs to be imported from _newclient and put into __all__. (It probably needs a note on it, too, saying that it's public so that future maintainers won't accidentally add public attributes to it.)
  3. I have a number of convenience issues with the signature of the API. Some require changing argument order, but some could be deferred for a series of future make-it-more-convenient tickets.
    1. Everything except the URI should be optional. I feel that this is really, really important. It should be no harder to use Agent than to use getPage for simple cases.
      1. "method" should be optional, and default to 'GET'. In fact, I'm inclined to say that in the high-level API, there should (at least eventually) be different APIs for known methods like GET, HEAD, and POST. agent.request(some_imported_thing.GET, ...) is tedious, but agent.get(...) is even shorter than 'request' and says the same thing, and is just as hard to typo the method. Not to mention the fact that GET shouldn't have a body, HEAD can't have a body, and POST's body needs to be formatted in a particular way. Not to mention the different ways that they have to respond to redirects.
      2. "headers" should be optional, defaulting to 'no additional headers'
      3. bodyProducer should be optional, defaulting to None.
    2. The Headers and IEntityBodyProducer arguments should be easier to pass in. For a decidedly high-level API like this, I feel like some adaptation wouldn't be out of place. However, this can all be deferred to a separate ticket, since Headers and IEntityBodyProducer will still be valid arguments.
      1. Requiring the import of Headers just so I can pass a dictionary to Agent.request, even if I have no headers at all to pass, kinda sucks. I mean, technically it seems like the right thing, but it's a whole bunch of extra typing for no benefit that I can see from the perspective of just using the API.
        1. Not to mention the fact that I am rarely going to want to set all of the headers; the user-agent really ought to be set for me by the API, for example, and the content-length is set (sort of) by the API already. My ideal header-manipulation API would mostly be state on the Agent object itself. If I want to change what encodings I can accept, I want to tell the Agent before I make the request, and I want it to actually know about the encodings. Cookie: headers practically need to be done this way. And I'd really like a higher-level interface to sending an encoded form-post than manually encoding it and sending the header myself. So we have to assume that the Agent is itself going to be manipulating the headers, or is going to have an existing bag of headers that it starts with. The "set some headers just for this request" API is something I hope I will have to use very rarely. Okay, I guess this point isn't really review feedback, but maybe it's helpful to see my thought process here.
      2. Similarly, the process for writing an entity body producer is really tedious. I want to be able to just pass a string, or a file-like object, and have something figure it out.
        1. It would be nice to have a default entity-body thingy to inherit from, because the def xxx(self): pass declarations in the example documentation look tedious and distracting.
    3. It's very easy to make mistakes with the headers, given that the API forces you to pass in a dict of lists of strings, but doesn't do any error checking on that. This is borne out by the final example in client.xhtml, which sends as its headers:
      ['GET / HTTP/1.1\r\n',
       'Connection: close\r\n',
       'Host: example.com\r\n',
       'User-Agent: T\r\n',
       'User-Agent: w\r\n',
       'User-Agent: i\r\n',
       'User-Agent: s\r\n',
       ...]
      
      (I think you can see where that is going.) There's also no way to notice you've set a zillion headers.
  4. Also on a 'convenience' note, it looks like a lot of work to retrieve the request body as a string (especially as compared to getPage). There should be a ticket to make this very, very easy. Similarly there should be built-in facilities to download the result to a file.
  5. There needs to be a ticket for HTTPS support. That should follow very quickly; if we release 9.0 (or at this point, more likely "Twisted X") with HTTPS support in getPage and not in this, I suspect it will remain generally unused until the next release.
  6. In general, error-handling should be able to be accomplished through callbacks or errbacks, not both. I think that the ValueError should be raised asynchronously.
  7. The exception should really be SchemeNotSupported, not ValueError. Built-in exceptions are a magnet for ambiguity.
  8. Shouldn't passing a conflicting value for the Host: header be an error, rather than silently passed? You're not really retrieving the same URI any more, if you set that.
  9. All of the exception types in _newclient need to be made public, so that it can be more specific about what kinds of errors the Deferred from request might fire with.
  10. There's no support for timeouts, but I guess that's #3811's job...?
  11. This API seems pretty test-unfriendly. Passing in a reactor is better than nothing, but the real magic here happens in Request; it would be hard to interpose something to inspect the headers ultimately being sent or the request-body content or the response. Trying to imagine how I would test code using this, I find myself thinking I would stub out Agent completely. Let's have a separate ticket for a verified fake.
    1. That said, the test coverage for this branch itself looks pretty nice, except for the self.reactor.advance() stuff. I would recommend putting that into a method, self.completeConnection(), so that you can put the 'whitebox :/' thing into a docstring once rather than repeating the comment over and over.
  12. "redirects, caching, cookies, proxies, authentication" are all mentioned by the description of this ticket; it doesn't seem like any of them are actually handled here. There should be tickets for each of them.
    1. At first, I assumed that the redirect (3xx) support was in _newclient. Then I realized that the connectTCP was actually happening here, instead. I think that redirection is pretty basic and probably shouldn't be split into a separate ticket; it affects what you expect as a result, it's not an extension. For authentication, you'll need to say something different to the API, but for redirects, the API will hopefully be the same.

More general issues:

  1. It was a real pain in the ass to review this branch. It will be even more of a pain to do #3811, worse yet if one of these lands in the interim. I guess it points to a failure of svn (is there a way I could have done this review with bzr against our current repository?) but it also seems backwards to have started with the entire thing in _newclient when we still weren't sure what API it was being built to support. There's also the fact that once this is OK'd, it still has to be reviewed again if any changes are made to #886 which require this to change, possibly as a part of the larger #886. Personally I'd rather just review a big branch than contend with the branch-of-a-branch mess. I realize we may disagree here :) but there might be other ways to mitigate branch size, such as informal, partial reviews, or reviews by teams.
  2. There's a general dependency-injection problem that I feel is creeping up on us as we get better about abstraction and component separation. Agent(reactor) looks nice and non-global, but there are a lot of potential features it neglects. For example, on most platforms these days have an 'HTTP proxy' setting, and many environments require it to be set for HTTP requests to work. If we want to make Twisted honor that setting, where can we put it? There needs to be a global Agent somewhere that can do that; making Agent itself use a proxy would negate the benefit of having a class rather than a bunch of free-floating functions. I don't think that we're going to solve this problem in this branch, and there are other problems with it; it's entirely possible that your application would want to use different cookie storage, but the same HTTP proxy, so maybe this is the wrong layer to be worried about it anyway. Put more simply, library configuration is a powerful tool, especially for users who want to tweak an existing application without writing code for it; reducing the usage of stateful global objects, while otherwise good, is opposed to that particular goal. It would be nice to have a way to satisfy a dependency at test-time or in certain cases with a specific object, but have a global default that could be shared for more synergistic use-cases... either for sharing with the user or for allowing different applications to communicate as peers using fixed points of reference rather than one application having to be "on top" and create all the objects for the other. (For example: one tool does file-download requests; another tool does cookie-based logins for you by pulling cookies from firefox, so the downloading tool doesn't even need to know that you need to be logged in.)

comment:23 in reply to: ↑ 22 Changed 5 years ago by exarkun

Replying to glyph:

>   1. There are a bunch of issues with the docs, but please feel free to defer these (except the first point), or delegate them to someone else.  I'm really happy to see that there are howto docs at all for this.
>     1. The one thing that can't be deferred: these docs refer to private names.  Don't refer to private names in the extra-public, "here's how you're ''supposed'' to do it", not just how you're allowed to do it, documentation.

Tricky tricky Python. I left Response in _newclient and did not expose it in client on purpose. The public API involves using the public attributes of Response, but it does not involve instantiating one or subclassing it.

>     1. The listings should really be separate python files, so that I can run them individually without copy/pasting them from the doc itself.  This will hopefully facilitate testing in the future.

Done in r27329.

>     1. The prerequisites should be pulled out of the "overview" and put into their own section, before the overview.  Also, they should be hyperlinks.

I made a separate prerequisites section in r27331. I would like to add hyperlinks, but we cannot currently link between subprojects (#3438).

>     1. There's an error in the final example in the way it sends headers (see below for more detail).

Fixed in r27330.

>     1. Given that you're already binding the Deferred to a name, I think callbacks should be defined after the `Agent` call rather than before.  It's good to get to the business of actually doing whatever we're supposed to be doing as early as possible in the example, so that the reader won't get lost in a forest of mostly-irrelevant callbacks.

I converted the examples to the style I think you're describing in r27332. Generally I think I prefer this as well (some of the docs were written this way already, even), but sometimes I wonder if mixing code and function definitions throws some people off.

>     1. The documentation is extremely dry, formal, and abstract.  It seems to assume a deep understanding of HTTP, what it means to make a request, and how requests stream.  I know we've marked some prerequisites, but I think that ultimately people aren't going to build up a formal understanding of this stuff when they just want to scrape some RSS out of a bucket and dump it on a web page.  Generally speaking, I think it's a good idea for any tutorial documentation to have a sample problem.  "Let's say you want to read the front page of a web site and check its modification date."  Build on that and make a narrative.  "Now you want to do X, so we'll modify the simple example to do that".  Keep in mind that anyone reading this is going to want to solve a specific problem, and is going to be looking for something that looks like the solution to their problem.

Hmm. Indeed. I think I will defer this point (because maybe that means someone else will do it ;). Addressing this would involve a significant rewrite of the prose in the document, I think, and I'm not feeling motivated enough to accomplish that any time soon.

>   1. `Agent.request`'s docstring says it returns a `Deferred` that fires with a "`L{Response}` ... but `Response` isn't in that namespace.  In fact, `Response` isn't even public!  It needs to be imported from `_newclient` and put into `__all__`.  (It probably needs a note on it, too, saying that it's public so that future maintainers won't accidentally add public attributes to it.)

What do you think about this, given what I said about Response above?

>   1. I have a number of convenience issues with the signature of the API.  Some require changing argument order, but some could be deferred for a series of future make-it-more-convenient tickets.
>     1. Everything except the URI should be optional.  I feel that this is really, really important.  It should be no harder to use `Agent` than to use `getPage` for simple cases.
>       1. "method" should be optional, and default to 'GET'.  In fact, I'm inclined to say that in the high-level API, there should (at least eventually) be different APIs for known methods like GET, HEAD, and POST.  agent.request(some_imported_thing.GET, ...) is tedious, but agent.get(...) is even shorter than 'request' and says the same thing, and is just as hard to typo the method.  Not to mention the fact that GET shouldn't have a body, HEAD ''can't'' have a body, and POST's body needs to be formatted in a particular way.  Not to mention the different ways that they have to respond to redirects.
>       1. "headers" should be optional, defaulting to 'no additional headers'
>       1. bodyProducer should be optional, defaulting to `None`.
>     1. The `Headers` and `IEntityBodyProducer` arguments should be easier to pass in.  For a decidedly high-level API like this, I feel like some adaptation wouldn't be out of place.  However, this can all be deferred to a separate ticket, since `Headers` and `IEntityBodyProducer` will still be valid arguments.
>       1. Requiring the import of `Headers` just so I can pass a dictionary to `Agent.request`, even if I have no headers at all to pass, kinda sucks.  I mean, technically it seems like the right thing, but it's a whole bunch of extra typing for no benefit that I can see from the perspective of just using the API.
>         1. Not to mention the fact that I am rarely going to want to set ''all'' of the headers; the user-agent really ought to be set for me by the API, for example, and the content-length is set (sort of) by the API already.  My ideal header-manipulation API would mostly be state on the `Agent` object itself.  If I want to change what encodings I can accept, I want to tell the Agent before I make the request, and I want it to actually know about the encodings.  `Cookie:` headers practically ''need'' to be done this way.  And I'd really like a higher-level interface to sending an encoded form-post than manually encoding it and sending the header myself.  So we have to assume that the Agent is itself going to be manipulating the headers, or is going to have an existing bag of headers that it starts with.  The "set some headers just for this request" API is something I hope I will have to use very rarely.  Okay, I guess this point isn't really review feedback, but maybe it's helpful to see my thought process here.

This is probably illustrative of a shortcoming in our development process: plans for the future. Or perhaps I should just be better about filing tickets for future work that's already planned (or at least intended, planned might be giving myself too much credit). Also, perhaps I should have called this the "medium-level" API. Anyway. I'll try to lay out some of those plans here, and if they make sense, I'll file tickets for them.

The example usage I gave in the ticket description may be allowed, but I certainly agree with the ease-of-use requirements you outlined here for an actual "high-level" API. Since this is purely implementable in terms of the current Agent.request method, though, and since there are going to be several different classes like Agent, it seems like those high-level APIs belong elsewhere than on this class. They could be done as a mixin, but I feel like composition would be better. I'm not sure what that wrapper class would be called yet, but does Something(agentLike).get(uri) seem reasonable? Maybe we should fully spec out this super-high-level-no-seriously-use-this-one API before calling this ticket done (and I'm happy to implement at least part of it in this branch so that there is immediately a getPage-level replacement when this is merged. The ticket does say "high-level" after all :).

So with that in mind, I'd prefer to make at most these changes to the signature of Agent.request: headers and bodyProducer become optional.

>       1. Similarly, the process for writing an entity body producer is really tedious.  I want to be able to just pass a string, or a file-like object, and have something figure it out.

I filed #4017.

>         1. It would be nice to have a default entity-body thingy to inherit from, because the `def xxx(self): pass` declarations in the example documentation look tedious and distracting.

Hmm. I guess so. How about class StringProducer(Noop(IEntityBodyProducer)):? :)

>     1. It's very easy to make mistakes with the headers, given that the API forces you to pass in a dict of ''lists'' of strings, but doesn't do any error checking on that.  This is borne out by the final example in client.xhtml, which sends as its headers:
> {{{
> ['GET / HTTP/1.1\r\n',
>  'Connection: close\r\n',
>  'Host: example.com\r\n',
>  'User-Agent: T\r\n',
>  'User-Agent: w\r\n',
>  'User-Agent: i\r\n',
>  'User-Agent: s\r\n',
>  ...]
> }}}
>        (I think you can see where that is going.)  There's also no way to ''notice'' you've set a zillion headers.

Yea, that sucks. Headers should probably enforce something about the data given to it. Filed #4022.

>   1. Also on a 'convenience' note, it looks like a lot of work to retrieve the request body as a string (especially as compared to `getPage`).  There should be a ticket to make this very, very easy.  Similarly there should be built-in facilities to download the result to a file.

Yep. If we agree on the super high level API I mentioned above, I'll file a ticket for it.

>   1. There needs to be a ticket for HTTPS support.  That should follow very quickly; if we release 9.0 (or at this point, more likely "Twisted X") with HTTPS support in getPage and not in this, I suspect it will remain generally unused until the next release.

Filed #4023. I skipped it in this branch just because the proper certificate verification logic is complicated and worth an independent review.

>   1. In general, error-handling should be able to be accomplished through callbacks or errbacks, not both.  I think that the `ValueError` should be raised asynchronously.
>   1. The exception should really be `SchemeNotSupported`, not `ValueError`.  Built-in exceptions are a magnet for ambiguity.

Done in r27333.

>   1. Shouldn't passing a conflicting value for the `Host:` header be an error, rather than silently passed?  You're not really retrieving the same URI any more, if you set that.

I think it's a bit tricky to figure out when it's conflicting. You might have a URI with an IP address, for example. I don't feel too strongly either way, but a previous reviewer requested this.

>   1. All of the exception types in `_newclient` need to be made public, so that it can be more specific about what kinds of errors the `Deferred` from `request` might fire with.

I suppose. I left them private because they're all supposed to be treated the same, but that's probably wrong anyway. Otherwise they'd all be the same exception. Maybe I should just move the exceptions out of _newclient and into twisted.web.error, though?

>   1. There's no support for timeouts, but I guess that's #3811's job...?

Timeouts, yar. Cancellable Deferreds, right? #3811's job is only to make it possible to cancel a request at the protocol/transport level. It doesn't do anything for anyone not using the protocol directly. I'm not sure what to do here. I'd definitely prefer not to add three to seven timeout parameters to Agent.request.

>   1. This API seems pretty test-unfriendly.  Passing in a reactor is better than nothing, but the real magic here happens in `Request`; it would be hard to interpose something to inspect the headers ultimately being sent or the request-body content or the response.  Trying to imagine how I would test code using this, I find myself thinking I would stub out `Agent` completely.  Let's have a separate ticket for a verified fake.

Yea, a verified fake of Agent is probably how I'd recommend people unit test Agent-using code. Filed #4024.

>     1. That said, the test coverage for this branch ''itself'' looks pretty nice, except for the `self.reactor.advance()` stuff.  I would recommend putting that into a method, `self.completeConnection()`, so that you can put the 'whitebox :/' thing into a docstring once rather than repeating the comment over and over.

Done in r27334.

>   1. "redirects, caching, cookies, proxies, authentication" are all mentioned by the description of this ticket; it doesn't seem like any of them are actually handled here.  There should be tickets for each of them.

Okay, I'll file those before I resolve this.

>     1. At first, I assumed that the redirect (3xx) support was in `_newclient`.  Then I realized that the `connectTCP` was actually happening here, instead.  I think that redirection is pretty basic and probably ''shouldn't'' be split into a separate ticket; it affects what you expect as a result, it's not an extension.  For authentication, you'll need to say something different to the API, but for redirects, the API will hopefully be the same.

Hm. To some extent, redirect handling does affect things. But on the other hand, there's ever the possibility of a loop in a redirect such that it can't be resolved, or just too many redirects. These may not show up as the same kind of error, but they still need to be handled. So I was thinking it would be fine as a separate ticket, and even a separate class you can wrap around Agent, but which presents the same API. I was also expecting the auth version to also basically have the same API. Pass a keychain to its initializer. Use request as usual. Anything else that needs to happen is internal. Don't want to disrupt the application level code issuing requests with these concerns.

More general issues:

>   1. It was a real pain in the ass to review this branch.  It will be even more of a pain to do #3811, worse yet if one of these lands in the interim.  I guess it points to a failure of svn (is there a way I could have done this review with bzr against our current repository?) but it also seems backwards to have started with the entire thing in `_newclient` when we still weren't sure what API it was being built to support.  There's also the fact that once this is OK'd, it still has to be reviewed again if any changes are made to #886 which require this to change, possibly as a part of the larger #886.  Personally I'd rather just review a big branch than contend with the branch-of-a-branch mess.  I realize we may disagree here :) but there might be other ways to mitigate branch size, such as informal, partial reviews, or reviews by teams.

Yea. I tried to do this in bzr. It failed. If it's actually possible, I'm happy to use bzr, but I think I need someone to lay out the steps.

>   1. There's a general dependency-injection problem that I feel is creeping up on us as we get better about abstraction and component separation.  `Agent(reactor)` looks nice and non-global, but there are a lot of potential features it neglects.  For example, on most platforms these days have an 'HTTP proxy' setting, and many environments require it to be set for HTTP requests to work.  If we want to make Twisted honor that setting, where can we put it?  There needs to be a global Agent somewhere that can do that; making `Agent` itself use a proxy would negate the benefit of having a class rather than a bunch of free-floating functions.  I don't think that we're going to solve this problem in this branch, and there are other problems with it; it's entirely possible that your application would want to use different cookie storage, but the same HTTP proxy, so maybe this is the wrong layer to be worried about it anyway.  Put more simply, library configuration is a powerful tool, especially for users who want to tweak an existing application without writing code for it; reducing the usage of stateful global objects, while otherwise good, is opposed to that particular goal.  It would be nice to have a way to satisfy a dependency at test-time or in certain cases with a specific object, but have a global default that could be shared for more synergistic use-cases... either for sharing with the user or for allowing different applications to communicate as peers using fixed points of reference rather than one application having to be "on top" and create all the objects for the other.  (For example: one tool does file-download requests; another tool does cookie-based logins for you by pulling cookies from firefox, so the downloading tool doesn't even need to know that you need to be logged in.)

If Twisted is making HTTP requests, it should be accepting an agent as an argument and making those requests via that object. So then if you want to configure some weird http scenario, you do so and give Twisted the resulting agent. I don't think this will come up too often, we don't make http requests in many places (but certainly a few). Trickier is encouraging a style where people don't ever instantiate their own agent in their library code, but only somewhere near the top of their app and then pass it all the way through everything (everyone hates passing arguments to functions!). I'll think more about this, though.

comment:24 Changed 5 years ago by exarkun

  • Keywords review added
  • Owner changed from exarkun to glyph

Re-assigning for further consideration.

comment:25 follow-up: Changed 5 years ago by moshez

  • Keywords review removed
  • In test_connection_failed, please switch the order of .assertFailure and doing the clientConnectionFailed/completeConnection. This makes it easier to eyebally verify the test makes sense.
  • In the AgentTests, please group all "infrastructure" methods (non-test methods) together in the beginning. _verifyConnection is somewhere at the end, and all the others are in the beginning.
  • A method called _verifyConnection should not do stuff. It sounds like it just makes sure the connection is correct, where it actually does the successful connection thing.
  • Several places throughout the code -- the API *promises* not to add any headers except Host. Why is that? [I didn't see a good answer in the comments so far, but if there's a good reason, there's no need to change the code -- except perhaps to make the reason explicit.]
  • is Headers(dict(x.getAllRawHeaders()) an isomorphism? Even in the case of double headers?

comment:26 in reply to: ↑ 25 Changed 5 years ago by exarkun

  • Cc moshez added
  • Keywords review added
  • Owner glyph deleted

Replying to moshez:

  • In test_connection_failed, please switch the order of .assertFailure and doing the clientConnectionFailed/completeConnection. This makes it easier to eyebally verify the test makes sense.

There is no test_connection_failed. I think you mean AgentTests.test_connectionFailed in twisted/web/test/test_webclient.py (please be specific in your reviews! :) I made a change in r27403.

  • In the AgentTests, please group all "infrastructure" methods (non-test methods) together in the beginning. _verifyConnection is somewhere at the end, and all the others are in the beginning.

Done in r27405.

  • A method called _verifyConnection should not do stuff. It sounds like it just makes sure the connection is correct, where it actually does the successful connection thing.

Changed the method name to _verifyAndCompleteConnectionTo (from _verifyConnectionTo) in r27406.

  • Several places throughout the code -- the API *promises* not to add any headers except Host. Why is that? [I didn't see a good answer in the comments so far, but if there's a good reason, there's no need to change the code -- except perhaps to make the reason explicit.]

The word promise does not appear anywhere in the source that I can recall or find. Can you be more precise about your question?

  • is Headers(dict(x.getAllRawHeaders()) an isomorphism? Even in the case of double headers?

I guess so:

>>> h = Headers({'foo': ['bar', 'baz'], 'quux': ['foobar']})
>>> h == Headers(dict(h.getAllRawHeaders()))
True

Thank you for the review.

comment:27 Changed 5 years ago by radix

  • Keywords review removed
  • Owner set to exarkun

I hate the word "entity", can you please get rid of it? IBodyProducer seems much more reasonable.

+1

comment:28 Changed 5 years ago by exarkun

I responded the rest of glyph's earlier comments which radix thought were important:

  • In r27409 I added links to other Twisted documentation to the rereqs section of the client howto doc
  • In r27410 I made Response public so that the docs don't have to talk about private things, though I also added a note to its docstring saying that it is not intended to be instantiated or subclassed.
  • In r24711 I made the headers and bodyProducer arguments to Agent.request optional

Additionally, in r27412 I renamed IEntityBodyProducer. Even though this comment comes after radix's, I made the above changes in response to his review feedback beforehand.

comment:29 Changed 5 years ago by exarkun

  • Resolution set to fixed
  • Status changed from new to closed

(In [27414]) Merge high-level-web-client-3987

Author: exarkun
Reviewer: glyph, thijs, ivank, lvh, moshez, radix
Fixes: #3987

Add a mid-level HTTP 1.1 client API to twisted.web.client based on the new HTTP
1.1 protocol implementation. Also add an HTTP client howto documenting the usage
of this API.

comment:30 Changed 3 years ago by <automation>

  • Owner exarkun deleted
Note: See TracTickets for help on using tickets.