Opened 8 years ago

Closed 7 years ago

#6634 enhancement closed fixed (fixed)

Add a hook applications can use to determine the connection destination given the request URL

Reported by: habnabit Owned by: Itamar Turner-Trauring
Priority: normal Milestone:
Component: web Keywords:
Cc: jknight Branch: branches/agent-endpointFactory-6634-2
branch-diff, diff-cov, branch-cov, buildbot
Author: habnabit

Description

Currently, there's a _getEndpoint method, but it's a private implementation detail. This could be turned into a parameter to Agent.__init__ so that inheriting from Agent isn't necessary to specify an endpoint factory.

Change History (20)

comment:1 Changed 8 years ago by DefaultCC Plugin

Cc: jknight added

comment:2 Changed 8 years ago by Jean-Paul Calderone

"Callable of X, Y, Z" is an awkward, inconvenient, and non-extensible interface. Please avoid introducing new APIs that work this way.

Also please describe why this might be useful. After all, Agent already knows how to create an endpoint for both HTTP and HTTPS, both of the kinds of things.

comment:3 Changed 8 years ago by habnabit

Author: habnabit
Branch: branches/agent-endpointFactory-6634

(In [39263]) Branching to 'agent-endpointFactory-6634'

comment:4 Changed 8 years ago by habnabit

(In [39264]) Add endpointFactory to twisted.web.client.Agent.

Refs #6634.

comment:5 Changed 7 years ago by habnabit

Branch: branches/agent-endpointFactory-6634branches/agent-endpointFactory-6634-2

(In [42600]) Branching to 'agent-endpointFactory-6634-2'

comment:6 Changed 7 years ago by Jean-Paul Calderone

Also please describe why this might be useful. After all, Agent already knows how to create an endpoint for both HTTP and HTTPS, both of the kinds of things.

Additionally, ProxyAgent.__init__ accepts an endpoint.

comment:7 in reply to:  6 Changed 7 years ago by habnabit

Replying to exarkun:

Also please describe why this might be useful. After all, Agent already knows how to create an endpoint for both HTTP and HTTPS, both of the kinds of things.

Additionally, ProxyAgent.__init__ accepts an endpoint.

ProxyAgent only makes HTTP proxy requests. In the case of, say, connecting to a server over SOCKS5, ultimately you want to issue a regular request to the remote end, and not an HTTP proxy request.

Right now there's two distinct use cases that I can think of for this:

  1. Connecting to a server over a proxy mechanism like SOCKS5. Right now there is no way to do this at all with the public API for Agent.
  2. Connecting to a server over a different transport type, like UNIX domain sockets. Docker is one example of software that does HTTP requests over UNIX domain sockets, and it would be useful to be able to communicate with it without having to bind to a TCP port. Again, right now there is no way to do this at all with the public API for Agent.

comment:8 Changed 7 years ago by habnabit

(In [42604]) Reconstruct everything from scratch.

The old branch diverged way too much and wasn't very good in the first place. So, start over.

refs #6634

comment:9 Changed 7 years ago by habnabit

Keywords: review added

comment:10 Changed 7 years ago by Itamar Turner-Trauring

Keywords: review removed
Owner: set to Itamar Turner-Trauring

I see two problems with this design:

Having some arguments in Agent ignored if other options are passed in is a bit of a hack. I therefore propose we add a new class EndpointAgent which takes (in addition to reactor and pool) an endpointConstructor argument.

This could either be something that provides the currently proposed interface, or a simpler variant. I feel like the IAgentEndpointConstructor interface is heavily optimized for one use case (SOCKS) but not the other (Docker unix socket).

Specifically, a simpler version would have the endpoint constructor be a two argument callable that takes a URL and reactor and returns an endpoint. The Unix socket constructor would just be lambda url: UnixEndpoint(mypath, reactor). The SOCKS one would need to be constructed with a relevant TLS argument and do URL parsing ... but I bet we could provide a utility that does that if there's more than one endpoint constructor that needs to implement that particular code.

comment:11 Changed 7 years ago by Itamar Turner-Trauring

Owner: changed from Itamar Turner-Trauring to habnabit

After noting exarkun's comment above:

  1. Move same functionality into new class, EndpointAgent.
  1. Optionally, simpler example of using a Unix socket (to talk to Docker? that'd be a realistic "you can run it at home" example).
  1. Classes implementing an interface should use @implementer(IFoo)... and don't need to copy the whole docstring since it's in the interface.

Resubmit for review and I'll try to do a final pass.

comment:12 Changed 7 years ago by Jean-Paul Calderone

Summary: There is no public way to specify the endpoint used for twisted.web.client.Agent's connectionAdd a hook applications can use to determine the connection destination given the request URL

comment:13 Changed 7 years ago by Itamar Turner-Trauring

Keywords: review added
Owner: changed from habnabit to Jean-Paul Calderone

Started new buildbot run (but it's still running so may be red at end). Otherwise ready for review.

comment:14 Changed 7 years ago by Jean-Paul Calderone

Status: newassigned

comment:15 Changed 7 years ago by Jean-Paul Calderone

Keywords: review removed
Owner: changed from Jean-Paul Calderone to Itamar Turner-Trauring
Status: assignednew

Thanks.

  1. docs/web/howto/listings/client/endpointconstructor.py
    1. Needs a copyright header
    2. This particular example seems like it would be better satisfied by ProxyAgent. Or do I still not understand what ProxyAgent does/is for (wish it had some docs)? Maybe a good thing to include in this branch would be some docs differentiating these two things for dense people like me.
    3. Richard built us some tools for testing examples. I don't think we decided yet that this means we can finally include examples in the "everything must have tests" policy - but it seems like maybe we should? Not a blocker but if you don't do that here at least file a ticket for doing it.
  2. docs/web/howto/client.rst
    1. APIs in docs should be :api:... links - eg Agent and 1. An arbitrary endpoint is how you implement some useful functionality. The useful functionality is controlling what an agent connects to though. I suggest tweaking the heading and language a bit to be more inviting to people who know what they want to do but not yet how they want to do it?

Agent.forEndpointConstructor in the new text.

  1. The news fragment could probably also be tweaked in a similar way.
  2. twisted/web/iweb.py
    1. "Constructor" seems like an unnecessary variation from "Factory". I know everyone hates "Factory" but I don't think the difference between "Constructor" and "Factory" conveys any interesting nuance of meaning here, the new word is just a potential source of confusion.
    2. Similar comment for "constructEndpoint" - we used "build" elsewhere already (buildProtocol) so maybe buildEndpoint is better - or some entirely different style of name, like endpointFor... oops not sure what noun to put at the end there because what is a "scheme, hostname, port" triple? Not a URI... This is more nit-picky and people will probably figure out the current name, so this is optional.
    3. I guess the ideal signature here, though, would be to accept a request object. :( I'm not sure what interesting stuff you could do with the whole request here but I suspect there's something - client-side session binding? Interesting routing choices based on the path of the URI? "/foo/bar" always goes via my tor proxy, "/foo/baz" just gets a regular https connection... sort of thing. But we have no request object, sad. Maybe endpointForURI and pass the unparsed URI (or the urlparse.urlparse parsed form, crummy as it is). Later we can introduce more endpointForSomethingElse methods that get more information and take precedent if they are defined. Or, I dunno, maybe a Request structure and pass it all in. The current signature, scheme, hostname, port seems like it should definitely change somehow, at least.
  3. twisted/web/client.py
    1. _StandardEndpointConstructor.constructEndpoint should have a bit of a docstring, something like "Get an endpoint which connects directly to the host given by the URI." Somewhat implied by "Standard" but being explicit here doesn't hurt (I think! does it? if adding any kind of docstring at all introduces pydoctor or twistedchecker errors then maybe it isn't worth it)
    2. Maybe this is a good opportunity to introduce explicit byte string literal syntax in constructEndpoint. Optional.
    3. Agent.__init__ still accepts (and now ignores) an endpointConstructor parameter. Looks like this should just be deleted.
    4. Regarding the implementation of Agent.forEndpointConstructor - bleugh. I wish there were a better way to do this. This comment is not actionable.
    5. Nit... does "for" make the most sense here? What about with or using? Optional.
    6. For what it's worth, I really like the amount of code remaining in Agent after this change.
  4. twisted/web/test/test_agent.py
    1. FakeEndpointConstructor looks like a cross between a spy and a stub. It also has a bunch of public attributes that aren't part of the interface it implements.
    2. test_endpointConstructor uses native string literals in a couple places where it should use byte string literals.
    3. The docstring for test_endpointConstructor is a little incomplete. The test seems primarily to be about the constructor being used to create endpoints.
    4. test_endpointConstructorDefaultPool would be easier if HTTPConnectionPool implemented comparison for equality.
    5. test_endpointConstructorPool uses assertIdentical, should use assertIs.
    6. It seems that some (existing) tests (that are now) for _StandardEndpointConstructor could be factored into a new test case and could test that class directly. Optional, file a ticket and link to it from someplace in the code if you don't want to work on this now.
  5. Buildbot is mostly green. There are some twistedchecker errors but they might be bogus.
  6. I notice your employer isn't in the LICENSE file yet. If you think it is warranted please add a line there.

I'd like to have another look once the issues with the example are addressed (one way or another). Thanks again.

comment:16 Changed 7 years ago by Itamar Turner-Trauring

ProxyAgent changes the way requests are done, it does "GET http://yourhost.com/foo.html HTTP/1.1" instead of "GET /foo.html HTTP/1.1" if you go to URL "http://yourhost.com/foo.html".

comment:17 Changed 7 years ago by Itamar Turner-Trauring

Perhaps we should make _URI object public, and then it could be passed to the endpoint factory...

comment:18 Changed 7 years ago by Itamar Turner-Trauring

Keywords: review added
Owner: changed from Itamar Turner-Trauring to Jean-Paul Calderone

Implemented all required comments, and some of the optional ones. Filed a couple of follow up tickets as well (#7385, #7386).

comment:19 Changed 7 years ago by Jean-Paul Calderone

Keywords: review removed
Owner: changed from Jean-Paul Calderone to Itamar Turner-Trauring
  1. docs/web/howto/client.rst: that takes a endpointFactory - a -> an
  2. docs/web/howto/listings/client/endpointconstructor.py
    1. The UNIX socket path should use byte literal syntax
    2. A while ago I remember there was some effort to make sure examples all had usage documentation at the top. This isn't an example I guess (it's not even in the examples directory!) but maybe a quick blurb about the argument this expects would be a good thing. See, for example, docs/web/examples/getpage.py for the kind of thing I mean.

That's it. This looks pretty nice now, thanks for your work itamar and habnabit. Please address those minor points, get a greenish buildbot run, and then merge.

comment:20 Changed 7 years ago by itamarst

Resolution: fixed
Status: newclosed

(In [42679]) Merge agent-endpointFactory-6634-2: Allow creating Agent that connect with arbitrary endpoints.

Author: habnabit, itamar Review: itamar, exarkun Fixes: #6634

Agent.usingEndpointFactory allows making HTTP connections with arbitrary endpoints, e.g. with a SOCKS proxy or a Unix socket.

Note: See TracTickets for help on using tickets.