Opened 8 years ago

Closed 3 years ago

#1774 defect closed fixed (fixed)

Support HTTP proxies with the twisted.web.client.Agent API

Reported by: ironfroggy Owned by: therve
Priority: normal Milestone:
Component: web Keywords: httpclient
Cc: dreid, thijs, exarkun Branch: branches/agent-proxy-1774-2
(diff, github, buildbot, log)
Author: therve Launchpad Bug:

Description

I had need of connecting through an HTTP proxy, and thus need of running my own proxy for testing purposes. I found problems on both ends. These were mostly due to the use of the path in the GET command, without any expectation of it being a full URL, including the hostname. I've included a patch to fix the issue for GET commands, but there are likely problems with other HTTP commands, so probably the changes here should be factored out to fix all of them.

Attachments (3)

twisted-web-proxy.patch (2.6 KB) - added by ironfroggy 8 years ago.
fixes GET command for http protocol server and clients
proxy_take2.patch (1.8 KB) - added by ghazel 8 years ago.
client proxy patch
endpoints-doc.patch (1.3 KB) - added by exarkun 3 years ago.

Download all attachments as: .zip

Change History (31)

Changed 8 years ago by ironfroggy

fixes GET command for http protocol server and clients

comment:1 Changed 8 years ago by ironfroggy

  • Owner changed from jknight to dreid

comment:2 Changed 8 years ago by dreid

Ok, a couple of things.

The first patch strikes me as unecessary, or atleast not implemented in the right way or in the right place. The protocol doesn't care if it's connected to a proxy or not, it shouldn't be trying to figure that out, it should just send the request. I'd personally add arguments for specifying a proxy to the HTTPClientFactory and modify the behavior of HTTPClientFactory.setURL to make self.path absoluteURI if we're connecting to a proxy.

So then you could just do

d = getPage('http://google.com/', proxy='host:port') #notice how getPage passes kwargs onto HTTPClientFactory?
d.addCallback(_gotPage)

For the second part where the server doesn't deal with absolute URLS
Backport the code from twisted.web2.server.Request._parseURL lines 187-202

The 3rd patch violates the HTTP/1.0 spec RFC1945 Section 5.1.2

   The absoluteURI form is only allowed when the request is being made
   to a proxy. The proxy is requested to forward the request and return
   the response. If the request is GET or HEAD and a prior response is
   cached, the proxy may use the cached message if it passes any
   restrictions in the Expires header field. Note that the proxy may
   forward the request on to another proxy or directly to the server
   specified by the absoluteURI. In order to avoid request loops, a
   proxy must be able to recognize all of its server names, including
   any aliases, local variations, and the numeric IP address. An example
   Request-Line would be:

       GET http://www.w3.org/pub/WWW/TheProject.html HTTP/1.0

So absoluteURI should only ever be sent to a proxy. And the proxy should deconstruct the URI connect to the server, and send an absolute path. Which I believe is what happens now.

comment:3 Changed 8 years ago by dreid

Also, unittests.

comment:4 Changed 8 years ago by ironfroggy

  • Owner changed from dreid to ironfroggy

comment:5 Changed 8 years ago by exarkun

  • Milestone Twisted-2.5 deleted

comment:6 Changed 8 years ago by ghazel

  • Cc dreid added

Passing a proxy parameter straight through to HTTPClientFactory skips the important step of connecting to the proxy instead of the endpoint. I bet ironfroggy has a modified/replaced getPage we don't see here.
At that point, proxy becomes just a flag to HTTPClientFactory indicating it should use absoluteURI, and we don't need to inspect the transport (ow!).

I'm attaching a patch of my implementation, which only addresses the client issue.

Changed 8 years ago by ghazel

client proxy patch

comment:7 Changed 7 years ago by exarkun

  • Keywords review added; proxy removed
  • Owner ironfroggy deleted
  • Priority changed from low to highest

comment:8 Changed 7 years ago by dreid

  • Keywords review removed
  • Owner set to ghazel

No unit tests.

comment:9 follow-up: Changed 7 years ago by ghazel

Since there are no unit tests for the original behaviour, I have nothing to mimic.

How should a unit test for this functionality work? Should I install a third-party proxy server or client to use for the test?

Simply asserting the data contains the expected values or that it communicates with itself might have passed earlier, even though it did not correctly implement the protocol.

comment:10 Changed 6 years ago by exarkun

  • Keywords httpclient added

comment:11 in reply to: ↑ 9 Changed 5 years ago by thijs

  • Cc thijs added
  • Keywords review added

Replying to ghazel:

Since there are no unit tests for the original behaviour, I have nothing to mimic.

How should a unit test for this functionality work? Should I install a third-party proxy server or client to use for the test?

Simply asserting the data contains the expected values or that it communicates with itself might have passed earlier, even though it did not correctly implement the protocol.

Putting this in the review queue since nobody reviewed the patch or responded to ghazel's comments, which are required to move this ticket forward imo.

comment:12 Changed 5 years ago by exarkun

  • Cc exarkun added
  • Keywords review removed

A test for proxy support in Twisted Web should not rely on a third party HTTP proxy.

To test client proxy support, either set up a Twisted Web-based HTTP proxy server alongside the regular HTTP server already running in test_webclient and point the client tests at that, or, preferably, parameterize the reactor and catch the connection attempts and mock out the rest, rather than trying to talk to a real HTTP proxy.

The same ideas should apply to testing server proxy support.

Plus, the code should be factored so that much of it is unit testable at a fine resolution, rather than always exercising the entire HTTP client or server stack just to get at the proxy bits.

Just for what it's worth, implementing proxy support for the new HTTP client will probably be simpler and cleaner. I'm not opposed to having this feature in the old HTTP client, I just want to point out the better API in case anyone's not aware of it. See #886 and #3987.

comment:13 Changed 4 years ago by exarkun

#4704 was a duplicate.

comment:14 Changed 3 years ago by therve

  • Author set to therve
  • Branch set to branches/agent-proxy-1774

(In [31681]) Branching to 'agent-proxy-1774'

comment:15 Changed 3 years ago by therve

  • Keywords review added
  • Owner ghazel deleted

The attached branch had support for HTTP proxies in the new Agent. It's fairly trivial change I suppose. One question I had is whether to pass the proxy argument to the Agent constructor or to Agent.request. It seemed it matched the SSL factory as global setting, so I passed it to the constructor.

comment:16 Changed 3 years ago by therve

  • Keywords review removed
  • Owner set to therve
  • Priority changed from highest to normal

OK: ProxyAgent.

comment:17 Changed 3 years ago by therve

  • Keywords review added
  • Owner therve deleted

This is ready for review. I added a different ProxyAgent class, and used endpoints for the proxy parameter. Using composition didn't work because of the things modified (like the Host header). Testing is a bit light, but I wasn't heavily inspired.

comment:18 Changed 3 years ago by exarkun

  • Summary changed from HTTP Proxies inproperly handled both for client and server code in Twisted.Web to HTTP Proxies improperly handled both for client and server code in Twisted.Web

comment:19 Changed 3 years ago by exarkun

  • Owner set to exarkun
  • Status changed from new to assigned

comment:20 Changed 3 years ago by exarkun

  • Author changed from therve to exarkun, therve
  • Branch changed from branches/agent-proxy-1774 to branches/agent-proxy-1774-2

(In [32035]) Branching to 'agent-proxy-1774-2'

comment:21 Changed 3 years ago by exarkun

  • Author changed from exarkun, therve to therve
  • Keywords review removed
  • Owner changed from exarkun to therve
  • Status changed from assigned to new

Thanks for working on this. I merged the branch forward and resolved some simple conflicts with the other recent Agent-related work.

  1. ProxyAgentTests is missing some docstrings
  2. We probably don't need a third copy of that MemoryReactor/Clock class in test_webclient.py. How about deleting one of the existing definitions instead and making all of the tests use the same one?
  3. There's no test coverage for the definition of ProxyAgent._factory - ie, all the tests pass if I set it to int instead of _HTTP11ClientFactory.
  4. Since the coding standard has been updated, please use assertEqual instead of assertEquals ;)
  5. In the implementation, the request_path argument to _doRequest should be requestPath.
  6. _doRequest could use a better name as well... Maybe _connectAndRequest?
  7. Instead of duplicating the contextFactory default in Agent.__init__, how about taking and passing on **kw?

Thanks again!

comment:22 Changed 3 years ago by therve

  • Keywords review added
  • Owner therve deleted

Thanks for the merge forward and the review! Everything handled, I think.

comment:23 Changed 3 years ago by therve

I've handled a meta-review from IRC, refactoring the init methods, and adding @since tags

comment:24 Changed 3 years ago by exarkun

  • Keywords review removed
  • Owner set to therve

Code looks good, build results look good. I would like to see some docs added to doc/web/howto/client.xhtml for this though.

comment:25 Changed 3 years ago by exarkun

  • Summary changed from HTTP Proxies improperly handled both for client and server code in Twisted.Web to Support HTTP proxies with the twisted.web.client.Agent API

comment:26 Changed 3 years ago by therve

  • Keywords review added
  • Owner therve deleted

I write some doc, please give me feedback!

comment:27 Changed 3 years ago by exarkun

  • Keywords review removed
  • Owner set to therve

Thanks! Looks great. Attached is a patch to the doc with some minor fixes, apply if you like, then merge.

Changed 3 years ago by exarkun

comment:28 Changed 3 years ago by therve

  • Resolution set to fixed
  • Status changed from new to closed

(In [32149]) Merge agent-proxy-1774-2

Author: therve
Reviewer: exarkun
Fixes: #1774

Add proxy support to the Agent web client via the new ProxyAgent class.

Note: See TracTickets for help on using tickets.