Opened 11 years ago

Closed 6 years ago

#1774 defect closed fixed (fixed)

Support HTTP proxies with the twisted.web.client.Agent API

Reported by: Calvin Spealman Owned by: therve
Priority: normal Milestone:
Component: web Keywords: httpclient
Cc: David Reid, Thijs Triemstra, Jean-Paul Calderone Branch: branches/agent-proxy-1774-2
branch-diff, diff-cov, branch-cov, buildbot
Author: therve

Description

I had need of connecting through an HTTP proxy, and thus need of running my own proxy for testing purposes. I found problems on both ends. These were mostly due to the use of the path in the GET command, without any expectation of it being a full URL, including the hostname. I've included a patch to fix the issue for GET commands, but there are likely problems with other HTTP commands, so probably the changes here should be factored out to fix all of them.

Attachments (3)

twisted-web-proxy.patch (2.6 KB) - added by Calvin Spealman 11 years ago.
fixes GET command for http protocol server and clients
proxy_take2.patch (1.8 KB) - added by ghazel 10 years ago.
client proxy patch
endpoints-doc.patch (1.3 KB) - added by Jean-Paul Calderone 6 years ago.

Download all attachments as: .zip

Change History (31)

Changed 11 years ago by Calvin Spealman

Attachment: twisted-web-proxy.patch added

fixes GET command for http protocol server and clients

comment:1 Changed 11 years ago by Calvin Spealman

Owner: changed from jknight to David Reid

comment:2 Changed 11 years ago by David Reid

Ok, a couple of things.

The first patch strikes me as unecessary, or atleast not implemented in the right way or in the right place. The protocol doesn't care if it's connected to a proxy or not, it shouldn't be trying to figure that out, it should just send the request. I'd personally add arguments for specifying a proxy to the HTTPClientFactory and modify the behavior of HTTPClientFactory.setURL to make self.path absoluteURI if we're connecting to a proxy.

So then you could just do

d = getPage('http://google.com/', proxy='host:port') #notice how getPage passes kwargs onto HTTPClientFactory?
d.addCallback(_gotPage)

For the second part where the server doesn't deal with absolute URLS Backport the code from twisted.web2.server.Request._parseURL lines 187-202

The 3rd patch violates the HTTP/1.0 spec RFC1945 Section 5.1.2

   The absoluteURI form is only allowed when the request is being made
   to a proxy. The proxy is requested to forward the request and return
   the response. If the request is GET or HEAD and a prior response is
   cached, the proxy may use the cached message if it passes any
   restrictions in the Expires header field. Note that the proxy may
   forward the request on to another proxy or directly to the server
   specified by the absoluteURI. In order to avoid request loops, a
   proxy must be able to recognize all of its server names, including
   any aliases, local variations, and the numeric IP address. An example
   Request-Line would be:

       GET http://www.w3.org/pub/WWW/TheProject.html HTTP/1.0

So absoluteURI should only ever be sent to a proxy. And the proxy should deconstruct the URI connect to the server, and send an absolute path. Which I believe is what happens now.

comment:3 Changed 11 years ago by David Reid

Also, unittests.

comment:4 Changed 10 years ago by Calvin Spealman

Owner: changed from David Reid to Calvin Spealman

comment:5 Changed 10 years ago by Jean-Paul Calderone

Milestone: Twisted-2.5

comment:6 Changed 10 years ago by ghazel

Cc: David Reid added

Passing a proxy parameter straight through to HTTPClientFactory skips the important step of connecting to the proxy instead of the endpoint. I bet ironfroggy has a modified/replaced getPage we don't see here. At that point, proxy becomes just a flag to HTTPClientFactory indicating it should use absoluteURI, and we don't need to inspect the transport (ow!).

I'm attaching a patch of my implementation, which only addresses the client issue.

Changed 10 years ago by ghazel

Attachment: proxy_take2.patch added

client proxy patch

comment:7 Changed 9 years ago by Jean-Paul Calderone

Keywords: review added; proxy removed
Owner: Calvin Spealman deleted
Priority: lowhighest

comment:8 Changed 9 years ago by David Reid

Keywords: review removed
Owner: set to ghazel

No unit tests.

comment:9 Changed 9 years ago by ghazel

Since there are no unit tests for the original behaviour, I have nothing to mimic.

How should a unit test for this functionality work? Should I install a third-party proxy server or client to use for the test?

Simply asserting the data contains the expected values or that it communicates with itself might have passed earlier, even though it did not correctly implement the protocol.

comment:10 Changed 8 years ago by Jean-Paul Calderone

Keywords: httpclient added

comment:11 in reply to:  9 Changed 7 years ago by Thijs Triemstra

Cc: Thijs Triemstra added
Keywords: review added

Replying to ghazel:

Since there are no unit tests for the original behaviour, I have nothing to mimic.

How should a unit test for this functionality work? Should I install a third-party proxy server or client to use for the test?

Simply asserting the data contains the expected values or that it communicates with itself might have passed earlier, even though it did not correctly implement the protocol.

Putting this in the review queue since nobody reviewed the patch or responded to ghazel's comments, which are required to move this ticket forward imo.

comment:12 Changed 7 years ago by Jean-Paul Calderone

Cc: Jean-Paul Calderone added
Keywords: review removed

A test for proxy support in Twisted Web should not rely on a third party HTTP proxy.

To test client proxy support, either set up a Twisted Web-based HTTP proxy server alongside the regular HTTP server already running in test_webclient and point the client tests at that, or, preferably, parameterize the reactor and catch the connection attempts and mock out the rest, rather than trying to talk to a real HTTP proxy.

The same ideas should apply to testing server proxy support.

Plus, the code should be factored so that much of it is unit testable at a fine resolution, rather than always exercising the entire HTTP client or server stack just to get at the proxy bits.

Just for what it's worth, implementing proxy support for the new HTTP client will probably be simpler and cleaner. I'm not opposed to having this feature in the old HTTP client, I just want to point out the better API in case anyone's not aware of it. See #886 and #3987.

comment:13 Changed 6 years ago by Jean-Paul Calderone

#4704 was a duplicate.

comment:14 Changed 6 years ago by therve

Author: therve
Branch: branches/agent-proxy-1774

(In [31681]) Branching to 'agent-proxy-1774'

comment:15 Changed 6 years ago by therve

Keywords: review added
Owner: ghazel deleted

The attached branch had support for HTTP proxies in the new Agent. It's fairly trivial change I suppose. One question I had is whether to pass the proxy argument to the Agent constructor or to Agent.request. It seemed it matched the SSL factory as global setting, so I passed it to the constructor.

comment:16 Changed 6 years ago by therve

Keywords: review removed
Owner: set to therve
Priority: highestnormal

OK: ProxyAgent.

comment:17 Changed 6 years ago by therve

Keywords: review added
Owner: therve deleted

This is ready for review. I added a different ProxyAgent class, and used endpoints for the proxy parameter. Using composition didn't work because of the things modified (like the Host header). Testing is a bit light, but I wasn't heavily inspired.

comment:18 Changed 6 years ago by Jean-Paul Calderone

Summary: HTTP Proxies inproperly handled both for client and server code in Twisted.WebHTTP Proxies improperly handled both for client and server code in Twisted.Web

comment:19 Changed 6 years ago by Jean-Paul Calderone

Owner: set to Jean-Paul Calderone
Status: newassigned

comment:20 Changed 6 years ago by Jean-Paul Calderone

Author: therveexarkun, therve
Branch: branches/agent-proxy-1774branches/agent-proxy-1774-2

(In [32035]) Branching to 'agent-proxy-1774-2'

comment:21 Changed 6 years ago by Jean-Paul Calderone

Author: exarkun, thervetherve
Keywords: review removed
Owner: changed from Jean-Paul Calderone to therve
Status: assignednew

Thanks for working on this. I merged the branch forward and resolved some simple conflicts with the other recent Agent-related work.

  1. ProxyAgentTests is missing some docstrings
  2. We probably don't need a third copy of that MemoryReactor/Clock class in test_webclient.py. How about deleting one of the existing definitions instead and making all of the tests use the same one?
  3. There's no test coverage for the definition of ProxyAgent._factory - ie, all the tests pass if I set it to int instead of _HTTP11ClientFactory.
  4. Since the coding standard has been updated, please use assertEqual instead of assertEquals ;)
  5. In the implementation, the request_path argument to _doRequest should be requestPath.
  6. _doRequest could use a better name as well... Maybe _connectAndRequest?
  7. Instead of duplicating the contextFactory default in Agent.__init__, how about taking and passing on **kw?

Thanks again!

comment:22 Changed 6 years ago by therve

Keywords: review added
Owner: therve deleted

Thanks for the merge forward and the review! Everything handled, I think.

comment:23 Changed 6 years ago by therve

I've handled a meta-review from IRC, refactoring the init methods, and adding @since tags

comment:24 Changed 6 years ago by Jean-Paul Calderone

Keywords: review removed
Owner: set to therve

Code looks good, build results look good. I would like to see some docs added to doc/web/howto/client.xhtml for this though.

comment:25 Changed 6 years ago by Jean-Paul Calderone

Summary: HTTP Proxies improperly handled both for client and server code in Twisted.WebSupport HTTP proxies with the twisted.web.client.Agent API

comment:26 Changed 6 years ago by therve

Keywords: review added
Owner: therve deleted

I write some doc, please give me feedback!

comment:27 Changed 6 years ago by Jean-Paul Calderone

Keywords: review removed
Owner: set to therve

Thanks! Looks great. Attached is a patch to the doc with some minor fixes, apply if you like, then merge.

Changed 6 years ago by Jean-Paul Calderone

Attachment: endpoints-doc.patch added

comment:28 Changed 6 years ago by therve

Resolution: fixed
Status: newclosed

(In [32149]) Merge agent-proxy-1774-2

Author: therve Reviewer: exarkun Fixes: #1774

Add proxy support to the Agent web client via the new ProxyAgent class.

Note: See TracTickets for help on using tickets.