Opened 6 years ago

Closed 4 years ago

#5435 enhancement closed fixed (fixed)

twisted.web.client.RedirectAgent doesn't inform caller of final URI

Reported by: Tom Most Owned by: Jonathan Jacobs
Priority: normal Milestone:
Component: web Keywords: agent
Cc: jknight, Tom Most, therve, Jonathan Jacobs, Julian Berman Branch: branches/redirectagent-final-uri-5435-2
branch-diff, diff-cov, branch-cov, buildbot
Author: jonathanj

Description (last modified by Jean-Paul Calderone)

If a redirection is encountered, RedirectAgent transparently follows it, but the ultimate URI of the response is not exposed. Not knowing the URI makes resolving relative URIs in the response impossible.

The obvious solution is to add the URI to the Response.

Change History (25)

comment:1 Changed 6 years ago by DefaultCC Plugin

Cc: jknight added

comment:2 Changed 6 years ago by Jean-Paul Calderone

Description: modified (diff)

comment:3 Changed 6 years ago by Tom Most

Cc: Tom Most added

comment:4 Changed 6 years ago by Jean-Paul Calderone

Type: defectenhancement

comment:5 Changed 5 years ago by Jean-Paul Calderone

#6193 was a duplicate of this.

comment:6 Changed 5 years ago by therve

Cc: therve added

There was some discussion on IRC about this, I'll try to summarize:

  • We need to change the response, so that probably means changing IResponse. It may be considered a backward incompatible change, but if people are using things like proxyForInterface, it's transparent.
  • Although, we shouldn't do that it very often, so we need to think of possible other use cases. One thing that was mentioned is including the whole Request object, so that you get other information set by the framework, like headers. This is interesting for other agents that change the request made by the user.
  • The other bit mentioned is that we lose information about intermediate requests/response as well. This is a little bit more tricky to do in a generic fashion, but we could include a link to the previous response. I believe this could be useful for the caching agent, where you would get the cached response, but could get access to the 304 response (to be able to tell that the response is from the cache).

Other thoughts?

comment:7 Changed 5 years ago by Tom Most

Tracking redirection history is also quite useful for external testing of web services, and in this use case it is also important to know the status code so that you can tell if the correct type of redirect was issued. I currently use the requests library for this task.

comment:8 Changed 5 years ago by Jonathan Jacobs

Author: jonathanj
Branch: branches/redirectagent-final-uri-5435

(In [37504]) Branching to 'redirectagent-final-uri-5435'

comment:9 in reply to:  6 Changed 5 years ago by Jonathan Jacobs

Cc: Jonathan Jacobs added

Replying to therve:

There was some discussion on IRC about this, I'll try to summarize:

I'm pretty happy with these points, they would cover all the scenarios discussed on IRC.

One issue I ran into when I started working on this is the URI itself. Request.uri is a relative URI and as far as I can tell it is not possible to reliably reconstruct the absolute URI from just a Request. For example, the URL scheme is lost.

It would make more sense to store the absolute URI on Request, since you can always work backwards from this. Going down this route probably involves changing t.w.c._parse and t.w.c._URL to not throw away important parts of the URI, but those are private APIs so presumably that's not a serious issue.

comment:10 Changed 5 years ago by Julian Berman

Cc: Julian Berman added

comment:11 Changed 5 years ago by Jonathan Jacobs

(In [37544]) Rename and improve t.w.c._URL and store an instance, for retrieving the absolute URI, on Request instances. refs #5435

comment:12 Changed 5 years ago by Jonathan Jacobs

(In [37545]) Rename and improve t.w.c._URL and store an instance, for retrieving the absolute URI, on Request instances. refs #5435

comment:13 Changed 5 years ago by Jonathan Jacobs

Keywords: agent review added

t.w.c.Response now contains a reference to the t.w.c.Request that resulted in the response, and a reference to the previous response in the case of RedirectAgent. This involved reworking (and renaming) t.w.c._URL into something that doesn't inherit from tuple and provides enough information to construct an absolute URI. A convenience property Response.absoluteURI is also provided.

comment:14 Changed 5 years ago by Jonathan Jacobs

(In [37568]) Add some documentation for the response history with RedirectAgent, refs #5435.

comment:15 Changed 4 years ago by Jonathan Jacobs

Owner: set to Tom Prince

comment:16 Changed 4 years ago by Tom Prince

Keywords: review removed
Owner: changed from Tom Prince to Jonathan Jacobs
  1. _URI is effectively public, since instances are available as a public attribute. Thus, we should try to get the exposed interface right.
    • #2093 is probably relevant.
    • Although, perhaps .absoluteURI is enough to expose?
    • Is this something that will eventually replace URLPath? If so, you should file a ticket to put it somewhere appropriate and make it public. If not, why this instead of URLPath.
  2. decorators are now allowed, although not required (for the various property things.
  3. The topfile should make it clear that it is changing interfaces incompatibly.
  4. You should be able to use successResultOf, rather than adding callbacks with asserts and returning deferreds.
  5. Should .request be a IRespone instead of Request?
  6. test_responseIncludesRequest seem
  7. Is the convenience api meant to be part of the interface? If not then using it means you need to know more than the documented interface of Agent.request (namely that it returns a Response not an IResponse) and it will break with things like GzipDecoder.
  8. test_externalUnicodeInterference seems bogus. strings are immutable, so calling functions on them should do anything. If it has something to do with caching in urlparse, you should mention that.

comment:17 in reply to:  16 Changed 4 years ago by Jonathan Jacobs

Keywords: review added
Owner: changed from Jonathan Jacobs to Tom Prince

Thank you for your review, Tom!

Replying to tom.prince:

  1. _URI is effectively public, since instances are available as a public attribute. Thus, we should try to get the exposed interface right.
    • #2093 is probably relevant.
    • Although, perhaps .absoluteURI is enough to expose?
    • Is this something that will eventually replace URLPath? If so, you should file a ticket to put it somewhere appropriate and make it public. If not, why this instead of URLPath.

I opted for making Request.absoluteURI the only public API.

  1. The topfile should make it clear that it is changing interfaces incompatibly.

I've changed it to talk about IResponse instead of Response.

  1. You should be able to use successResultOf, rather than adding callbacks with asserts and returning deferreds.

Yes! I love successResultOf!

  1. Should .request be a IRespone instead of Request?

As discussed on IRC, iweb.IRequest is actually the thing that is used with web Resources, which I'll admit is confusing.

  1. test_responseIncludesRequest seem

I've split the absolute URI part of this test into its own test.

  1. Is the convenience api meant to be part of the interface? If not then using it means you need to know more than the documented interface of Agent.request (namely that it returns a Response not an IResponse) and it will break with things like GzipDecoder.

Whoops! Adding absoluteURI to the interface totally slipped my mind.

  1. test_externalUnicodeInterference seems bogus. strings are immutable, so calling functions on them should do anything. If it has something to do with caching in urlparse, you should mention that.

I've removed this since it is a duplicate of a test by the same name for ParseUrlTestCase.

comment:18 Changed 4 years ago by Tom Prince

Keywords: review removed
Owner: changed from Tom Prince to Jonathan Jacobs
  1. Is _parse necesary now? One of the uses appears superflous: the result of one of the two uses seems unused, and the other can easily be replaced.
    • Regarding test_externalUnicodeInterference, #2628 where it was introduced seems to be indicating that urlparse used to use a cache, so it isn't useless, but the docstring should explain that, probably with a link to the ticket. (Since the _parse tests are going away.
  2. There are some twisted checker errors.
  3. There is a build failure, that appears to be due to Response.absoluteURI failing if there isn't an associated request.

build results

comment:19 in reply to:  18 Changed 4 years ago by Jonathan Jacobs

Keywords: review added
Owner: changed from Jonathan Jacobs to Tom Prince

Replying to tom.prince:

  1. Is _parse necesary now? One of the uses appears superflous: the result of one of the two uses seems unused, and the other can easily be replaced.
    • Regarding test_externalUnicodeInterference, #2628 where it was introduced seems to be indicating that urlparse used to use a cache, so it isn't useless, but the docstring should explain that, probably with a link to the ticket. (Since the _parse tests are going away.

I think initially _parse did more than it does now, I replaced/removed uses of _parse. The docstring for test_externalUnicodeInterference now also states what the purpose of the test is and contains a link to the ticket, as you suggested.

  1. There are some twisted checker errors.
  2. There is a build failure, that appears to be due to Response.absoluteURI failing if there isn't an associated request.

I fixed these up. Twisted checker still throws up warnings about missing @return annotations in properties, since I think this is wrong I filed a bug and I'm ignoring these.

build results

comment:20 Changed 4 years ago by Jean-Paul Calderone

Owner: changed from Tom Prince to Jean-Paul Calderone
Status: newassigned

comment:21 Changed 4 years ago by Jean-Paul Calderone

Keywords: review removed
Owner: changed from Jean-Paul Calderone to Jonathan Jacobs
Status: assignednew

Hi jonathanj. Thanks for tackling this one. Yet another brick in the wall of total world domination by Agent.

I have a bunch of comments, sorry about that. A lot of them are related to the idea of cramming as much change into the interface change at one time as possible (an idea which I can relate to, and can see some of the merits of). I still can't help but feel (as we discussed on IRC earlier today) that if we could just solve the problem represented by wanting to change an existing interface, we'd be in a much better place. Alas, I still have no solution to propose to that problem, so I have a bunch of review comments instead. :/

A bunch of the other comments are about _URI. One thing I didn't say down below is that it seems like _URI could be worked on separately from this feature. In fact, I'm pretty sure there are some tickets for that. Since it 'is' mostly attributes, I guess I'm not going to push very hard to split the code out into multiple branches. But if you could follow up by finding some of those other tickets and updating them after this ticket is resolved, that'd be super great.

Alright, without further ado...

  1. twisted/web/iweb.py
    1. No one has really thought about the interface of twisted.web._newclient.Request so it seems like a bad idea to make it public as IResponse.request.
      1. For example, the bodyProducer and persistent attributes don't seem very important. The bodyProducer isn't re-usable and the persistent attribute actually reflects an aspect of the connection the request was issued over, not anything about the request itself.
      2. There are also other randomly useless methods that happen to be public - writeTo and stopWriting.
    2. I hope this isn't just bikeshedding, but IResponse.response seems like confusing naming to me. Responses don't have responses. Maybe "previous" belongs in that name somehow? Or some other word that disambiguates it a little bit from the response object you already have.
    3. The type of IResponse.absoluteURI is undocumented but is pretty important. Also, is this really an important addition to the interface? It's an extra burden to each new implementation of the interface. Would it be bad to make applications get it from the request object (assuming IResponse.request stays around; if goes away IResponse.absoluteURI becomes a bit more necessary).
  2. twisted/web/test/test_webclient.py
    1. Use assertEqual, not assertEquals
    2. As long as you're rewriting all the parsing tests, one assertion per test method please (or put another way, one *case* per test method).
      1. For example, test_setURL seems like it is really two tests
      2. And assertEqual((a, b, c), (d, e, f)) produces a better failure than three different assertEqual calls, one for each pairwise comparison. (Of course, assertEqual(expectedURLObject, resultURLObject) would be even better, but we have no URL object yet I guess).
  3. doc/web/howto/client.xhtml
    1. Not sure these changes will all remain relevant, but in case they do, the API link for the request attribute goes to the Response class.
    2. Something about the absoluteURI response attribute? This doesn't necessarily belong in the section about redirects, since it's generally useful. How about somewhere in the "Receiving Responses" section?
  4. twisted/web/test/test_agent.py
    1. Can you use more descriptive variable names than "res2" and "req2" in test_responseHistory?
  5. twisted/web/client.py
    1. _URI
      1. The comprehensive documentation in the class docstring should be moved to the __init__ docstring instead. http://twistedmatrix.com/documents/current/core/development/policy/coding-standard.html#auto9
      2. As a general point of design, it might be nice if url, scheme, host, port, and path on HTTPClientFactory were properties computed in terms of a new attribute which held the _URI instance. As a practical matter, I don't care about HTTPClientFactory and I don't think it's worth the effort to follow through with this idea.
      3. Do you have any feelings about _URI.fromBytes or _URI.parseFromBytes instead of _URI.parse?
      4. You might as well use @classmethod for class methods; this patch already uses @property elsewhere.
      5. _URI.uri has the same flavor as IResponse.response. Too bad the RFC insists that the byte representation of the thing is called a "URI". I don't have a suggestion for a better name, except to make it a method instead and call it something like toBytes.
      6. Can you de-emphasize parsing in the class docstring? Sure, you often parse a string to get one of these things, but once you have it, what is it? This could just be a reference to the RFC if you want.
      7. The comment in originForm mentions the HTTP RFC. Can you make this a bit more precise? Which section of which HTTP RFC? And a link to the section in the docstring (on a @see perhaps) might not go amiss.
    2. _chainResponse sets an attribute on the request object. This kind of external control makes state changes on an object somewhat surprising. One second response is None, the next it's not, how'd that happen? Not really a catastrophe, but I think it's a pattern to avoid (Twisted Conch in particular is guilty of this approach and it really damages the understandability of the code). I admit, none of the options I can think of for doing this differently are terribly attractive. Given here for completeness, but don't take that as a recommendation:
      1. Introduce a new private method on Response and call that, instead.
      2. Introduce a new public method on Response and call that, instead.
      3. Introduce a new method on IResponse with an implementation on Response and call that, instead.
  6. twisted/web/_newclient.py
    1. Naming the parameter _parsedURL instead of parsedURL basically just makes passing it by keyword not part of the public API. Passing it positionally is still part of the public API, though. Is this what you intended? Compared to the persistent parameter, this one hardly seems like a problem (supposing we actually want Request to be public... which... eh)
  7. Generally:
    1. The code (mostly in the tests) is inconsistent with its use of the b'' syntax. For example, URL strings are sometimes written with b'' and sometimes with ''.
    2. There is a plan to make all of the new, currently-optional attributes mandatory on all of the objects, right? I'd like to see the plan documented somewhere. Probably as a series of tickets.

Again, thanks for your work here. I hope all of the above makes sense. Apologies for any parts that don't.

comment:22 in reply to:  21 Changed 4 years ago by Jonathan Jacobs

Keywords: review added
Owner: changed from Jonathan Jacobs to Jean-Paul Calderone

Replying to exarkun:

Thank you for your review!

A bunch of the other comments are about _URI. One thing I didn't say down below is that it seems like _URI could be worked on separately from this feature. In fact, I'm pretty sure there are some tickets for that. Since it 'is' mostly attributes, I guess I'm not going to push very hard to split the code out into multiple branches. But if you could follow up by finding some of those other tickets and updating them after this ticket is resolved, that'd be super great.

I made a wiki page and tracked down as many tickets as I could find: URLsInTwisted.

  1. twisted/web/iweb.py

I introduced a new interface IClientRequest (since IRequest is used the server request, that doesn't quite work the same way), which contains none of the implementation details that IRequest and Request have, that is exposed as IResponse.request.

  1. doc/web/howto/client.xhtml
    1. Not sure these changes will all remain relevant, but in case they do, the API link for the request attribute goes to the Response class.
    2. Something about the absoluteURI response attribute? This doesn't necessarily belong in the section about redirects, since it's generally useful. How about somewhere in the "Receiving Responses" section?

I updated the documentation with these suggestions in mind, although I found introducing the concept of absoluteURI and request somewhat awkward. I'd appreciate some feedback for these changes.

  1. _chainResponse sets an attribute on the request object. This kind of external control makes state changes on an object somewhat surprising. One second response is None, the next it's not, how'd that happen? Not really a catastrophe, but I think it's a pattern to avoid (Twisted Conch in particular is guilty of this approach and it really damages the understandability of the code). I admit, none of the options I can think of for doing this differently are terribly attractive. Given here for completeness, but don't take that as a recommendation:
    1. Introduce a new private method on Response and call that, instead.
    2. Introduce a new public method on Response and call that, instead.
    3. Introduce a new method on IResponse with an implementation on Response and call that, instead.

Our discussion determined that a public method, because of the way things like the gzip decoding were implemented, was the way to go.

  1. twisted/web/_newclient.py
    1. Naming the parameter _parsedURL instead of parsedURL basically just makes passing it by keyword not part of the public API. Passing it positionally is still part of the public API, though. Is this what you intended? Compared to the persistent parameter, this one hardly seems like a problem (supposing we actually want Request to be public... which... eh)

After our discussion on IRC, I introduced _construct private constructors for the objects where __init__ was previously modified for the sole purpose of passing private values to the object.

  1. Generally:
    1. The code (mostly in the tests) is inconsistent with its use of the b'' syntax. For example, URL strings are sometimes written with b'' and sometimes with ''.
    2. There is a plan to make all of the new, currently-optional attributes mandatory on all of the objects, right? I'd like to see the plan documented somewhere. Probably as a series of tickets.

I filed #6447 and #6448 as the parts of the plan for this goal.

comment:23 Changed 4 years ago by Tom Prince

Keywords: review removed
Owner: changed from Jean-Paul Calderone to Jonathan Jacobs
  1. Both _construct should document their parameters (although most of them can just be references to the corresponding documentation on __init__).
  2. The howto needs to be update to refer to previousResposne. 3. You seem to have lost the test regarding urlparse and unicode handling.
    • This causes a pyflakes error.
  3. (minor) In URITests, I'd be inclined to avoid parsing uri's twice. (test_path etc)
  4. There is a python3 failure.

Please merge, after addressing the above point.

Note: I mentioned something about breaking treq on IRC, but it turns out that I was depending on something (#5434) that got merged after your branch.

comment:24 Changed 4 years ago by Jonathan Jacobs

Branch: branches/redirectagent-final-uri-5435branches/redirectagent-final-uri-5435-2

(In [38793]) Branching to 'redirectagent-final-uri-5435-2'

comment:25 Changed 4 years ago by Jonathan Jacobs

Resolution: fixed
Status: newclosed

(In [38812]) Merge redirectagent-final-uri-5435-2.

Author: jonathanj Reviewer: tom.prince, exarkun Fixes: #5435

Introduce twisted.web.iweb.IResponse.request, the IClientRequest (a cleaned up interface for Request that does not expose all the implementation details publicly) that caused the response to exist, and twisted.web.iweb.IResponse.previousResponse, the previous IResponse from a redirect. The absolute URI for a request is available as IClientRequest.absoluteURI, or for a response as IResponse.request.absoluteURI.

Note: See TracTickets for help on using tickets.