Opened 11 years ago
Closed 9 years ago
#5435 enhancement closed fixed (fixed)
twisted.web.client.RedirectAgent doesn't inform caller of final URI
Reported by: | Tom Most | Owned by: | Jonathan Jacobs |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | web | Keywords: | agent |
Cc: | jknight, Tom Most, therve, Jonathan Jacobs, Julian Berman | Branch: |
branches/redirectagent-final-uri-5435-2
branch-diff, diff-cov, branch-cov, buildbot |
Author: | jonathanj |
Description (last modified by )
If a redirection is encountered, RedirectAgent transparently follows it, but the ultimate URI of the response is not exposed. Not knowing the URI makes resolving relative URIs in the response impossible.
The obvious solution is to add the URI to the Response.
Change History (25)
comment:1 Changed 11 years ago by
Cc: | jknight added |
---|
comment:2 Changed 11 years ago by
Description: | modified (diff) |
---|
comment:3 Changed 11 years ago by
Cc: | Tom Most added |
---|
comment:4 Changed 11 years ago by
Type: | defect → enhancement |
---|
comment:5 Changed 10 years ago by
comment:6 follow-up: 9 Changed 9 years ago by
Cc: | therve added |
---|
There was some discussion on IRC about this, I'll try to summarize:
- We need to change the response, so that probably means changing IResponse. It may be considered a backward incompatible change, but if people are using things like proxyForInterface, it's transparent.
- Although, we shouldn't do that it very often, so we need to think of possible other use cases. One thing that was mentioned is including the whole Request object, so that you get other information set by the framework, like headers. This is interesting for other agents that change the request made by the user.
- The other bit mentioned is that we lose information about intermediate requests/response as well. This is a little bit more tricky to do in a generic fashion, but we could include a link to the previous response. I believe this could be useful for the caching agent, where you would get the cached response, but could get access to the 304 response (to be able to tell that the response is from the cache).
Other thoughts?
comment:7 Changed 9 years ago by
Tracking redirection history is also quite useful for external testing of web services, and in this use case it is also important to know the status code so that you can tell if the correct type of redirect was issued. I currently use the requests library for this task.
comment:8 Changed 9 years ago by
Author: | → jonathanj |
---|---|
Branch: | → branches/redirectagent-final-uri-5435 |
(In [37504]) Branching to 'redirectagent-final-uri-5435'
comment:9 Changed 9 years ago by
Cc: | Jonathan Jacobs added |
---|
Replying to therve:
There was some discussion on IRC about this, I'll try to summarize:
I'm pretty happy with these points, they would cover all the scenarios discussed on IRC.
One issue I ran into when I started working on this is the URI itself. Request.uri
is a relative URI and as far as I can tell it is not possible to reliably reconstruct the absolute URI from just a Request
. For example, the URL scheme is lost.
It would make more sense to store the absolute URI on Request
, since you can always work backwards from this. Going down this route probably involves changing t.w.c._parse
and t.w.c._URL
to not throw away important parts of the URI, but those are private APIs so presumably that's not a serious issue.
comment:10 Changed 9 years ago by
Cc: | Julian Berman added |
---|
comment:11 Changed 9 years ago by
comment:12 Changed 9 years ago by
comment:13 Changed 9 years ago by
Keywords: | agent review added |
---|
t.w.c.Response
now contains a reference to the t.w.c.Request
that resulted in the response, and a reference to the previous response in the case of RedirectAgent
. This involved reworking (and renaming) t.w.c._URL
into something that doesn't inherit from tuple
and provides enough information to construct an absolute URI. A convenience property Response.absoluteURI
is also provided.
comment:14 Changed 9 years ago by
comment:15 Changed 9 years ago by
Owner: | set to Tom Prince |
---|
comment:16 follow-up: 17 Changed 9 years ago by
Keywords: | review removed |
---|---|
Owner: | changed from Tom Prince to Jonathan Jacobs |
-
_URI
is effectively public, since instances are available as a public attribute. Thus, we should try to get the exposed interface right.- #2093 is probably relevant.
- Although, perhaps
.absoluteURI
is enough to expose? - Is this something that will eventually replace
URLPath
? If so, you should file a ticket to put it somewhere appropriate and make it public. If not, why this instead ofURLPath
.
- decorators are now allowed, although not required (for the various
property
things. - The topfile should make it clear that it is changing interfaces incompatibly.
- You should be able to use
successResultOf
, rather than adding callbacks with asserts and returning deferreds. - Should
.request
be aIRespone
instead ofRequest
? test_responseIncludesRequest
seem- Is the convenience api meant to be part of the interface? If not then using it means you need to know more than the documented interface of
Agent.request
(namely that it returns aResponse
not anIResponse
) and it will break with things likeGzipDecoder
. test_externalUnicodeInterference
seems bogus. strings are immutable, so calling functions on them should do anything. If it has something to do with caching in urlparse, you should mention that.
comment:17 Changed 9 years ago by
Keywords: | review added |
---|---|
Owner: | changed from Jonathan Jacobs to Tom Prince |
Thank you for your review, Tom!
Replying to tom.prince:
_URI
is effectively public, since instances are available as a public attribute. Thus, we should try to get the exposed interface right.
- #2093 is probably relevant.
- Although, perhaps
.absoluteURI
is enough to expose?- Is this something that will eventually replace
URLPath
? If so, you should file a ticket to put it somewhere appropriate and make it public. If not, why this instead ofURLPath
.
I opted for making Request.absoluteURI
the only public API.
- The topfile should make it clear that it is changing interfaces incompatibly.
I've changed it to talk about IResponse instead of Response.
- You should be able to use
successResultOf
, rather than adding callbacks with asserts and returning deferreds.
Yes! I love successResultOf
!
- Should
.request
be aIRespone
instead ofRequest
?
As discussed on IRC, iweb.IRequest
is actually the thing that is used with web Resource
s, which I'll admit is confusing.
test_responseIncludesRequest
seem
I've split the absolute URI part of this test into its own test.
- Is the convenience api meant to be part of the interface? If not then using it means you need to know more than the documented interface of
Agent.request
(namely that it returns aResponse
not anIResponse
) and it will break with things likeGzipDecoder
.
Whoops! Adding absoluteURI
to the interface totally slipped my mind.
test_externalUnicodeInterference
seems bogus. strings are immutable, so calling functions on them should do anything. If it has something to do with caching in urlparse, you should mention that.
I've removed this since it is a duplicate of a test by the same name for ParseUrlTestCase
.
comment:18 follow-up: 19 Changed 9 years ago by
Keywords: | review removed |
---|---|
Owner: | changed from Tom Prince to Jonathan Jacobs |
- Is
_parse
necesary now? One of the uses appears superflous: the result of one of the two uses seems unused, and the other can easily be replaced.- Regarding
test_externalUnicodeInterference
, #2628 where it was introduced seems to be indicating thaturlparse
used to use a cache, so it isn't useless, but the docstring should explain that, probably with a link to the ticket. (Since the_parse
tests are going away.
- Regarding
- There are some twisted checker errors.
- There is a build failure, that appears to be due to
Response.absoluteURI
failing if there isn't an associated request.
comment:19 Changed 9 years ago by
Keywords: | review added |
---|---|
Owner: | changed from Jonathan Jacobs to Tom Prince |
Replying to tom.prince:
- Is
_parse
necesary now? One of the uses appears superflous: the result of one of the two uses seems unused, and the other can easily be replaced.
- Regarding
test_externalUnicodeInterference
, #2628 where it was introduced seems to be indicating thaturlparse
used to use a cache, so it isn't useless, but the docstring should explain that, probably with a link to the ticket. (Since the_parse
tests are going away.
I think initially _parse
did more than it does now, I replaced/removed uses of _parse
. The docstring for test_externalUnicodeInterference
now also states what the purpose of the test is and contains a link to the ticket, as you suggested.
- There are some twisted checker errors.
- There is a build failure, that appears to be due to
Response.absoluteURI
failing if there isn't an associated request.
I fixed these up. Twisted checker still throws up warnings about missing @return
annotations in properties, since I think this is wrong I filed a bug and I'm ignoring these.
comment:20 Changed 9 years ago by
Owner: | changed from Tom Prince to Jean-Paul Calderone |
---|---|
Status: | new → assigned |
comment:21 follow-up: 22 Changed 9 years ago by
Keywords: | review removed |
---|---|
Owner: | changed from Jean-Paul Calderone to Jonathan Jacobs |
Status: | assigned → new |
Hi jonathanj. Thanks for tackling this one. Yet another brick in the wall of total world domination by Agent
.
I have a bunch of comments, sorry about that. A lot of them are related to the idea of cramming as much change into the interface change at one time as possible (an idea which I can relate to, and can see some of the merits of). I still can't help but feel (as we discussed on IRC earlier today) that if we could just solve the problem represented by wanting to change an existing interface, we'd be in a much better place. Alas, I still have no solution to propose to that problem, so I have a bunch of review comments instead. :/
A bunch of the other comments are about _URI
. One thing I didn't say down below is that it seems like _URI
could be worked on separately from this feature. In fact, I'm pretty sure there are some tickets for that. Since it 'is' mostly attributes, I guess I'm not going to push very hard to split the code out into multiple branches. But if you could follow up by finding some of those other tickets and updating them after this ticket is resolved, that'd be super great.
Alright, without further ado...
- twisted/web/iweb.py
- No one has really thought about the interface of
twisted.web._newclient.Request
so it seems like a bad idea to make it public asIResponse.request
.- For example, the
bodyProducer
andpersistent
attributes don't seem very important. ThebodyProducer
isn't re-usable and thepersistent
attribute actually reflects an aspect of the connection the request was issued over, not anything about the request itself. - There are also other randomly useless methods that happen to be public -
writeTo
andstopWriting
.
- For example, the
- I hope this isn't just bikeshedding, but
IResponse.response
seems like confusing naming to me. Responses don't have responses. Maybe "previous" belongs in that name somehow? Or some other word that disambiguates it a little bit from the response object you already have. - The type of
IResponse.absoluteURI
is undocumented but is pretty important. Also, is this really an important addition to the interface? It's an extra burden to each new implementation of the interface. Would it be bad to make applications get it from the request object (assumingIResponse.request
stays around; if goes awayIResponse.absoluteURI
becomes a bit more necessary).
- No one has really thought about the interface of
- twisted/web/test/test_webclient.py
- Use
assertEqual
, notassertEquals
- As long as you're rewriting all the parsing tests, one assertion per test method please (or put another way, one *case* per test method).
- For example,
test_setURL
seems like it is really two tests - And
assertEqual((a, b, c), (d, e, f))
produces a better failure than three differentassertEqual
calls, one for each pairwise comparison. (Of course,assertEqual(expectedURLObject, resultURLObject)
would be even better, but we have no URL object yet I guess).
- For example,
- Use
- doc/web/howto/client.xhtml
- Not sure these changes will all remain relevant, but in case they do, the API link for the
request
attribute goes to theResponse
class. - Something about the
absoluteURI
response attribute? This doesn't necessarily belong in the section about redirects, since it's generally useful. How about somewhere in the "Receiving Responses" section?
- Not sure these changes will all remain relevant, but in case they do, the API link for the
- twisted/web/test/test_agent.py
- Can you use more descriptive variable names than "res2" and "req2" in
test_responseHistory
?
- Can you use more descriptive variable names than "res2" and "req2" in
- twisted/web/client.py
_URI
- The comprehensive documentation in the class docstring should be moved to the
__init__
docstring instead. http://twistedmatrix.com/documents/current/core/development/policy/coding-standard.html#auto9 - As a general point of design, it might be nice if
url
,scheme
,host
,port
, andpath
onHTTPClientFactory
were properties computed in terms of a new attribute which held the_URI
instance. As a practical matter, I don't care aboutHTTPClientFactory
and I don't think it's worth the effort to follow through with this idea. - Do you have any feelings about
_URI.fromBytes
or_URI.parseFromBytes
instead of_URI.parse
? - You might as well use
@classmethod
for class methods; this patch already uses@property
elsewhere. _URI.uri
has the same flavor asIResponse.response
. Too bad the RFC insists that the byte representation of the thing is called a "URI". I don't have a suggestion for a better name, except to make it a method instead and call it something liketoBytes
.- Can you de-emphasize parsing in the class docstring? Sure, you often parse a string to get one of these things, but once you have it, what is it? This could just be a reference to the RFC if you want.
- The comment in
originForm
mentions the HTTP RFC. Can you make this a bit more precise? Which section of which HTTP RFC? And a link to the section in the docstring (on a@see
perhaps) might not go amiss.
- The comprehensive documentation in the class docstring should be moved to the
_chainResponse
sets an attribute on the request object. This kind of external control makes state changes on an object somewhat surprising. One secondresponse
isNone
, the next it's not, how'd that happen? Not really a catastrophe, but I think it's a pattern to avoid (Twisted Conch in particular is guilty of this approach and it really damages the understandability of the code). I admit, none of the options I can think of for doing this differently are terribly attractive. Given here for completeness, but don't take that as a recommendation:- Introduce a new private method on
Response
and call that, instead. - Introduce a new public method on
Response
and call that, instead. - Introduce a new method on
IResponse
with an implementation onResponse
and call that, instead.
- Introduce a new private method on
- twisted/web/_newclient.py
- Naming the parameter
_parsedURL
instead ofparsedURL
basically just makes passing it by keyword not part of the public API. Passing it positionally is still part of the public API, though. Is this what you intended? Compared to thepersistent
parameter, this one hardly seems like a problem (supposing we actually wantRequest
to be public... which... eh)
- Naming the parameter
- Generally:
- The code (mostly in the tests) is inconsistent with its use of the
b''
syntax. For example, URL strings are sometimes written withb''
and sometimes with''
. - There is a plan to make all of the new, currently-optional attributes mandatory on all of the objects, right? I'd like to see the plan documented somewhere. Probably as a series of tickets.
- The code (mostly in the tests) is inconsistent with its use of the
Again, thanks for your work here. I hope all of the above makes sense. Apologies for any parts that don't.
comment:22 Changed 9 years ago by
Keywords: | review added |
---|---|
Owner: | changed from Jonathan Jacobs to Jean-Paul Calderone |
Replying to exarkun:
Thank you for your review!
A bunch of the other comments are about _URI. One thing I didn't say down below is that it seems like _URI could be worked on separately from this feature. In fact, I'm pretty sure there are some tickets for that. Since it 'is' mostly attributes, I guess I'm not going to push very hard to split the code out into multiple branches. But if you could follow up by finding some of those other tickets and updating them after this ticket is resolved, that'd be super great.
I made a wiki page and tracked down as many tickets as I could find: URLsInTwisted.
- twisted/web/iweb.py
I introduced a new interface IClientRequest
(since IRequest
is used the server request, that doesn't quite work the same way), which contains none of the implementation details that IRequest
and Request
have, that is exposed as IResponse.request
.
- doc/web/howto/client.xhtml
- Not sure these changes will all remain relevant, but in case they do, the API link for the
request
attribute goes to theResponse
class.- Something about the
absoluteURI
response attribute? This doesn't necessarily belong in the section about redirects, since it's generally useful. How about somewhere in the "Receiving Responses" section?
I updated the documentation with these suggestions in mind, although I found introducing the concept of absoluteURI
and request
somewhat awkward. I'd appreciate some feedback for these changes.
_chainResponse
sets an attribute on the request object. This kind of external control makes state changes on an object somewhat surprising. One secondresponse
isNone
, the next it's not, how'd that happen? Not really a catastrophe, but I think it's a pattern to avoid (Twisted Conch in particular is guilty of this approach and it really damages the understandability of the code). I admit, none of the options I can think of for doing this differently are terribly attractive. Given here for completeness, but don't take that as a recommendation:
- Introduce a new private method on
Response
and call that, instead.- Introduce a new public method on
Response
and call that, instead.- Introduce a new method on
IResponse
with an implementation onResponse
and call that, instead.
Our discussion determined that a public method, because of the way things like the gzip decoding were implemented, was the way to go.
- twisted/web/_newclient.py
- Naming the parameter
_parsedURL
instead ofparsedURL
basically just makes passing it by keyword not part of the public API. Passing it positionally is still part of the public API, though. Is this what you intended? Compared to thepersistent
parameter, this one hardly seems like a problem (supposing we actually wantRequest
to be public... which... eh)
After our discussion on IRC, I introduced _construct
private constructors for the objects where __init__
was previously modified for the sole purpose of passing private values to the object.
- Generally:
- The code (mostly in the tests) is inconsistent with its use of the
b''
syntax. For example, URL strings are sometimes written withb''
and sometimes with''
.- There is a plan to make all of the new, currently-optional attributes mandatory on all of the objects, right? I'd like to see the plan documented somewhere. Probably as a series of tickets.
I filed #6447 and #6448 as the parts of the plan for this goal.
comment:23 Changed 9 years ago by
Keywords: | review removed |
---|---|
Owner: | changed from Jean-Paul Calderone to Jonathan Jacobs |
- Both
_construct
should document their parameters (although most of them can just be references to the corresponding documentation on__init__
). - The howto needs to be update to refer to
previousResposne
. 3. You seem to have lost the test regarding urlparse and unicode handling.- This causes a pyflakes error.
- (minor) In
URITests
, I'd be inclined to avoid parsing uri's twice. (test_path
etc) - There is a python3 failure.
Please merge, after addressing the above point.
Note: I mentioned something about breaking treq on IRC, but it turns out that I was depending on something (#5434) that got merged after your branch.
comment:24 Changed 9 years ago by
Branch: | branches/redirectagent-final-uri-5435 → branches/redirectagent-final-uri-5435-2 |
---|
(In [38793]) Branching to 'redirectagent-final-uri-5435-2'
comment:25 Changed 9 years ago by
Resolution: | → fixed |
---|---|
Status: | new → closed |
(In [38812]) Merge redirectagent-final-uri-5435-2.
Author: jonathanj Reviewer: tom.prince, exarkun Fixes: #5435
Introduce twisted.web.iweb.IResponse.request, the IClientRequest (a cleaned up interface for Request that does not expose all the implementation details publicly) that caused the response to exist, and twisted.web.iweb.IResponse.previousResponse, the previous IResponse from a redirect. The absolute URI for a request is available as IClientRequest.absoluteURI, or for a response as IResponse.request.absoluteURI.
#6193 was a duplicate of this.