Opened 15 years ago
Closed 5 years ago
#2625 defect closed fixed (fixed)
urlpath doesn't do any quoting
Reported by: | itamarst | Owned by: | Matt |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | core | Keywords: | |
Cc: | truekonrads, Jean-Paul Calderone, Dan, Tristan Seligmann, Matt | Branch: |
branches/urlpath-quote-2625
branch-diff, diff-cov, branch-cov, buildbot |
Author: | truekonrads |
Description
e.g.
>>> str(urlpath.URLPath().child("hello/ there")) 'http://localhost/hello/ there'
ought to be
>>> str(urlpath.URLPath().child("hello/ there")) 'http://localhost/hello%2f%20there'
Attachments (6)
Change History (48)
comment:1 Changed 13 years ago by
Keywords: | easy added |
---|
comment:2 Changed 13 years ago by
Owner: | changed from Glyph to Itamar Turner-Trauring |
---|
comment:3 Changed 13 years ago by
Owner: | changed from Itamar Turner-Trauring to itamarst |
---|
Changed 13 years ago by
Attachment: | 2625.patch added |
---|
comment:4 Changed 13 years ago by
Cc: | truekonrads added |
---|---|
Keywords: | review added |
comment:5 follow-up: 6 Changed 13 years ago by
Keywords: | review removed |
---|---|
Owner: | changed from itamarst to truekonrads |
Thanks for the patch! Here are a few comments:
URLPath.child
should have a docstring. Can you add one? One thing it should mention, in particular, is that the path argument should not be quoted.- Operators should have one space around them. For example,
fixedPath=foo
should befixedPath = foo
. Similarly, function arguments should be separated from each other by a ", ", not just a ",". - You can make
urllib.quote
do all of the work for you by passing an empty string as the second argument. This way you won't have to usestr.replace
afterwards. test_childQuotes
should also have a docstring.
Thanks again!
Changed 13 years ago by
Attachment: | 2625.2.patch added |
---|
Improved version that addresses exarkun's comments
comment:6 Changed 13 years ago by
Keywords: | review added |
---|
Replying to exarkun:
Done.
Additionally, I've replaced the default value of keepQuery=0 to keepQuery=False as it is a boolean condition anyway.
comment:7 Changed 13 years ago by
Owner: | truekonrads deleted |
---|
comment:8 Changed 13 years ago by
Author: | → exarkun |
---|---|
Branch: | → branches/urlpath-quote-2625 |
(In [27252]) Branching to 'urlpath-quote-2625'
comment:10 Changed 13 years ago by
comment:11 Changed 13 years ago by
Keywords: | review removed |
---|---|
Owner: | set to Jean-Paul Calderone |
Status: | new → assigned |
I made some minor changes to the new patch; mostly whitespace and docstring adjustments. Checking out [how it does on buildbot] now. Assuming it goes well, I'll merge.
comment:12 Changed 13 years ago by
Author: | exarkun → truekonrads |
---|---|
Cc: | Jean-Paul Calderone added |
Owner: | changed from Jean-Paul Calderone to truekonrads |
Status: | assigned → new |
Ahem, how it does on buildbot
However, meh. I changed my mind. This is an incompatible change which will break any existing careful uses of URLPath.child
. The existing method and behavior should be left alone, but deprecated. A new method should be introduced which has the desired semantics (which are now well implemented, I think, in the branch by the child
method). Additionally, URLPath.sibling
should probably receive similar treatment, and the rest of the methods checked for this problem.
In general, please refer to CompatibilityPolicy. This document isn't complete or finalized, but it's a good guideline nonetheless, and we should adhere to it as closely as possible.
comment:13 Changed 12 years ago by
Owner: | changed from truekonrads to sah |
---|---|
Status: | new → assigned |
comment:14 Changed 12 years ago by
Owner: | changed from sah to truekonrads |
---|---|
Status: | assigned → new |
Yikes, nevermind, reading comments I realized this involves too much policy for me right now.
Changed 12 years ago by
Attachment: | 2625.3.patch added |
---|
Changed 12 years ago by
Attachment: | 2625.4.patch added |
---|
comment:15 Changed 12 years ago by
Cc: | Dan added |
---|---|
Keywords: | review added |
Owner: | truekonrads deleted |
Building on the patches submitted by truekonrads, I made an attempt to address exarkun's compatibility concerns. Also included are some minor changes like test case names and boolean default arguments instead of ints.
In addition to URLPath.child
, I provided quoting implementations of URLPath.sibling
and URLPath.fromString
. The names are the best I could come up with but I feel there's definite room for improvement there.
In unit testing the deprecated methods, I'm not sure if what I have is sufficient (counting the number of warnings seen) or if more detail is needed.
comment:16 Changed 12 years ago by
I think that URLPath class with correct quoting should be called differently from old URLPath not added new methods. This comes to mind:
- SafeURLPath
- URLPath2
- URLTrek
- NewURLPath (which will be followed by ReallyNewThisTimeURLPath :) )
Usage of old URLPath should be phased out from Twisted and warn raised from init. P.S. I think this is funny how long it takes to close an easy ticket.
comment:17 follow-up: 18 Changed 12 years ago by
Do you think this should be a new class in the same module or something like urlpath2.URLPath
?
comment:18 Changed 12 years ago by
Replying to danben:
Do you think this should be a new class in the same module or something like
urlpath2.URLPath
?
Yes, I think this is a nice solution - urlpath2.URLPath. I would suggest you also add DepreciationWarning to the urlpath module.
comment:19 follow-up: 20 Changed 12 years ago by
Hang on a minute. Why is a new module a better idea than introducing new methods on the existing class?
comment:20 follow-up: 21 Changed 12 years ago by
Replying to exarkun:
Hang on a minute. Why is a new module a better idea than introducing new methods on the existing class?
Because it is less confusing to the programmer - explorer who discovers a module, looks up its methods and applies them. He does not have to guess if he/she should use child() or safeChild() or newChild() or child(). urlpath2 vs urlpath immediately suggests to use the newer version.
comment:21 Changed 12 years ago by
Replying to truekonrads:
Replying to exarkun:
Hang on a minute. Why is a new module a better idea than introducing new methods on the existing class?
Because it is less confusing to the programmer - explorer who discovers a module, looks up its methods and applies them. He does not have to guess if he/she should use child() or safeChild() or newChild() or child(). urlpath2 vs urlpath immediately suggests to use the newer version.
I agree with this, assuming that we want quoting to be the default behavior. One drawback that I can see is a surprise for users who upgrade from urlpath
to urlpath2
without looking carefully at the documentation to see that the methods behave differently, but maybe this could be made explicit at the class-level documentation since URLPath
is not a large class.
Anyway, I will defer to experience here. exarkun, let me know which approach you prefer and I'll make a patch based on that.
comment:22 Changed 12 years ago by
So, -1 on a new module. especially not "urlpath2".
I do think that making this a new class rather than a new collection of methods would be a good idea. I say this because it affects the behavior of the fromString
, child
, and sibling
methods, and having a suffixed version of each method is somewhat verbose.
I suggest adding a new class, urlpath.URL
, which has the right quoting behavior on all of its methods, and deprecating URLPath
.
comment:23 Changed 12 years ago by
Sounds like a solution. urlpath.URL + depreciationwarning has my vote.
comment:24 Changed 12 years ago by
Cc: | Tristan Seligmann added |
---|
comment:25 follow-up: 26 Changed 12 years ago by
Keywords: | review removed |
---|---|
Owner: | set to truekonrads |
So, to summarize:
- We don't want to fix
URLPath
- We want to introduce a new API which doesn't have the bugs
URLPath
has- Quoting should be handled in a way that promotes secure usage
- All 4 parts of the netloc should be parsed
- Query arguments should be easily inspected and manipulated
- unicode should be supported
- Both absolute and relative URLs should be supportable
I suggest that this start with just a model which can represent all of the information present in a URL (or IRI). Adding useful methods for manipulating the state can come afterwards. And eventually we'll need to have an API on Request
for getting one of these new things, rather than an old URLPath
. Then we can deprecate URLPath
.
If people actually agree this is the way to go, then this ticket should be closed as wontfix and at least one new one opened describing the new thing that's going to be implemented.
comment:26 Changed 12 years ago by
Resolution: | → wontfix |
---|---|
Status: | new → closed |
Replying to exarkun:
If people actually agree this is the way to go, then this ticket should be closed as wontfix and at least one new one opened describing the new thing that's going to be implemented.
Sounds like a decent plan. Will you do the honors then?
comment:27 Changed 12 years ago by
Resolution: | wontfix |
---|---|
Status: | closed → reopened |
Let's change the summary, not close as wontfix. The problem still exists: Twisted doesn't have a good abstraction for generating quoted URLs. Also, new tickets should be opened before old ones are closed, so we won't forget to do it.
In any event, I have another suggestion besides making a new class: __str__
is a bad way to get a canonicalized string format representation, we should really deprecate that. Why not just do the quoting at the point of stringification, rather than in the child()
methods, and add a new, good method for stringification?
comment:28 Changed 11 years ago by
Keywords: | easy removed |
---|
Changed 11 years ago by
Attachment: | urlpath-quote.patch added |
---|
Added some new docstrings, removed str method in favor of toString(), not done unit tests yet.
comment:29 Changed 11 years ago by
Unfortunately adding an alternative to __str__
isn't good enough, since e.g. sibling()
takes what most users would expect to be an unquoted path, and then attaches it to quoted path elements.
comment:30 Changed 11 years ago by
Let's change the summary, not close as wontfix.
I notice the summary was never changed. :)
comment:31 Changed 9 years ago by
I tried making new implementation. It's hard. Easier short term solution may be to just have two modes in URLPath, backwards compatible but deprecated mode, and correct quoting mode.
comment:33 follow-up: 34 Changed 9 years ago by
Unicode (#5388). Bytes-only isn't hard (and implementation-wise may well be quite close to having two different modes on same class).
comment:34 Changed 9 years ago by
Replying to itamar:
Unicode (#5388). Bytes-only isn't hard (and implementation-wise may well be quite close to having two different modes on same class).
The API should just expose a new name so it's easy to know what "mode" - by which I mean "behavior" - your client code expects. The guts of the implementation isn't super interesting.
comment:35 Changed 9 years ago by
Cc: | Matt added |
---|
After chatting with itamar about this, we want to leave the unicode problem for #5388. So this ticket is *only* to properly quote things like '/' and ' ' (and other ascii characters that need quoting).
The implementation will involve accepting a flag passed to URLPath.init that will turn on quoting. Also, there will be a subclass of URLPath that has the flag turned on by default.
Changed 9 years ago by
Attachment: | 2625.6.patch added |
---|
comment:36 Changed 9 years ago by
Keywords: | review added |
---|
The patch I just added 2625.6.patch uses code from the urlpath-quote-2625 branch, but is based on trunk (instead of 3-year-old trunk)
comment:37 Changed 9 years ago by
Owner: | truekonrads deleted |
---|---|
Status: | reopened → new |
comment:38 follow-ups: 39 40 Changed 9 years ago by
Keywords: | review removed |
---|---|
Owner: | set to Matt |
Thanks for updating the patch, magmatt!
- The first-person narrative docstring style is no longer preferred, please rephrase docstrings in this style to just talk about their behaviour without the narrative or first-person-ness.
- Would you add a nice helpful docstrings for
URLPath.sibling
,URLPath.parent
andURLPath.here
too, since their behaviour is changing in this patch too?
- It would be preferable if the docstrings in the tests could use the linking syntax, instead of the code literal syntax, when talking about methods. ie.
C{child}
becomesL{URLPath.child}
.
- Please use
assertEqual
instead ofassertEquals
.
- (This is mostly just commentary.) It's unfortunate that passing an unquoted URL to
URLPath.fromString
is broken, regardless of whether you passquote=True
; I realise it is at least documented this way but it sure looks weird (maybefromString
is a misleading name compared to something likefromURIBytes
orfromUnquotedString
):
>>> str(URLPath.fromString('http://example.com/foo bar', quote=True)) 'http://example.com/foo bar'
- Since there is now a docstring for
URLPath.__init__
, could you document the other parameters too?
- I don't think there is a need to redundantly document the default value of
quote
inURLPath.__init__
.
- The docstring for
URLPath
itself should at least include a one-liner description of the class.
- Some docstrings and parameters are missing periods at the end of their sentences.
- Would you make sure there are two blank lines between methods where the code (or surrounding code) is being changed?
Sorry about all the nitpicking, I hope it doesn't deter you from getting this patch landed.
comment:39 Changed 9 years ago by
Replying to jonathanj:
- The first-person narrative docstring style is no longer preferred, please rephrase docstrings in this style to just talk about their behaviour without the narrative or first-person-ness.
Have we always been at war with Eastasia? What is the preferred style (is there a wiki page or something I can read)?
Sorry about all the nitpicking, I hope it doesn't deter you from getting this patch landed.
Honestly, it does deter me. I feel like I'm playing Guess What the Reviewer Wants. I wish reviewers could easily make little fixes (and have them reviewed). Some reviews would be nicer as a patch than as English. But I suppose such a system would have downsides too.
I'll see if I can get to this (but it won't be today).
comment:40 Changed 9 years ago by
Since the issue of frustration came up, I figured I'd do a meta-review! :-)
My main point is: please separate suggestions and requirements into distinct sections so that the author has a clear idea of what they actually need to do to get it landed, versus the things they might want to consider, either for future tickets or at their own discretion before re-review.
Replying to jonathanj:
Thanks for updating the patch, magmatt!
- The first-person narrative docstring style is no longer preferred, please rephrase docstrings in this style to just talk about their behaviour without the narrative or first-person-ness.
This is useful as a suggestion, but despite the fact that this style isn't preferred, having documentation in this style is still better than not having documentation at all. Not mandatory.
- Would you add a nice helpful docstrings for
URLPath.sibling
,URLPath.parent
andURLPath.here
too, since their behaviour is changing in this patch too?
Changed behavior does require changed documentation (and changed tests) so this would be a requirement.
- It would be preferable if the docstrings in the tests could use the linking syntax, instead of the code literal syntax, when talking about methods. ie.
C{child}
becomesL{URLPath.child}
.
Suggestion, but hopefully trivial enough that magmatt would want to do it :). (Also, for your information, L{child <URLPath.child>}
will make the link appear as just the word "child" but still link to the right method.
- Please use
assertEqual
instead ofassertEquals
.
This should really come along with a citation. How did you know that this was correct?
- (This is mostly just commentary.) It's unfortunate that passing an unquoted URL to
URLPath.fromString
is broken, regardless of whether you passquote=True
; I realise it is at least documented this way but it sure looks weird (maybefromString
is a misleading name compared to something likefromURIBytes
orfromUnquotedString
):
"Mostly just commentary" is not an actionable statement :-). Should be in a separate "commentary" section.
- Since there is now a docstring for
URLPath.__init__
, could you document the other parameters too?
Doable as a separate ticket, since the parameters aren't new, right?
- I don't think there is a need to redundantly document the default value of
quote
inURLPath.__init__
.
Definitely not a regression, so not necessary to deal with. Honestly, redundant documentation isn't such a bad thing. It's more work for the author but nice for the reader, since it gives them multiple angles to approach the documentation from.
- The docstring for
URLPath
itself should at least include a one-liner description of the class.
I am not so sure about this one.
- Some docstrings and parameters are missing periods at the end of their sentences.
Again, this is acceptable.
- Would you make sure there are two blank lines between methods where the code (or surrounding code) is being changed?
Despite being nitpicky, coding standard compliance is required (although if that's the only issue, the reviewer can make this change themselves).
(But this should also come with a reference citation...)
Sorry about all the nitpicking, I hope it doesn't deter you from getting this patch landed.
comment:41 Changed 9 years ago by
In IRC exarkun pointed out that danben's patch (2625.4.patch) is probably a better direction (adding new methods that do quoting instead of adding a _quote attribute and attempting to change existing functions).
comment:42 Changed 5 years ago by
Resolution: | → fixed |
---|---|
Status: | new → closed |
This was fixed as a side-effect of #5388, in f8b827389fc346535afba5794fc72976f1235996.
I implemented the child() thing. Maybe we need to handle more stuff, like url in init? I am putting this up for review.