Opened 10 years ago

Closed 6 months ago

#2625 defect closed fixed (fixed)

urlpath doesn't do any quoting

Reported by: itamarst Owned by: Matt
Priority: normal Milestone:
Component: core Keywords:
Cc: truekonrads, Jean-Paul Calderone, Dan, Tristan Seligmann, Matt Branch: branches/urlpath-quote-2625
branch-diff, diff-cov, branch-cov, buildbot
Author: truekonrads

Description

e.g.

>>> str(urlpath.URLPath().child("hello/ there"))
'http://localhost/hello/ there'

ought to be

>>> str(urlpath.URLPath().child("hello/ there"))
'http://localhost/hello%2f%20there'

Attachments (6)

2625.patch (1.4 KB) - added by truekonrads 8 years ago.
2625.2.patch (2.6 KB) - added by truekonrads 8 years ago.
Improved version that addresses exarkun's comments
2625.3.patch (8.4 KB) - added by Dan 7 years ago.
2625.4.patch (9.2 KB) - added by Dan 7 years ago.
urlpath-quote.patch (1.4 KB) - added by taos 6 years ago.
Added some new docstrings, removed str method in favor of toString(), not done unit tests yet.
2625.6.patch (6.9 KB) - added by Matt 4 years ago.

Download all attachments as: .zip

Change History (48)

comment:1 Changed 8 years ago by drake

Keywords: easy added

comment:2 Changed 8 years ago by Glyph

Owner: changed from Glyph to Itamar Turner-Trauring

comment:3 Changed 8 years ago by Glyph

Owner: changed from Itamar Turner-Trauring to itamarst

Changed 8 years ago by truekonrads

Attachment: 2625.patch added

comment:4 Changed 8 years ago by truekonrads

Cc: truekonrads added
Keywords: review added

I implemented the child() thing. Maybe we need to handle more stuff, like url in init? I am putting this up for review.

comment:5 Changed 8 years ago by Jean-Paul Calderone

Keywords: review removed
Owner: changed from itamarst to truekonrads

Thanks for the patch! Here are a few comments:

  1. URLPath.child should have a docstring. Can you add one? One thing it should mention, in particular, is that the path argument should not be quoted.
  2. Operators should have one space around them. For example, fixedPath=foo should be fixedPath = foo. Similarly, function arguments should be separated from each other by a ", ", not just a ",".
  3. You can make urllib.quote do all of the work for you by passing an empty string as the second argument. This way you won't have to use str.replace afterwards.
  4. test_childQuotes should also have a docstring.

Thanks again!

Changed 8 years ago by truekonrads

Attachment: 2625.2.patch added

Improved version that addresses exarkun's comments

comment:6 in reply to:  5 Changed 8 years ago by truekonrads

Keywords: review added

Replying to exarkun:

Done.

Additionally, I've replaced the default value of keepQuery=0 to keepQuery=False as it is a boolean condition anyway.

comment:7 Changed 8 years ago by truekonrads

Owner: truekonrads deleted

comment:8 Changed 8 years ago by Jean-Paul Calderone

Author: exarkun
Branch: branches/urlpath-quote-2625

(In [27252]) Branching to 'urlpath-quote-2625'

comment:9 Changed 8 years ago by Jean-Paul Calderone

(In [27253]) Apply 2625.2.patch

refs #2625

comment:10 Changed 8 years ago by Jean-Paul Calderone

(In [27254]) Clean up some docstring language; fix some minor coding standard violations

refs #2625

comment:11 Changed 8 years ago by Jean-Paul Calderone

Keywords: review removed
Owner: set to Jean-Paul Calderone
Status: newassigned

I made some minor changes to the new patch; mostly whitespace and docstring adjustments. Checking out [how it does on buildbot] now. Assuming it goes well, I'll merge.

comment:12 Changed 8 years ago by Jean-Paul Calderone

Author: exarkuntruekonrads
Cc: Jean-Paul Calderone added
Owner: changed from Jean-Paul Calderone to truekonrads
Status: assignednew

Ahem, how it does on buildbot

However, meh. I changed my mind. This is an incompatible change which will break any existing careful uses of URLPath.child. The existing method and behavior should be left alone, but deprecated. A new method should be introduced which has the desired semantics (which are now well implemented, I think, in the branch by the child method). Additionally, URLPath.sibling should probably receive similar treatment, and the rest of the methods checked for this problem.

In general, please refer to CompatibilityPolicy. This document isn't complete or finalized, but it's a good guideline nonetheless, and we should adhere to it as closely as possible.

comment:13 Changed 7 years ago by sah

Owner: changed from truekonrads to sah
Status: newassigned

comment:14 Changed 7 years ago by sah

Owner: changed from sah to truekonrads
Status: assignednew

Yikes, nevermind, reading comments I realized this involves too much policy for me right now.

Changed 7 years ago by Dan

Attachment: 2625.3.patch added

Changed 7 years ago by Dan

Attachment: 2625.4.patch added

comment:15 Changed 7 years ago by Dan

Cc: Dan added
Keywords: review added
Owner: truekonrads deleted

Building on the patches submitted by truekonrads, I made an attempt to address exarkun's compatibility concerns. Also included are some minor changes like test case names and boolean default arguments instead of ints.

In addition to URLPath.child, I provided quoting implementations of URLPath.sibling and URLPath.fromString. The names are the best I could come up with but I feel there's definite room for improvement there.

In unit testing the deprecated methods, I'm not sure if what I have is sufficient (counting the number of warnings seen) or if more detail is needed.

comment:16 Changed 7 years ago by truekonrads

I think that URLPath class with correct quoting should be called differently from old URLPath not added new methods. This comes to mind:

  • SafeURLPath
  • URLPath2
  • URLTrek
  • NewURLPath (which will be followed by ReallyNewThisTimeURLPath :) )

Usage of old URLPath should be phased out from Twisted and warn raised from init. P.S. I think this is funny how long it takes to close an easy ticket.

comment:17 Changed 7 years ago by Dan

Do you think this should be a new class in the same module or something like urlpath2.URLPath?

comment:18 in reply to:  17 Changed 7 years ago by truekonrads

Replying to danben:

Do you think this should be a new class in the same module or something like urlpath2.URLPath?

Yes, I think this is a nice solution - urlpath2.URLPath. I would suggest you also add DepreciationWarning to the urlpath module.

comment:19 Changed 7 years ago by Jean-Paul Calderone

Hang on a minute. Why is a new module a better idea than introducing new methods on the existing class?

comment:20 in reply to:  19 ; Changed 7 years ago by truekonrads

Replying to exarkun:

Hang on a minute. Why is a new module a better idea than introducing new methods on the existing class?

Because it is less confusing to the programmer - explorer who discovers a module, looks up its methods and applies them. He does not have to guess if he/she should use child() or safeChild() or newChild() or child(). urlpath2 vs urlpath immediately suggests to use the newer version.

comment:21 in reply to:  20 Changed 7 years ago by Dan

Replying to truekonrads:

Replying to exarkun:

Hang on a minute. Why is a new module a better idea than introducing new methods on the existing class?

Because it is less confusing to the programmer - explorer who discovers a module, looks up its methods and applies them. He does not have to guess if he/she should use child() or safeChild() or newChild() or child(). urlpath2 vs urlpath immediately suggests to use the newer version.

I agree with this, assuming that we want quoting to be the default behavior. One drawback that I can see is a surprise for users who upgrade from urlpath to urlpath2 without looking carefully at the documentation to see that the methods behave differently, but maybe this could be made explicit at the class-level documentation since URLPath is not a large class.

Anyway, I will defer to experience here. exarkun, let me know which approach you prefer and I'll make a patch based on that.

comment:22 Changed 7 years ago by Glyph

So, -1 on a new module. especially not "urlpath2".

I do think that making this a new class rather than a new collection of methods would be a good idea. I say this because it affects the behavior of the fromString, child, and sibling methods, and having a suffixed version of each method is somewhat verbose.

I suggest adding a new class, urlpath.URL, which has the right quoting behavior on all of its methods, and deprecating URLPath.

comment:23 Changed 7 years ago by truekonrads

Sounds like a solution. urlpath.URL + depreciationwarning has my vote.

comment:24 Changed 7 years ago by Tristan Seligmann

Cc: Tristan Seligmann added

comment:25 Changed 7 years ago by Jean-Paul Calderone

Keywords: review removed
Owner: set to truekonrads

So, to summarize:

I suggest that this start with just a model which can represent all of the information present in a URL (or IRI). Adding useful methods for manipulating the state can come afterwards. And eventually we'll need to have an API on Request for getting one of these new things, rather than an old URLPath. Then we can deprecate URLPath.

If people actually agree this is the way to go, then this ticket should be closed as wontfix and at least one new one opened describing the new thing that's going to be implemented.

comment:26 in reply to:  25 Changed 7 years ago by truekonrads

Resolution: wontfix
Status: newclosed

Replying to exarkun:

If people actually agree this is the way to go, then this ticket should be closed as wontfix and at least one new one opened describing the new thing that's going to be implemented.

Sounds like a decent plan. Will you do the honors then?

comment:27 Changed 7 years ago by Glyph

Resolution: wontfix
Status: closedreopened

Let's change the summary, not close as wontfix. The problem still exists: Twisted doesn't have a good abstraction for generating quoted URLs. Also, new tickets should be opened before old ones are closed, so we won't forget to do it.

In any event, I have another suggestion besides making a new class: __str__ is a bad way to get a canonicalized string format representation, we should really deprecate that. Why not just do the quoting at the point of stringification, rather than in the child() methods, and add a new, good method for stringification?

comment:28 Changed 6 years ago by Jean-Paul Calderone

Keywords: easy removed

Changed 6 years ago by taos

Attachment: urlpath-quote.patch added

Added some new docstrings, removed str method in favor of toString(), not done unit tests yet.

comment:29 Changed 6 years ago by Itamar Turner-Trauring

Unfortunately adding an alternative to __str__ isn't good enough, since e.g. sibling() takes what most users would expect to be an unquoted path, and then attaches it to quoted path elements.

comment:30 Changed 6 years ago by Jean-Paul Calderone

Let's change the summary, not close as wontfix.

I notice the summary was never changed. :)

comment:31 Changed 4 years ago by Itamar Turner-Trauring

I tried making new implementation. It's hard. Easier short term solution may be to just have two modes in URLPath, backwards compatible but deprecated mode, and correct quoting mode.

comment:32 Changed 4 years ago by Jean-Paul Calderone

What's hard?

comment:33 Changed 4 years ago by Itamar Turner-Trauring

Unicode (#5388). Bytes-only isn't hard (and implementation-wise may well be quite close to having two different modes on same class).

comment:34 in reply to:  33 Changed 4 years ago by Glyph

Replying to itamar:

Unicode (#5388). Bytes-only isn't hard (and implementation-wise may well be quite close to having two different modes on same class).

The API should just expose a new name so it's easy to know what "mode" - by which I mean "behavior" - your client code expects. The guts of the implementation isn't super interesting.

comment:35 Changed 4 years ago by Matt

Cc: Matt added

After chatting with itamar about this, we want to leave the unicode problem for #5388. So this ticket is *only* to properly quote things like '/' and ' ' (and other ascii characters that need quoting).

The implementation will involve accepting a flag passed to URLPath.init that will turn on quoting. Also, there will be a subclass of URLPath that has the flag turned on by default.

Changed 4 years ago by Matt

Attachment: 2625.6.patch added

comment:36 Changed 4 years ago by Matt

Keywords: review added

The patch I just added 2625.6.patch uses code from the urlpath-quote-2625 branch, but is based on trunk (instead of 3-year-old trunk)

comment:37 Changed 4 years ago by Itamar Turner-Trauring

Owner: truekonrads deleted
Status: reopenednew

comment:38 Changed 4 years ago by Jonathan Jacobs

Keywords: review removed
Owner: set to Matt

Thanks for updating the patch, magmatt!

  1. The first-person narrative docstring style is no longer preferred, please rephrase docstrings in this style to just talk about their behaviour without the narrative or first-person-ness.
  1. Would you add a nice helpful docstrings for URLPath.sibling, URLPath.parent and URLPath.here too, since their behaviour is changing in this patch too?
  1. It would be preferable if the docstrings in the tests could use the linking syntax, instead of the code literal syntax, when talking about methods. ie. C{child} becomes L{URLPath.child}.
  1. Please use assertEqual instead of assertEquals.
  1. (This is mostly just commentary.) It's unfortunate that passing an unquoted URL to URLPath.fromString is broken, regardless of whether you pass quote=True; I realise it is at least documented this way but it sure looks weird (maybe fromString is a misleading name compared to something like fromURIBytes or fromUnquotedString):
>>> str(URLPath.fromString('http://example.com/foo bar', quote=True))
'http://example.com/foo bar'
  1. Since there is now a docstring for URLPath.__init__, could you document the other parameters too?
  1. I don't think there is a need to redundantly document the default value of quote in URLPath.__init__.
  1. The docstring for URLPath itself should at least include a one-liner description of the class.
  1. Some docstrings and parameters are missing periods at the end of their sentences.
  1. Would you make sure there are two blank lines between methods where the code (or surrounding code) is being changed?

Sorry about all the nitpicking, I hope it doesn't deter you from getting this patch landed.

comment:39 in reply to:  38 Changed 4 years ago by Matt

Replying to jonathanj:

  1. The first-person narrative docstring style is no longer preferred, please rephrase docstrings in this style to just talk about their behaviour without the narrative or first-person-ness.

Have we always been at war with Eastasia? What is the preferred style (is there a wiki page or something I can read)?

Sorry about all the nitpicking, I hope it doesn't deter you from getting this patch landed.

Honestly, it does deter me. I feel like I'm playing Guess What the Reviewer Wants. I wish reviewers could easily make little fixes (and have them reviewed). Some reviews would be nicer as a patch than as English. But I suppose such a system would have downsides too.

I'll see if I can get to this (but it won't be today).

comment:40 in reply to:  38 Changed 4 years ago by Glyph

Since the issue of frustration came up, I figured I'd do a meta-review! :-)

My main point is: please separate suggestions and requirements into distinct sections so that the author has a clear idea of what they actually need to do to get it landed, versus the things they might want to consider, either for future tickets or at their own discretion before re-review.

Replying to jonathanj:

Thanks for updating the patch, magmatt!

  1. The first-person narrative docstring style is no longer preferred, please rephrase docstrings in this style to just talk about their behaviour without the narrative or first-person-ness.

This is useful as a suggestion, but despite the fact that this style isn't preferred, having documentation in this style is still better than not having documentation at all. Not mandatory.

  1. Would you add a nice helpful docstrings for URLPath.sibling, URLPath.parent and URLPath.here too, since their behaviour is changing in this patch too?

Changed behavior does require changed documentation (and changed tests) so this would be a requirement.

  1. It would be preferable if the docstrings in the tests could use the linking syntax, instead of the code literal syntax, when talking about methods. ie. C{child} becomes L{URLPath.child}.

Suggestion, but hopefully trivial enough that magmatt would want to do it :). (Also, for your information, L{child <URLPath.child>} will make the link appear as just the word "child" but still link to the right method.

  1. Please use assertEqual instead of assertEquals.

This should really come along with a citation. How did you know that this was correct?

  1. (This is mostly just commentary.) It's unfortunate that passing an unquoted URL to URLPath.fromString is broken, regardless of whether you pass quote=True; I realise it is at least documented this way but it sure looks weird (maybe fromString is a misleading name compared to something like fromURIBytes or fromUnquotedString):

"Mostly just commentary" is not an actionable statement :-). Should be in a separate "commentary" section.

  1. Since there is now a docstring for URLPath.__init__, could you document the other parameters too?

Doable as a separate ticket, since the parameters aren't new, right?

  1. I don't think there is a need to redundantly document the default value of quote in URLPath.__init__.

Definitely not a regression, so not necessary to deal with. Honestly, redundant documentation isn't such a bad thing. It's more work for the author but nice for the reader, since it gives them multiple angles to approach the documentation from.

  1. The docstring for URLPath itself should at least include a one-liner description of the class.

I am not so sure about this one.

  1. Some docstrings and parameters are missing periods at the end of their sentences.

Again, this is acceptable.

  1. Would you make sure there are two blank lines between methods where the code (or surrounding code) is being changed?

Despite being nitpicky, coding standard compliance is required (although if that's the only issue, the reviewer can make this change themselves).

(But this should also come with a reference citation...)

Sorry about all the nitpicking, I hope it doesn't deter you from getting this patch landed.

comment:41 Changed 4 years ago by Matt

In IRC exarkun pointed out that danben's patch (2625.4.patch) is probably a better direction (adding new methods that do quoting instead of adding a _quote attribute and attempting to change existing functions).

comment:42 Changed 6 months ago by Glyph

Resolution: fixed
Status: newclosed

This was fixed as a side-effect of #5388, in f8b827389fc346535afba5794fc72976f1235996.

Note: See TracTickets for help on using tickets.