Opened 7 years ago

Last modified 12 months ago

#2625 defect new

urlpath doesn't do any quoting

Reported by: itamarst Owned by: magmatt
Priority: normal Milestone:
Component: core Keywords:
Cc: truekonrads, exarkun, danben@…, mithrandi@…, haggardii@… Branch: branches/urlpath-quote-2625
(diff, github, buildbot, log)
Author: truekonrads Launchpad Bug:

Description

e.g.

>>> str(urlpath.URLPath().child("hello/ there"))
'http://localhost/hello/ there'

ought to be

>>> str(urlpath.URLPath().child("hello/ there"))
'http://localhost/hello%2f%20there'

Attachments (6)

2625.patch (1.4 KB) - added by truekonrads 5 years ago.
2625.2.patch (2.6 KB) - added by truekonrads 5 years ago.
Improved version that addresses exarkun's comments
2625.3.patch (8.4 KB) - added by danben 4 years ago.
2625.4.patch (9.2 KB) - added by danben 4 years ago.
urlpath-quote.patch (1.4 KB) - added by taos 4 years ago.
Added some new docstrings, removed str method in favor of toString(), not done unit tests yet.
2625.6.patch (6.9 KB) - added by magmatt 14 months ago.

Download all attachments as: .zip

Change History (47)

comment:1 Changed 5 years ago by drake

  • Keywords easy added

comment:2 Changed 5 years ago by glyph

  • Owner changed from glyph to itamar

comment:3 Changed 5 years ago by glyph

  • Owner changed from itamar to itamarst

Changed 5 years ago by truekonrads

comment:4 Changed 5 years ago by truekonrads

  • Cc truekonrads added
  • Keywords review added

I implemented the child() thing. Maybe we need to handle more stuff, like url in init?
I am putting this up for review.

comment:5 follow-up: Changed 5 years ago by exarkun

  • Keywords review removed
  • Owner changed from itamarst to truekonrads

Thanks for the patch! Here are a few comments:

  1. URLPath.child should have a docstring. Can you add one? One thing it should mention, in particular, is that the path argument should not be quoted.
  2. Operators should have one space around them. For example, fixedPath=foo should be fixedPath = foo. Similarly, function arguments should be separated from each other by a ", ", not just a ",".
  3. You can make urllib.quote do all of the work for you by passing an empty string as the second argument. This way you won't have to use str.replace afterwards.
  4. test_childQuotes should also have a docstring.

Thanks again!

Changed 5 years ago by truekonrads

Improved version that addresses exarkun's comments

comment:6 in reply to: ↑ 5 Changed 5 years ago by truekonrads

  • Keywords review added

Replying to exarkun:

Done.

Additionally, I've replaced the default value of keepQuery=0 to keepQuery=False as it is a boolean condition anyway.

comment:7 Changed 5 years ago by truekonrads

  • Owner truekonrads deleted

comment:8 Changed 5 years ago by exarkun

  • Author set to exarkun
  • Branch set to branches/urlpath-quote-2625

(In [27252]) Branching to 'urlpath-quote-2625'

comment:9 Changed 5 years ago by exarkun

(In [27253]) Apply 2625.2.patch

refs #2625

comment:10 Changed 5 years ago by exarkun

(In [27254]) Clean up some docstring language; fix some minor coding standard violations

refs #2625

comment:11 Changed 5 years ago by exarkun

  • Keywords review removed
  • Owner set to exarkun
  • Status changed from new to assigned

I made some minor changes to the new patch; mostly whitespace and docstring adjustments. Checking out [how it does on buildbot] now. Assuming it goes well, I'll merge.

comment:12 Changed 5 years ago by exarkun

  • Author changed from exarkun to truekonrads
  • Cc exarkun added
  • Owner changed from exarkun to truekonrads
  • Status changed from assigned to new

Ahem, how it does on buildbot

However, meh. I changed my mind. This is an incompatible change which will break any existing careful uses of URLPath.child. The existing method and behavior should be left alone, but deprecated. A new method should be introduced which has the desired semantics (which are now well implemented, I think, in the branch by the child method). Additionally, URLPath.sibling should probably receive similar treatment, and the rest of the methods checked for this problem.

In general, please refer to CompatibilityPolicy. This document isn't complete or finalized, but it's a good guideline nonetheless, and we should adhere to it as closely as possible.

comment:13 Changed 5 years ago by sah

  • Owner changed from truekonrads to sah
  • Status changed from new to assigned

comment:14 Changed 5 years ago by sah

  • Owner changed from sah to truekonrads
  • Status changed from assigned to new

Yikes, nevermind, reading comments I realized this involves too much policy for me right now.

Changed 4 years ago by danben

Changed 4 years ago by danben

comment:15 Changed 4 years ago by danben

  • Cc danben@… added
  • Keywords review added
  • Owner truekonrads deleted

Building on the patches submitted by truekonrads, I made an attempt to address exarkun's compatibility concerns. Also included are some minor changes like test case names and boolean default arguments instead of ints.

In addition to URLPath.child, I provided quoting implementations of URLPath.sibling and URLPath.fromString. The names are the best I could come up with but I feel there's definite room for improvement there.

In unit testing the deprecated methods, I'm not sure if what I have is sufficient (counting the number of warnings seen) or if more detail is needed.

comment:16 Changed 4 years ago by truekonrads

I think that URLPath class with correct quoting should be called differently from old URLPath not added new methods. This comes to mind:

  • SafeURLPath
  • URLPath2
  • URLTrek
  • NewURLPath (which will be followed by ReallyNewThisTimeURLPath :) )

Usage of old URLPath should be phased out from Twisted and warn raised from init.
P.S. I think this is funny how long it takes to close an easy ticket.

comment:17 follow-up: Changed 4 years ago by danben

Do you think this should be a new class in the same module or something like urlpath2.URLPath?

comment:18 in reply to: ↑ 17 Changed 4 years ago by truekonrads

Replying to danben:

Do you think this should be a new class in the same module or something like urlpath2.URLPath?

Yes, I think this is a nice solution - urlpath2.URLPath. I would suggest you also add DepreciationWarning to the urlpath module.

comment:19 follow-up: Changed 4 years ago by exarkun

Hang on a minute. Why is a new module a better idea than introducing new methods on the existing class?

comment:20 in reply to: ↑ 19 ; follow-up: Changed 4 years ago by truekonrads

Replying to exarkun:

Hang on a minute. Why is a new module a better idea than introducing new methods on the existing class?

Because it is less confusing to the programmer - explorer who discovers a module, looks up its methods and applies them. He does not have to guess if he/she should use child() or safeChild() or newChild() or child(). urlpath2 vs urlpath immediately suggests to use the newer version.

comment:21 in reply to: ↑ 20 Changed 4 years ago by danben

Replying to truekonrads:

Replying to exarkun:

Hang on a minute. Why is a new module a better idea than introducing new methods on the existing class?

Because it is less confusing to the programmer - explorer who discovers a module, looks up its methods and applies them. He does not have to guess if he/she should use child() or safeChild() or newChild() or child(). urlpath2 vs urlpath immediately suggests to use the newer version.

I agree with this, assuming that we want quoting to be the default behavior. One drawback that I can see is a surprise for users who upgrade from urlpath to urlpath2 without looking carefully at the documentation to see that the methods behave differently, but maybe this could be made explicit at the class-level documentation since URLPath is not a large class.

Anyway, I will defer to experience here. exarkun, let me know which approach you prefer and I'll make a patch based on that.

comment:22 Changed 4 years ago by glyph

So, -1 on a new module. especially not "urlpath2".

I do think that making this a new class rather than a new collection of methods would be a good idea. I say this because it affects the behavior of the fromString, child, and sibling methods, and having a suffixed version of each method is somewhat verbose.

I suggest adding a new class, urlpath.URL, which has the right quoting behavior on all of its methods, and deprecating URLPath.

comment:23 Changed 4 years ago by truekonrads

Sounds like a solution. urlpath.URL + depreciationwarning has my vote.

comment:24 Changed 4 years ago by mithrandi

  • Cc mithrandi@… added

comment:25 follow-up: Changed 4 years ago by exarkun

  • Keywords review removed
  • Owner set to truekonrads

So, to summarize:

I suggest that this start with just a model which can represent all of the information present in a URL (or IRI). Adding useful methods for manipulating the state can come afterwards. And eventually we'll need to have an API on Request for getting one of these new things, rather than an old URLPath. Then we can deprecate URLPath.

If people actually agree this is the way to go, then this ticket should be closed as wontfix and at least one new one opened describing the new thing that's going to be implemented.

comment:26 in reply to: ↑ 25 Changed 4 years ago by truekonrads

  • Resolution set to wontfix
  • Status changed from new to closed

Replying to exarkun:

If people actually agree this is the way to go, then this ticket should be closed as wontfix and at least one new one opened describing the new thing that's going to be implemented.

Sounds like a decent plan. Will you do the honors then?

comment:27 Changed 4 years ago by glyph

  • Resolution wontfix deleted
  • Status changed from closed to reopened

Let's change the summary, not close as wontfix. The problem still exists: Twisted doesn't have a good abstraction for generating quoted URLs. Also, new tickets should be opened before old ones are closed, so we won't forget to do it.

In any event, I have another suggestion besides making a new class: __str__ is a bad way to get a canonicalized string format representation, we should really deprecate that. Why not just do the quoting at the point of stringification, rather than in the child() methods, and add a new, good method for stringification?

comment:28 Changed 4 years ago by exarkun

  • Keywords easy removed

Changed 4 years ago by taos

Added some new docstrings, removed str method in favor of toString(), not done unit tests yet.

comment:29 Changed 3 years ago by itamar

Unfortunately adding an alternative to __str__ isn't good enough, since e.g. sibling() takes what most users would expect to be an unquoted path, and then attaches it to quoted path elements.

comment:30 Changed 3 years ago by exarkun

Let's change the summary, not close as wontfix.

I notice the summary was never changed. :)

comment:31 Changed 20 months ago by itamar

I tried making new implementation. It's hard. Easier short term solution may be to just have two modes in URLPath, backwards compatible but deprecated mode, and correct quoting mode.

comment:32 Changed 20 months ago by exarkun

What's hard?

comment:33 follow-up: Changed 20 months ago by itamar

Unicode (#5388). Bytes-only isn't hard (and implementation-wise may well be quite close to having two different modes on same class).

comment:34 in reply to: ↑ 33 Changed 20 months ago by glyph

Replying to itamar:

Unicode (#5388). Bytes-only isn't hard (and implementation-wise may well be quite close to having two different modes on same class).

The API should just expose a new name so it's easy to know what "mode" - by which I mean "behavior" - your client code expects. The guts of the implementation isn't super interesting.

comment:35 Changed 14 months ago by magmatt

  • Cc haggardii@… added

After chatting with itamar about this, we want to leave the unicode problem for #5388. So this ticket is *only* to properly quote things like '/' and ' ' (and other ascii characters that need quoting).

The implementation will involve accepting a flag passed to URLPath.init that will turn on quoting. Also, there will be a subclass of URLPath that has the flag turned on by default.

Changed 14 months ago by magmatt

comment:36 Changed 14 months ago by magmatt

  • Keywords review added

The patch I just added 2625.6.patch uses code from the urlpath-quote-2625 branch, but is based on trunk (instead of 3-year-old trunk)

comment:37 Changed 14 months ago by itamar

  • Owner truekonrads deleted
  • Status changed from reopened to new

comment:38 follow-ups: Changed 13 months ago by jonathanj

  • Keywords review removed
  • Owner set to magmatt

Thanks for updating the patch, magmatt!

  1. The first-person narrative docstring style is no longer preferred, please rephrase docstrings in this style to just talk about their behaviour without the narrative or first-person-ness.
  1. Would you add a nice helpful docstrings for URLPath.sibling, URLPath.parent and URLPath.here too, since their behaviour is changing in this patch too?
  1. It would be preferable if the docstrings in the tests could use the linking syntax, instead of the code literal syntax, when talking about methods. ie. C{child} becomes L{URLPath.child}.
  1. Please use assertEqual instead of assertEquals.
  1. (This is mostly just commentary.) It's unfortunate that passing an unquoted URL to URLPath.fromString is broken, regardless of whether you pass quote=True; I realise it is at least documented this way but it sure looks weird (maybe fromString is a misleading name compared to something like fromURIBytes or fromUnquotedString):
>>> str(URLPath.fromString('http://example.com/foo bar', quote=True))
'http://example.com/foo bar'
  1. Since there is now a docstring for URLPath.__init__, could you document the other parameters too?
  1. I don't think there is a need to redundantly document the default value of quote in URLPath.__init__.
  1. The docstring for URLPath itself should at least include a one-liner description of the class.
  1. Some docstrings and parameters are missing periods at the end of their sentences.
  1. Would you make sure there are two blank lines between methods where the code (or surrounding code) is being changed?

Sorry about all the nitpicking, I hope it doesn't deter you from getting this patch landed.

comment:39 in reply to: ↑ 38 Changed 13 months ago by magmatt

Replying to jonathanj:

  1. The first-person narrative docstring style is no longer preferred, please rephrase docstrings in this style to just talk about their behaviour without the narrative or first-person-ness.

Have we always been at war with Eastasia? What is the preferred style (is there a wiki page or something I can read)?

Sorry about all the nitpicking, I hope it doesn't deter you from getting this patch landed.

Honestly, it does deter me. I feel like I'm playing Guess What the Reviewer Wants. I wish reviewers could easily make little fixes (and have them reviewed). Some reviews would be nicer as a patch than as English. But I suppose such a system would have downsides too.

I'll see if I can get to this (but it won't be today).

comment:40 in reply to: ↑ 38 Changed 13 months ago by glyph

Since the issue of frustration came up, I figured I'd do a meta-review! :-)

My main point is: please separate suggestions and requirements into distinct sections so that the author has a clear idea of what they actually need to do to get it landed, versus the things they might want to consider, either for future tickets or at their own discretion before re-review.

Replying to jonathanj:

Thanks for updating the patch, magmatt!

  1. The first-person narrative docstring style is no longer preferred, please rephrase docstrings in this style to just talk about their behaviour without the narrative or first-person-ness.

This is useful as a suggestion, but despite the fact that this style isn't preferred, having documentation in this style is still better than not having documentation at all. Not mandatory.

  1. Would you add a nice helpful docstrings for URLPath.sibling, URLPath.parent and URLPath.here too, since their behaviour is changing in this patch too?

Changed behavior does require changed documentation (and changed tests) so this would be a requirement.

  1. It would be preferable if the docstrings in the tests could use the linking syntax, instead of the code literal syntax, when talking about methods. ie. C{child} becomes L{URLPath.child}.

Suggestion, but hopefully trivial enough that magmatt would want to do it :). (Also, for your information, L{child <URLPath.child>} will make the link appear as just the word "child" but still link to the right method.

  1. Please use assertEqual instead of assertEquals.

This should really come along with a citation. How did you know that this was correct?

  1. (This is mostly just commentary.) It's unfortunate that passing an unquoted URL to URLPath.fromString is broken, regardless of whether you pass quote=True; I realise it is at least documented this way but it sure looks weird (maybe fromString is a misleading name compared to something like fromURIBytes or fromUnquotedString):

"Mostly just commentary" is not an actionable statement :-). Should be in a separate "commentary" section.

  1. Since there is now a docstring for URLPath.__init__, could you document the other parameters too?

Doable as a separate ticket, since the parameters aren't new, right?

  1. I don't think there is a need to redundantly document the default value of quote in URLPath.__init__.

Definitely not a regression, so not necessary to deal with. Honestly, redundant documentation isn't such a bad thing. It's more work for the author but nice for the reader, since it gives them multiple angles to approach the documentation from.

  1. The docstring for URLPath itself should at least include a one-liner description of the class.

I am not so sure about this one.

  1. Some docstrings and parameters are missing periods at the end of their sentences.

Again, this is acceptable.

  1. Would you make sure there are two blank lines between methods where the code (or surrounding code) is being changed?

Despite being nitpicky, coding standard compliance is required (although if that's the only issue, the reviewer can make this change themselves).

(But this should also come with a reference citation...)

Sorry about all the nitpicking, I hope it doesn't deter you from getting this patch landed.

comment:41 Changed 12 months ago by magmatt

In IRC exarkun pointed out that danben's patch (2625.4.patch) is probably a better direction (adding new methods that do quoting instead of adding a _quote attribute and attempting to change existing functions).

Note: See TracTickets for help on using tickets.