Opened 4 years ago

Closed 4 years ago

#8068 release blocker: regression closed fixed (fixed)

twisted.python.urlpath.URLPath doesn't respect the default path of '/'

Reported by: hawkowl Owned by: hawkowl
Priority: highest Milestone: Python-3.x
Component: core Keywords:
Cc: Branch: branches/urlpath-str-8068-3
branch-diff, diff-cov, branch-cov, buildbot
Author: hawkowl

Description (last modified by hawkowl)

In 15.4:

>>> URLPath.fromString("http://google.com")
URLPath(scheme='http', netloc='google.com', path='/', query='', fragment='')
>>> str(URLPath.fromString("http://google.com"))
'http://google.com/'

...in trunk...

>>> URLPath.fromString("http://google.com")
URLPath(scheme='http', netloc='google.com', path='', query='', fragment='')
>>> str(URLPath.fromString("http://google.com"))
'http://google.com'

Change History (15)

comment:1 Changed 4 years ago by hawkowl

Author: hawkowl
Branch: branches/urlpath-str-8068

(In [45993]) Branching to urlpath-str-8068.

comment:2 Changed 4 years ago by hawkowl

Branch: branches/urlpath-str-8068branches/urlpath-str-8068-2

(In [45999]) Branching to urlpath-str-8068-2.

comment:3 Changed 4 years ago by hawkowl

Keywords: review added

Builders spun, please review.

bee tee dubs, this makes the Treq testsuite pass on Python 3.

comment:4 Changed 4 years ago by Glyph

Keywords: review removed
Owner: set to hawkowl

From https://github.com/twisted/twisted/compare/trunk...urlpath-str-8068-2 -

"0 files changed"

probably need to fix that.

comment:5 Changed 4 years ago by posita

I'm not sure where to present this comment. My apologies in advance is there is a more appropriate place.

The URLPath.fromBytes implementation does not play well with future's `newbytes` type:

Python 2.7.10 (default, Sep 24 2015, 10:13:45)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from __future__ import unicode_literals
>>> from twisted.python.urlpath import URLPath
>>> from future.types import newbytes, newstr
>>> u1 = URLPath.fromString(u'http://foo.com/')
>>> type(u1.scheme)
<type 'str'>
>>> u2 = URLPath.fromString(newstr(u'http://foo.com/'))
>>> type(u2.scheme)
<type 'str'>
>>> u3 = URLPath.fromBytes(b'http://foo.com/')
>>> type(u3.scheme)
<type 'str'>
>>> u4 = URLPath.fromBytes(newbytes(b'http://foo.com/'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../lib/python2.7/site-packages/twisted/python/urlpath.py", line 131, in fromBytes
    return klass(*parts)
  File "/.../lib/python2.7/site-packages/twisted/python/urlpath.py", line 50, in __init__
    self._url = _URL.fromText(urltext.encode("ascii").decode("ascii"))
  File "/.../lib/python2.7/site-packages/future/types/newbytes.py", line 366, in __getattribute__
    raise AttributeError("encode method has been disabled in newbytes")
AttributeError: encode method has been disabled in newbytes

This is primarily because of this trick in the `URLPath` constructor:

        # ...
        urltext = urlquote(
            urlparse.urlunsplit((self.scheme, self.netloc,
                                 self.path, self.query, self.fragment)),
            safe=_allascii
        )
        self._url = _URL.fromText(urltext.encode("ascii").decode("ascii"))
        # ...

This assumes (takes advantage of the fact) that urltext (i.e., the returned value from urlquote(urlunplit(...))) is normally a "native" string type (i.e., a byte string in Python 2, but a Unicode string in Python 3). The Python 2 native string type (perversely) has both an encode method and a decode method. This is not the case with either Python 3 types (Unicode strings have an encode method, byte strings have a decode method), nor is it the case with future's newbytes:

>>> from twisted.python.compat import urllib_parse as urlparse, urlquote
>>> from twisted.python.urlpath import _allascii
>>> u5 = urlparse.urlunsplit(( newbytes(b'http'), newbytes(b'foo.com'), newbytes(b'/'), newbytes(b''), newbytes(b'') ))
>>> u5 ; type(u5)
b'http://foo.com/'
<class 'future.types.newbytes.newbytes'>
>>> u6 = urlquote(u5, safe=_allascii)
>>> u6 ; type(u6)
b'http://foo.com/'
<class 'future.types.newbytes.newbytes'>
>>> u6.encode(u'ascii').decode(u'ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../lib/python2.7/site-packages/future/types/newbytes.py", line 366, in __getattribute__
    raise AttributeError("encode method has been disabled in newbytes")
AttributeError: encode method has been disabled in newbytes

The only reason I bring this up is future's apparent popularity in creating cross-compatible code (which is why I use it). I understand if supporting such a thing is beyond the scope of Twisted.

However, <str>.encode('ascii').decode('ascii') is a hack (and an artifact of a) Python 2's broken notion that you can encode a byte string or decode a unicode string; and b) trying to write one set of code that works for two distinct type of string types and two separate standard libraries). This can get you into trouble if you're not careful (see, e.g., #8044).

This can be worked around of course, but the work-around is not trivial (since newbytes works some ABC magic to appear as bytes in many cases):

>>> isinstance(b'', newbytes)
True
>>> isinstance(newbytes(b''), bytes)
True

One could do this:

class FutureCompatibleURLPath(URLPath):
    @classmethod
    def fromBytes(cls, url):
        try:
            native_url = url.__native__()
        except AttributeError:
            native_url = url
        return super(FutureCompatibleURLPath, cls).fromBytes(native_url)

Or simply:

class FutureCompatibleURLPath(URLPath):
    @classmethod
    def fromBytes(cls, url_bytes):
        return cls.fromString(url_bytes.decode('ascii'))

comment:6 Changed 4 years ago by posita

Actually, that does prompt me to ask, what about this as an alternate implementation?

class URLPath(object):
    # ...
    @classmethod
    def fromBytes(klass, url): # adopting existing naming conventions
        """
        ...
        """
        if not isinstance(url, bytes):
            raise ValueError("'url' must be bytes")
        return klass.fromString(url_bytes.decode('ascii'))
    # ...

It's possible with the above implementation that one would get a stack where fromString called fromBytes, which then called fromString again, but it should end there. Of course, one would have to be a little careful to avoid infinite recursion when overriding one or both of fromBytes and fromString.

comment:7 Changed 4 years ago by hawkowl

Keywords: review added
Owner: hawkowl deleted

Okay, it's actually had builders spun now, putting up for review.

RE: posita, we document that it takes a bytes, not python future's newbytes type (which is an entirely different type that is incompatible). We use the "encoding a bytestring" trick in a few places to ensure that it only contains encodable things -- where otherwise this check would go undone.

comment:8 Changed 4 years ago by posita

I get that. My point is that bytes are not encode-able. The fact that one can technically call encode on a byte string in Python 2 doesn't mean Twisted should. ;o)

Any thoughts on my suggestion in comment:6?

comment:9 Changed 4 years ago by Adi Roiban

Keywords: review removed
Owner: set to hawkowl

Changes looks good, but I think the conversion needs a in-code comment.

I find it confusing to have bytes... and then process them with urlparse without specifying an encoding ... and get encoded str out of that operation


The drive-by tests test_partialArguments might be ok... but I find it confusing. To me it looks like 2 tests test_withoutArguments and test_partialArguments

please see my comment and merge.

Thanks!

comment:10 Changed 4 years ago by hawkowl

Branch: branches/urlpath-str-8068-2branches/urlpath-str-8068-3

comment:11 Changed 4 years ago by hawkowl

Keywords: review added
Owner: hawkowl deleted

trunk moved in the meantime and the fix had to be rewritten. I wrote the tests, glyph provided the majority of the fix. Tests are looking ok. Please review.

comment:12 Changed 4 years ago by Adi Roiban

Is the ticket description and title still valid?

What was the problem and what is the expected result?

Thanks!

comment:13 Changed 4 years ago by hawkowl

Description: modified (diff)
Summary: twisted.python.urlpath has str attributes on Python 3, not bytestwisted.python.urlpath.URLPath doesn't respect the default path of '/'

comment:14 Changed 4 years ago by Adi Roiban

Keywords: review removed
Owner: set to hawkowl

OK. All good

I tried on this branch on both py2 and py3 and it looks good.

>>> from twisted.python.urlpath import URLPath
>>> str(URLPath.fromString("http://google.com"))
'http://google.com/'
>>> 

please merge.

Thanks!

comment:15 Changed 4 years ago by hawkowl

Resolution: fixed
Status: newclosed

(In [46176]) Merge urlpath-str-8068-3: Make twisted.python.urlpath.URLPath respect the default path of '/'

Author: hawkowl Reviewers: adiroiban Fixes: #8068

Note: See TracTickets for help on using tickets.