Opened 7 years ago

Closed 7 years ago

#2628 defect closed fixed (fixed)

twisted.web.client._parse (and any other urlparse-using code) is succeptible to cache poisoning

Reported by: exarkun Owned by:
Priority: highest Milestone:
Component: web Keywords:
Cc: Branch:
Author: Launchpad Bug:

Description

Consider:

exarkun@charm:~$ python
Python 2.4.3 (#2, Oct  6 2006, 07:52:30)
[GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urlparse
>>> urlparse.urlparse('http://hello/world')
('http', 'hello', '/world', '', '', '')
>>> urlparse.urlparse(u'http://hello/world')
('http', 'hello', '/world', '', '', '')
>>>
exarkun@charm:~$ python
Python 2.4.3 (#2, Oct  6 2006, 07:52:30)
[GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urlparse
>>> urlparse.urlparse(u'http://hello/world')
(u'http', u'hello', u'/world', '', '', '')
>>> urlparse.urlparse('http://hello/world')
(u'http', u'hello', u'/world', '', '', '')
>>>

One ought never to pass unicode strings to urlparse, most likely, but given that if any code in your process does it, your otherwise correct code might suffer the consequences, it is probably worth taking additional precautions around urlparse usage.

Change History (9)

comment:1 Changed 7 years ago by exarkun

  • Keywords review added
  • Owner changed from exarkun to glyph
  • Priority changed from normal to highest

comment:2 Changed 7 years ago by exarkun

_parse fixed in unicode-urls-2628

comment:3 Changed 7 years ago by glyph

  • Keywords review removed
  • Owner changed from glyph to exarkun

Generally an improvement, but bad input to http.urlparse will still poison the urlparse.urlparse cache. Shouldn't we detect this poisoning on both ends to minimize its impact on code that uses urlparse directly, i.e. raise an exception in response to unicode input?

Also, a minor point, _parse previously consistently used the <module>.<function> style for accessing both urlparse and urlsplit. Now it mixes that with the style that directly imports the function. I'd prefer it if there were at least local consistency on this, although given the inconsistency throughout the codebase and explicit inconsistency of the coding standard I don't have a strong preference for either one.

comment:4 Changed 7 years ago by exarkun

  • Keywords review added
  • Owner changed from exarkun to glyph

Okay it raises a TypeError now.

I left the imports alone because changing them would involve duplicate imports or changing a whole lot of stuff in that module.

comment:5 Changed 7 years ago by glyph

  • Keywords review removed
  • Owner changed from glyph to exarkun

OK, looks fine, except the docstring of the new urlparse needs a little help. First, it needs a @raise now, describing the TypeError. Second, it should describe why it exists and who should use it. Something like "this provides a compatible wrapper around Python's stdlib urlparse, but works around bugs related to unicode. All code which uses Twisted should prefer this function." Edit to taste, then merge.

comment:6 Changed 7 years ago by exarkun

  • Keywords review added
  • Owner changed from exarkun to glyph

Tests required some adjustment to account for the TypeError, so giving this back for one more look.

comment:7 Changed 7 years ago by glyph

  • Keywords review removed
  • Owner changed from glyph to exarkun

Okay. This looks fine. There's a test which fails for me now, but it also fails for me in trunk and I'm certain this branch had nothing to do with it. Merge.

comment:8 Changed 7 years ago by exarkun

  • Resolution set to fixed
  • Status changed from new to closed

(In [20244]) Merge unicode-urls-2628

Author: exarkun
Reviewer: glyph
Fixes #2628

Reject unicode input to twisted.web.client._parse, causing twisted.web.client.getPage
to fail on bad input earlier. Also check the result of urlparse carefully to ensure
it is not unicode and fix it if it is.

comment:9 Changed 4 years ago by <automation>

  • Owner exarkun deleted
Note: See TracTickets for help on using tickets.