Opened 10 years ago

Closed 9 years ago

#3157 defect closed duplicate (duplicate)

twisted.web.client regurgitates invalid URLs received in Location header when following redirects

Reported by: weijie90
Owned by:
Priority: normal
Milestone:
Component: web
Keywords: null byte, web server, broken, getPage
Cc:
Branch:
Author:

Description

Twisted's twisted.web.client.getPage() has the same issue as http://bugs.python.org/issue2464.

I quote:

"Instrumenting the code and looking closer at the tcpdump, it's true. wikispaces.com is returning an invalid Location: header with a null byte in the middle of it.

The "fix" on our end should be to handle such garbage from such broken web servers more gracefully. Other clients seem to treat the null as an end of string or end of that header.

...

I'm not sure what the best solution for this is. If I truncate the header values at a \x00 character it ends in an infinite redirect loop (which urllib2 detects and raises on). If I simply remove all \x00 characters the resulting url is not accepted by wikispaces.com due to having an extra / in it.

Verdict: wikispaces.com is broken.

urllib2 could do better. wget and firefox deal with it properly. but i'll leave deciding which patch to use up to someone who cares about handling broken sites."
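
For illustration, the two strategies discussed in the quote (truncating at the first NUL byte versus removing every NUL byte) could be sketched roughly as below. This is not code from urllib2 or Twisted; the function names and the example Location value are hypothetical, and the actual header sent by wikispaces.com is not reproduced here.

```python
def truncate_at_nul(value: bytes) -> bytes:
    """Keep only the part of the header value before the first NUL byte."""
    return value.split(b"\x00", 1)[0]


def strip_nul(value: bytes) -> bytes:
    """Drop every NUL byte and keep everything else."""
    return value.replace(b"\x00", b"")


# Hypothetical Location value with a NUL byte in the middle.
location = b"http://example.com/a\x00/b"

print(truncate_at_nul(location))  # b'http://example.com/a'
print(strip_nul(location))        # b'http://example.com/a/b'
```

Neither variant is obviously right: as the quote notes, truncating can lead to a redirect loop and stripping can yield a URL the server rejects.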

Change History (5)

comment:1 Changed 10 years ago by Jean-Paul Calderone

Summary: Twisted doesn't guess well what to do when encountering broken web servers → twisted.web.client regurgitates invalid URLs received in Location header when following redirects

Old summary was:

Twisted doesn't guess well what to do when encountering broken web servers

comment:2 Changed 10 years ago by Jean-Paul Calderone

In the case described above, the NUL byte received in the Location header's value is sent back to the server verbatim. I'm not sure what would be better to do. Perhaps something can be learned from httplib2.

comment:3 Changed 10 years ago by Jean-Paul Calderone

For comparison, Firefox 2 truncates at the NUL byte.

comment:4 Changed 9 years ago by Jean-Paul Calderone

Resolution: duplicate
Status: new → closed

The problem doesn't have anything to do with NUL, actually. It is really about the redirect location being a relative URL with a relative path, something not allowed in the Location header. However, handling this might be a good idea anyway. That makes this ticket a duplicate of #3384.
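
As a rough sketch of what handling a relative Location could look like, a client can resolve the header value against the URL of the request that produced the redirect. The example below uses urljoin from the Python 3 standard library; it is not the change made for #3384, and resolve_location and the example URLs are hypothetical.

```python
from urllib.parse import urljoin


def resolve_location(request_url: str, location: str) -> str:
    """Resolve a possibly relative Location value against the request URL."""
    return urljoin(request_url, location)


base = "http://example.com/dir/page"

# Relative path: resolved against the directory of the request URL.
print(resolve_location(base, "other"))  # http://example.com/dir/other

# Absolute path: keeps the scheme and host, replaces the path.
print(resolve_location(base, "/root"))  # http://example.com/root

# Absolute URL: returned unchanged.
print(resolve_location(base, "http://other.example/x"))  # http://other.example/x
```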

comment:5 Changed 7 years ago by <automation>

Owner: jknight deleted