Ticket #3384 (new enhancement)

Opened 2 years ago

Last modified 5 months ago

HTTPClientFactory should follow relative redirects

Reported by: adevore Owned by: adevore
Priority: normal Milestone:
Component: web Keywords: httpclient
Cc: exarkun Branch: branches/follow-relative-redirects-3384
Author: adevore Launchpad Bug:

Description

Some web applications respond with a faulty Location header that gives only a relative path (spam.html) instead of the full URL ( http://www.example.com/eggs/spam.html) that is required by HTTP. Twisted's HTTPClientFactory should follow these redirects.

Attachments

testcase.diff Download (2.5 KB) - added by adevore 2 years ago.
mypatch.diff Download (3.4 KB) - added by adevore 2 years ago.
Patch to web/client.py and a unit test case
mypatch2.diff Download (3.6 KB) - added by adevore 2 years ago.
Patch to web/client.py and a working unit test case
relative-redirect.diff Download (1.9 KB) - added by adevore 17 months ago.
relative-redirect-2.diff Download (2.2 KB) - added by adevore 17 months ago.

Change History

Changed 2 years ago by exarkun

  • cc exarkun added
  • owner changed from jknight to adevore

Thanks for the patch. Can you include a unit test for this change as well?

Changed 2 years ago by adevore

Changed 2 years ago by adevore

  • keywords review added

The current test case may need to be modified to apply to more real world situations.

Changed 2 years ago by adevore

Patch to web/client.py and a unit test case

Changed 2 years ago by exarkun

  • keywords review removed

There's one significant problem with the mypatch.diff, there are four errors in the test suite when it is applied:

===============================================================================
[ERROR]: twisted.web.test.test_webclient.WebClientSSLTestCase.test_infiniteRedirection

Traceback (most recent call last):
Failure: twisted.web.error.Error: 404 Not Found
===============================================================================
[ERROR]: twisted.web.test.test_webclient.WebClientSSLTestCase.test_infiniteRedirection

Traceback (most recent call last):
Failure: twisted.web.error.Error: 500 Internal Server Error
===============================================================================
[ERROR]: twisted.web.test.test_webclient.WebClientTestCase.testServerError

Traceback (most recent call last):
Failure: twisted.web.error.Error: 404 Not Found
===============================================================================
[ERROR]: twisted.web.test.test_webclient.WebClientTestCase.testServerError

Traceback (most recent call last):
Failure: twisted.web.error.Error: 500 Internal Server Error
-------------------------------------------------------------------------------

A few other lesser problems:

  • BrokenRedirect should have at least a class docstring (including documentation for its instance attribute path)
  • testRelativeRedirect should be named test_relativeRedirect
  • testRelativeRedirect needs a docstring
  • testRelativeRedirect returns a DeferredList. DeferredList always fires successfully, regardless of what the Deferreds it wraps do. This means failures in the test will be reported as errors as a consequence of garbage collection instead of due to a failing returned Deferred. You probably want to use gatherResults here instead, or split the test into two methods.
  • Instead of defining _cbRelativeRedirectNoStartingSlash and the other callback method, you can just add self.assertEqual (preferred over the spelling of self.assertEquals) as a callback to the Deferred you want to check the result of, with "success" as an extra callback argument. This is a bit easier to read since it's shorter and the code is nearer the rest of the Deferred setup.

Thanks

Changed 2 years ago by adevore

  • keywords review added

Changed 2 years ago by adevore

Patch to web/client.py and a working unit test case

Changed 2 years ago by therve

  • owner adevore deleted

Changed 2 years ago by exarkun

  • keywords review removed
  • owner set to adevore

Thanks.

A couple more comments:

  • BrokenRedirect causes request.finish to be called twice. It shouldn't call request.finish() itself, it can just return an empty string.
  • The check if basePath[-1] != '': isn't tested. There should be another new test method like the two you already added, but for the case where the resource issuing the redirect is at an URL which ends with /. This should exercise that check.
  • I think the new code that you added for changing parsed[2] is copying a bug which the existing code that changed that already had. What happens if the resource issuing the redirect is at an URL like http://foo/bar?baz=quux? Since this is a pre-existing and somewhat unrelated problem, it'd probably be best to open a new ticket for this problem (if it's real) and fix it separately from this change. However, it might be nice to avoid duplicating the bug in this new code. You could factor two assignments to parsed[2] into a single assignment that comes after the if/else suites.

Changed 2 years ago by exarkun

#3157 was a (very confused) duplicate of this.

Changed 21 months ago by exarkun

  • keywords httpclient added

#3378 was a duplicate of this.

Changed 20 months ago by exarkun

#3614 was a duplicate of this

Changed 17 months ago by adevore

Changed 17 months ago by adevore

  • keywords httpclient,review added; httpclient removed

I found that the difference is with redirects that are of the form:

original: /subdir/ location: foo.html

instead of the form

/subdir/foo.html

Changed 17 months ago by adevore

  • owner adevore deleted

Changed 17 months ago by exarkun

  • owner set to adevore
  • keywords httpclient added; httpclient,review removed

Thanks for investigating this and contributing this patch. :)

A couple small comments on the change:

  1. The new test method should have a docstring describing the scenario it is testing and the behavior which it is defining as correct.
  2. I think handleStatus_301 might be a better place for the new logic:
    1. If getPage is called with a relative URI with a relative path, setURL will also be called and the new logic invoked. Apparently this will have no effect (I was surprised to just find out that urljoin(None, "foo") == "foo" :/) but I'm not sure whether this is intentional behavior on the part of urljoin and whether we can rely on it continuing to behave this way.
    2. Either way, this isn't really the intended effect of the change - it's only intended to handle the redirect case, so putting it into the redirect handling method directly makes more sense to me.

Changed 17 months ago by adevore

Changed 17 months ago by adevore

  • keywords httpclient,review added; httpclient removed
  • owner adevore deleted

Changed 17 months ago by exarkun

  • branch set to branches/follow-relative-redirects-3384
  • branch_author set to exarkun

(In [26753]) Branching to 'follow-relative-redirects-3384'

Changed 17 months ago by exarkun

(In [26754]) Apply relative-redirect-2.diff

refs #3384

Changed 17 months ago by exarkun

  • owner set to adevore
  • keywords httpclient added; httpclient,review removed
  • branch_author changed from exarkun to adevore

Thanks. Just one more question. In the new patch, you added a check for not host protecting the join logic. I can't think of any way for not host to be true but path.startswith('/') to be false. If there's such a case, can you add a test for it?

I applied the patch to the branch mentioned above. If you can provide any subsequent patches against that branch, that'd be great.

Changed 16 months ago by adevore

path.startswith('/') is false when the server returns something like "Location: foo.html". In the case of "Location /subdir/foo.html" some logic in getURL already handled the translation.

Changed 16 months ago by exarkun

Right, but what about the host part?

Changed 5 months ago by alexis

If host is not "", it means that we have a full URL in the location header (like  http://foo.com/bar), so we don't need to modify the url variable.

I've made a similar ticket: #4380 I think the problem is fixed in the 3384 branch.

Note: See TracTickets for help on using tickets.