Opened 10 years ago

Closed 9 years ago

#2841 defect closed duplicate (duplicate)

[PATCH] Add a redirect limit to HTTPPageGetter in twisted/web/client.py

Reported by: alexis Owned by:
Priority: normal Milestone:
Component: web Keywords: redirect, http, client
Cc: Jean-Paul Calderone Branch:
Author:

Description

The HTTPPageGetter, when followRedirect is enabled, does not limit the number of redirections. It can stay freezed looping over and over again on a page that redirects to itself. Here is a little patch which lets you fix a limit for the number of redirections. If you used getPage(url, None, timeout=5) before, just use getPage(url, None, timeout=5, redirectLimit=10) now to limit the number of redirections to 10.

This patch also tries to handle bad server responses in lineRecieved() (Ticket #1010)

Attachments (1)

client.py.diff (2.3 KB) - added by alexis 10 years ago.

Download all attachments as: .zip

Change History (11)

comment:1 Changed 10 years ago by alexis

Sorry, there is a mistake in the last sentence: this patch tries to handle bad server responses in _parse(), not in lineRecieved() which is in web/http.py Sorry...

comment:2 Changed 10 years ago by alexis

Owner: changed from jknight to alexis
Status: newassigned

comment:3 Changed 10 years ago by alexis

Owner: alexis deleted
Status: assignednew

comment:4 Changed 10 years ago by Jean-Paul Calderone

Cc: Jean-Paul Calderone added

Can you also provide unit tests for this functionality and for the "bad server responses" handling?

comment:5 Changed 10 years ago by alexis

Ok, there is a simple example that would NOT work without the patch: it will be redirected until you decide to kill it: getPage("http://search.startium.com/star.srtium/?ver=4986") You can launch Wireshark to see it.

Now, if you patch client.py and if you add "print self.factory.redirectCount, url" just under the line 90 (self.factory.redirectCount+=1), you will see that the script will stop after X redirections with getPage("http://search.startium.com/star.srtium/?ver=4986", redirectLimit=X).

Here is the full script:


from twisted.web.client import getPage
from twisted.internet import reactor, defer

if name == "main":

def gotResult(a):

print a
reactor.stop()

def start():

page=getPage("http://search.startium.com/star.srtium/?ver=4986", redirectLimit=10)
page.addCallback(gotResult)
page.addErrback(gotResult)
print "let's go"

reactor.callWhenRunning(start)
reactor.run()


And for "bad server responses" I have no more example in mind, but sometimes the ":" is at the end of the host string, so the port is int("") which can not work.

I'll post unit tests tomorrow as I'm leaving now.

comment:6 Changed 10 years ago by alexis

Ok, the patch passes the unit tests. For testing the _parse() method, I would add this test in testParse(self) in the file test_webclient.py:

scheme, host, port, path = client._parse("http://bar:")
self.assertEquals(port, 80)
self.assertEquals(host, "bar")
self.assertEquals(path, "/")

Some "bad servers" redirect us to urls like that (with a ":" as last character)...

Changed 10 years ago by alexis

Attachment: client.py.diff added

comment:7 Changed 10 years ago by alexis

Sorry, but I just realized that this patch may not work as expected with redirections if you're getting many pages at the same time. I'll try to fix it but I think it is not usable as it is. Sorry for submitting too fast...

comment:8 Changed 10 years ago by alexis

Don't consider the last message, my tests were wrong. The patch works well.

comment:9 Changed 9 years ago by Jean-Paul Calderone

Resolution: duplicate
Status: newclosed

This was a duplicate of #2412

comment:10 Changed 7 years ago by <automation>

Note: See TracTickets for help on using tickets.