[Twisted-Python] web.client blowing up on non-fully qualified 301s
Keith Dutton
keith at Shopzilla.com
Thu May 25 18:48:28 EDT 2006
Hello,
I have run into an odd problem, and I am not sure whether the issue is mine or Twisted's; any help would be appreciated. Under at least some circumstances, twisted.web.client seems to 1) be unable to follow a 301, and 2) throw an unhandled exception when trying to follow it. A specific example and the resulting error are given below:
from twisted.internet import defer
from twisted.web import client
from twisted.internet import reactor

class HTTPGetter(client.HTTPClientFactory):
    protocol = client.HTTPPageGetter

class Fetcher:
    def __init__(self, client_factory=HTTPGetter):
        self.factory = client_factory

    def download(self, host, port, url):
        f = self.factory(url)
        f.deferred.addCallback(self.downloadFinished).addErrback(self.downloadFailed)
        k = reactor.connectTCP(host, port, f, timeout=10)
        return f.deferred

    def downloadFinished(self, v):
        print "good"

    def downloadFailed(self, v):
        print "bad"
        print v

r = Fetcher()
w = r.download("www.shopzilla.com", 80, "/aaaa")
reactor.callLater(10, reactor.stop)
reactor.run()
This results in:
Unhandled error in Deferred:
Traceback (most recent call last):
  File "/usr/local/lib/python2.4/site-packages/twisted/internet/posixbase.py", line 226, in mainLoop
    self.runUntilCurrent()
  File "/usr/local/lib/python2.4/site-packages/twisted/internet/base.py", line 541, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "/usr/local/lib/python2.4/site-packages/twisted/internet/tcp.py", line 494, in resolveAddress
    d.addCallbacks(self._setRealAddress, self.failIfNotConnected)
  File "/usr/local/lib/python2.4/site-packages/twisted/internet/defer.py", line 182, in addCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/usr/local/lib/python2.4/site-packages/twisted/internet/defer.py", line 307, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
  File "/usr/local/lib/python2.4/site-packages/twisted/internet/tcp.py", line 498, in _setRealAddress
    self.doConnect()
  File "/usr/local/lib/python2.4/site-packages/twisted/internet/tcp.py", line 520, in doConnect
    connectResult = self.socket.connect_ex(self.realAddress)
  File "<string>", line 1, in connect_ex
exceptions.TypeError: an integer is required
This is apparently because doConnect assumes a valid address and so does not trap the TypeError.
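The TypeError can be reproduced outside Twisted, since it comes from the socket layer rejecting a port that is not an integer. The following is only a minimal sketch: connect_ex_raises_type_error is a hypothetical helper written for illustration, and the exact wording of the exception message differs between Python versions.

```python
import socket

def connect_ex_raises_type_error(address):
    """Return True if connect_ex rejects the address with a TypeError.

    Hypothetical helper for illustration; not part of Twisted.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        try:
            # The address tuple is validated before any network activity,
            # so a port of None fails immediately.
            s.connect_ex(address)
        except TypeError:
            return True
        return False
    finally:
        s.close()

# The same (host, port) pair that ends up in doConnect in the traceback.
print(connect_ex_raises_type_error(('', None)))
```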
The bad address that doConnect blows up on, ('', None) for (host, port), slips in via twisted.web.client's handleStatus_301. The example site (Shopzilla.com) sends a 301 whose Location URL is not fully qualified. Given such a URL, handleStatus_301 appears to fail because it relies on getting the host and port from the Location URL, and they are not present in it. It therefore passes ('', None) to reactor.connectTCP when it tries to follow the redirect, leading to the error above. My kludge fix to handleStatus_301 is given below: if the host or port is missing, I steal it from the transport, which should be correct since that transport was just used to get the page. I am running with this now, with no errors.
Is this a Twisted issue? If so, is my fix reasonable? If it is not a Twisted issue, what am I doing wrong?
Thanks,
Keith
def handleStatus_301(self):
    l = self.headers.get('location')
    if not l:
        self.handleStatusDefault()
    url = l[0]
    if self.followRedirect:
        scheme, host, port, path = \
            _parse(url, defaultPort=self.transport.getPeer().port)
        self.factory.setURL(url)
        # following 4 lines added kad to fix apparent issue with 301 to a url that is not fully qualified
        if self.factory.host == '':
            self.factory.host = self.transport.addr[0]
        if self.factory.port is None:
            self.factory.port = self.transport.addr[1]
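
For comparison with the transport-based kludge, a Location that is not fully qualified could also be resolved against the URL that was originally requested. The sketch below uses only the standard library and is not how twisted.web.client does it: resolve_redirect is a hypothetical helper, and the default ports (80 for http, 443 for https) are assumptions made for the example.

```python
try:
    from urllib.parse import urljoin, urlsplit  # Python 3
except ImportError:
    from urlparse import urljoin, urlsplit      # Python 2

def resolve_redirect(request_url, location):
    """Return (absolute_url, host, port) for a possibly relative Location.

    Hypothetical helper for illustration; not part of Twisted.
    """
    # A relative Location such as "/bbbb" has no host or port of its own.
    absolute = urljoin(request_url, location)
    parts = urlsplit(absolute)
    port = parts.port
    if port is None:
        # Assumed defaults when the URL does not name a port.
        port = 443 if parts.scheme == "https" else 80
    return absolute, parts.hostname, port

# A relative Location inherits the host and port of the original request.
print(resolve_redirect("http://www.shopzilla.com/aaaa", "/bbbb"))
```

With this approach a fully qualified Location passes through unchanged, so the same code path handles both kinds of redirect.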