[Twisted-Python] web.client blowing up on non-fully qualified 301s

Keith Dutton keith at Shopzilla.com
Thu May 25 18:48:28 EDT 2006


Hello, 

I have run into an odd problem, and I am not sure whether the fault is mine or Twisted's; any help would be appreciated.  In at least some circumstances, twisted.web.client seems to (1) fail to follow a 301 redirect and (2) raise an unhandled exception while trying to follow it.  A specific example and the resulting error are given below:


from twisted.internet import reactor
from twisted.web import client

class HTTPGetter(client.HTTPClientFactory):
    protocol = client.HTTPPageGetter

class Fetcher:

    def __init__(self, client_factory=HTTPGetter):
        self.factory = client_factory

    def download(self, host, port, url):
        # Connect to host:port directly and request the given path.
        f = self.factory(url)
        f.deferred.addCallback(self.downloadFinished).addErrback(self.downloadFailed)
        reactor.connectTCP(host, port, f, timeout=10)
        return f.deferred

    def downloadFinished(self, page):
        print "good"

    def downloadFailed(self, failure):
        print "bad"
        print failure

r = Fetcher()
w = r.download("www.shopzilla.com", 80, "/aaaa")
# Give the request (and any redirect) 10 seconds, then stop the reactor.
reactor.callLater(10, reactor.stop)
reactor.run()

This results in:

Unhandled error in Deferred:
Traceback (most recent call last):
  File "/usr/local/lib/python2.4/site-packages/twisted/internet/posixbase.py", line 226, in mainLoop
    self.runUntilCurrent()
  File "/usr/local/lib/python2.4/site-packages/twisted/internet/base.py", line 541, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "/usr/local/lib/python2.4/site-packages/twisted/internet/tcp.py", line 494, in resolveAddress
    d.addCallbacks(self._setRealAddress, self.failIfNotConnected)
  File "/usr/local/lib/python2.4/site-packages/twisted/internet/defer.py", line 182, in addCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/usr/local/lib/python2.4/site-packages/twisted/internet/defer.py", line 307, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
  File "/usr/local/lib/python2.4/site-packages/twisted/internet/tcp.py", line 498, in _setRealAddress
    self.doConnect()
  File "/usr/local/lib/python2.4/site-packages/twisted/internet/tcp.py", line 520, in doConnect
    connectResult = self.socket.connect_ex(self.realAddress)
  File "<string>", line 1, in connect_ex
    
exceptions.TypeError: an integer is required

This appears to happen because doConnect assumes it has been given a valid address, so nothing catches the TypeError.
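The TypeError at the bottom of the traceback can be reproduced without Twisted by handing the same ('', None) address straight to the standard socket module. The address is rejected while its arguments are parsed, before any connection is attempted, so no network access is needed. This is a minimal sketch in Python 3 (connect_with_bad_address is a hypothetical helper); the message text differs from Python 2.4's "an integer is required", but the exception type is the same.

```python
import socket


def connect_with_bad_address():
    """Pass the ('', None) address from the failed redirect to connect_ex.

    Returns the TypeError raised while parsing the address, or None if
    no TypeError was raised.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # port=None is not an integer, so address parsing fails here
        s.connect_ex(("", None))
    except TypeError as exc:
        return exc
    finally:
        s.close()
    return None


if __name__ == "__main__":
    print(repr(connect_with_bad_address()))
```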

The bad (host, port) address that doConnect fails on, ('', None), comes from twisted.web.client's handleStatus_301.  The example site (shopzilla.com) sends a 301 whose Location header is not a fully qualified URL.  handleStatus_301 relies on getting the host and port from the Location URL, but they are not present in it, so it passes ('', None) to reactor.connectTCP when trying to follow the redirect, which leads to the error above.

My kludge fix to handleStatus_301 is given below: if the host or port is missing, I take it from the transport, which should be correct since that connection was just used to fetch the page.  I am running with this now, with no errors.
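Rather than borrowing the host and port from the transport, a client can resolve a relative Location value against the URL it originally requested, the way browsers do. (RFC 2616 section 14.30 requires Location to be an absolute URI, so the server is technically out of spec, but relative values are common in practice.) The following is a minimal sketch using Python 3's urllib.parse, assuming the original request URL is known in absolute form; resolve_redirect is a hypothetical helper, not part of twisted.web.client.

```python
from urllib.parse import urljoin, urlsplit


def resolve_redirect(request_url, location):
    """Resolve a possibly relative Location value against the requested URL.

    Returns (scheme, host, port, path) for the follow-up request.
    """
    absolute = urljoin(request_url, location)
    parts = urlsplit(absolute)
    # Fall back to the scheme's default port when none is given.
    default_port = 443 if parts.scheme == "https" else 80
    port = parts.port or default_port
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.scheme, parts.hostname, port, path


if __name__ == "__main__":
    # "/aaaa/" is an illustrative Location value, not the one the site sent.
    print(resolve_redirect("http://www.shopzilla.com/aaaa", "/aaaa/"))
```

For example, resolve_redirect("http://www.shopzilla.com/aaaa", "/aaaa/") returns ("http", "www.shopzilla.com", 80, "/aaaa/"). Unlike the transport-based fallback, this approach also covers paths relative to the current directory and redirects to a different host.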

Is this a Twisted issue?  If so, is my fix reasonable?  If it is not a Twisted issue, what am I doing wrong?  

Thanks,

Keith

    def handleStatus_301(self):
        l = self.headers.get('location')
        if not l:
            self.handleStatusDefault()
            return
        url = l[0]
        if self.followRedirect:
            scheme, host, port, path = \
                _parse(url, defaultPort=self.transport.getPeer().port)
            self.factory.setURL(url)
            # Added (kad): a Location URL that is not fully qualified carries
            # no host/port, so reuse the ones from the connection that just
            # fetched the page.
            if not self.factory.host:
                self.factory.host = self.transport.addr[0]
            if self.factory.port is None:
                self.factory.port = self.transport.addr[1]
            # ... rest of handleStatus_301 (not shown) ...
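The reproducer depends on how www.shopzilla.com happens to answer /aaaa. To exercise the patched handleStatus_301 locally, a small standard-library server can issue a relative 301 on demand. This is a sketch in Python 3; the /aaaa to /aaaa/ redirect and the helper names (RelativeRedirectHandler, start_server, probe) are made up for illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class RelativeRedirectHandler(BaseHTTPRequestHandler):
    """Redirects /aaaa with a relative Location; answers 200 otherwise."""

    def do_GET(self):
        if self.path == "/aaaa":
            self.send_response(301)
            # Not fully qualified: no scheme, host or port.
            self.send_header("Location", "/aaaa/")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"ok\n"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # silence the default per-request logging to stderr


def start_server():
    """Start the server on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), RelativeRedirectHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(server, path="/aaaa"):
    """Fetch path once without following redirects; return (status, Location)."""
    host, port = server.server_address
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

After start_server(), the reproducer can be pointed at "127.0.0.1" and server.server_address[1] instead of www.shopzilla.com and port 80; call server.shutdown() when done.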

