<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7638.1">
<TITLE>web.client blowing up on non-fully qualified 301s</TITLE>
</HEAD>
<BODY>
<!-- Converted from text/plain format -->
<P><FONT SIZE=2>Hello,<BR>
<BR>
I have run into an odd problem, and I am not sure whether it is my issue or Twisted's; any help would be appreciated. Under at least some circumstances, twisted.web.client seems to 1) fail to follow a 301 redirect and 2) raise an unhandled exception while trying to follow it. A specific example and the resulting error are given below:<BR>
</FONT></P>
<PRE>
from twisted.web import client
from twisted.internet import reactor

class HTTPGetter(client.HTTPClientFactory):
    protocol = client.HTTPPageGetter

class Fetcher:

    def __init__(self, client_factory=HTTPGetter):
        self.factory = client_factory

    def download(self, host, port, url):
        f = self.factory(url)
        f.deferred.addCallback(self.downloadFinished).addErrback(self.downloadFailed)
        k = reactor.connectTCP(host, port, f, timeout=10)
        return f.deferred

    def downloadFinished(self, v):
        print "good"

    def downloadFailed(self, v):
        print "bad"
        print v

r = Fetcher()
w = r.download("www.shopzilla.com", 80, "/aaaa")
reactor.callLater(10, reactor.stop)
reactor.run()
</PRE>
<P><FONT SIZE=2>
This results in:<BR>
</FONT></P>
<PRE>
Unhandled error in Deferred:
Traceback (most recent call last):
  File "/usr/local/lib/python2.4/site-packages/twisted/internet/posixbase.py", line 226, in mainLoop
    self.runUntilCurrent()
  File "/usr/local/lib/python2.4/site-packages/twisted/internet/base.py", line 541, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "/usr/local/lib/python2.4/site-packages/twisted/internet/tcp.py", line 494, in resolveAddress
    d.addCallbacks(self._setRealAddress, self.failIfNotConnected)
  File "/usr/local/lib/python2.4/site-packages/twisted/internet/defer.py", line 182, in addCallbacks
    self._runCallbacks()
--- &lt;exception caught here&gt; ---
  File "/usr/local/lib/python2.4/site-packages/twisted/internet/defer.py", line 307, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
  File "/usr/local/lib/python2.4/site-packages/twisted/internet/tcp.py", line 498, in _setRealAddress
    self.doConnect()
  File "/usr/local/lib/python2.4/site-packages/twisted/internet/tcp.py", line 520, in doConnect
    connectResult = self.socket.connect_ex(self.realAddress)
  File "&lt;string&gt;", line 1, in connect_ex

exceptions.TypeError: an integer is required
</PRE>
<P><FONT SIZE=2>
This is apparently because doConnect assumes it has been given a valid address, so it does not trap the TypeError.<BR>
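<BR>
The TypeError can be reproduced without Twisted. The following is a minimal sketch, in Python 3 syntax, that passes the same ('', None) address to a plain socket's connect_ex; the helper name connect_ex_with_bad_address is hypothetical. Because the None port is rejected while the address tuple is being parsed, no connection or DNS lookup should be attempted. The exact message varies between Python versions (the traceback above shows Python 2.4's wording), so only the exception type is checked.<BR>
</FONT></P>
<PRE>
```python
import socket


def connect_ex_with_bad_address():
    """Hand connect_ex the ('', None) address produced by the redirect and
    return the exception it raises, or None if it unexpectedly succeeds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # connect_ex normally returns an errno instead of raising, but a
        # malformed address (port=None) is rejected with TypeError first.
        sock.connect_ex(("", None))
    except TypeError as exc:
        return exc
    finally:
        sock.close()
    return None


print(type(connect_ex_with_bad_address()).__name__)
```
</PRE>
<P><FONT SIZE=2>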
<BR>
The bad address that doConnect chokes on, ('', None) for (host, port), slips in through twisted.web.client's handleStatus_301. The example site (Shopzilla.com) sends a Location header with its 301 whose URL is not fully qualified. handleStatus_301 appears to fail on such a URL because it relies on taking the host and port from the Location URL, and they are not present there. It therefore passes ('', None) to reactor.connectTCP when it tries to follow the redirect, which leads to the error above.<BR>
<BR>
My kludge fix to handleStatus_301 is given below: if the host or port is missing, I take it from the transport, which should be correct since that transport was just used to fetch the page. I am running with this now, with no errors.<BR>
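<BR>
The missing host and port can be seen by parsing a relative URL with Python's standard library. This minimal sketch uses Python 3's urllib.parse; the helper name host_and_port and the Location values are hypothetical, since the actual header sent by the site is not shown. It extracts whatever host and port a Location value carries on its own: a path-only value yields ('', None), the same pair described above, while an absolute URL names its server. Note that urlsplit does not fill in default ports, so port is None whenever the URL omits one.<BR>
</FONT></P>
<PRE>
```python
from urllib.parse import urlsplit


def host_and_port(location):
    """Return the (host, port) pair that a Location value carries by itself,
    without consulting the request that produced the redirect."""
    parts = urlsplit(location)
    # hostname is None when the URL has no authority part; use '' instead.
    # port is None unless the URL spells out an explicit port number.
    return parts.hostname or "", parts.port


# Hypothetical Location header values.
print(host_and_port("/some/new/path"))                             # ('', None)
print(host_and_port("http://www.example.com:8080/some/new/path"))  # ('www.example.com', 8080)
```
</PRE>
<P><FONT SIZE=2>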
<BR>
Is this a Twisted issue? If so, is my fix reasonable? If it is not a Twisted issue, what am I doing wrong? <BR>
<BR>
Thanks,<BR>
<BR>
Keith<BR>
</FONT></P>
<PRE>
def handleStatus_301(self):
    l = self.headers.get('location')
    if not l:
        self.handleStatusDefault()
        return
    url = l[0]
    if self.followRedirect:
        scheme, host, port, path = \
            _parse(url, defaultPort=self.transport.getPeer().port)
        self.factory.setURL(url)
        # Following 4 lines added (kad) to fix an apparent issue with a 301
        # whose Location URL is not fully qualified: reuse the host and port
        # of the connection that just fetched the page.
        if self.factory.host == '':
            self.factory.host = self.transport.addr[0]
        if self.factory.port is None:
            self.factory.port = self.transport.addr[1]
        # ... the rest of the original method follows unchanged
</PRE>
<P><FONT SIZE=2>
<BR>
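Another way to handle a relative Location, instead of patching the factory's host and port from the transport afterwards, is to resolve the header against the absolute URL of the original request with the standard library's urljoin. The sketch below is written for Python 3's urllib.parse and assumes that absolute URL is available (in the example above the factory was given only the path /aaaa, so the scheme and host would have to be supplied separately). The names resolve_redirect and DEFAULT_PORTS and the /new-location value are hypothetical and are not part of twisted.web.client.<BR>
</FONT></P>
<PRE>
```python
from urllib.parse import urljoin, urlsplit

# Hypothetical table of default ports per scheme, used only by this sketch.
DEFAULT_PORTS = {"http": 80, "https": 443}


def resolve_redirect(request_url, location):
    """Resolve a (possibly relative) Location value against the absolute URL
    of the original request and return (scheme, host, port, path)."""
    target = urlsplit(urljoin(request_url, location))
    port = target.port
    if port is None:
        # Neither URL named a port explicitly, so use the scheme's default.
        port = DEFAULT_PORTS.get(target.scheme, 80)
    path = target.path or "/"
    if target.query:
        path += "?" + target.query
    return target.scheme, target.hostname, port, path


# A relative Location inherits the scheme and host of the original request.
print(resolve_redirect("http://www.shopzilla.com/aaaa", "/new-location"))
# ('http', 'www.shopzilla.com', 80, '/new-location')

# An absolute Location is used essentially as-is.
print(resolve_redirect("http://www.shopzilla.com/aaaa",
                       "https://secure.example.com:8443/x?y=1"))
# ('https', 'secure.example.com', 8443, '/x?y=1')
```
</PRE>
<P><FONT SIZE=2>
The host and port it returns are what the redirect-following code would hand to reactor.connectTCP; because an absolute Location passes through urljoin essentially unchanged, only relative values behave differently from before.<BR>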
</FONT>
</P>
</BODY>
</HTML>