[Twisted-Python] HTTPClientFactory's deferred never finishing download on .vcf (vcard file) link
Steve Steiner (listsin)
listsin at integrateddevcorp.com
Thu Oct 8 17:31:37 EDT 2009
I'm attempting to fetch some web pages using the following code, which
I did not write. It seems to work (except for this case, so far), but I
have no idea whether it is a reasonable way to get simple web pages at
all:

from twisted.internet import reactor
from twisted.web.client import HTTPClientFactory

# parse_url is used as in the original code; its definition is not shown here.
def getPage(url, contextFactory=None, *args, **kwargs):
    """Download a web page as a string.

    Download a page. Return a deferred, which will callback with a
    page (as a string) or errback with a description of the error.

    See HTTPClientFactory to see what extra args can be passed.
    """
    scheme, host, port, path = parse_url(url)
    factory = HTTPClientFactory(url, *args, **kwargs)
    if scheme == 'https':
        from twisted.internet import ssl
        if contextFactory is None:
            contextFactory = ssl.ClientContextFactory()
        reactor.connectSSL(host, port, factory, contextFactory)
    else:
        reactor.connectTCP(host, port, factory)
    return factory.deferred

The code then adds a bunch of callbacks to the returned deferred to do
various things to the data, and everything's swell until the URL shown
below comes up. Then the deferred never calls any of its callbacks and
never seems to finish.
I haven't found any way to dump the actual headers from within Twisted
as this occurs, so the header values shown below are from Firefox
requesting the same URL. I will put tcpdump in the way if that's what it
takes to figure this out, but I suspect this is something simple (or
something wrong with the method used in the code above).
Can anyone tell me what it is about this particular transaction that
prevents the deferred from firing its callbacks? I presume the request
never finishes getting the data it's waiting for. This particular URL
returns a .vcf file.
Also, what is the proper intervention? I'd rather not download the .vcf
at all, as it's completely useless for my purpose, but I'm not familiar
enough with twisted.web to know where to intervene.
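
One possible place to intervene is before the request is ever made, by filtering URLs on their path extension. The following is only a sketch under stated assumptions: it assumes the unwanted links can be recognized by a ".vcf" suffix in the URL path, and the names should_fetch and SKIPPED_EXTENSIONS are hypothetical helpers, not part of Twisted. It would not catch a vCard served from a URL without that suffix, and it does not explain why the deferred never fires.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical skip list (an assumption for illustration); extend as needed.
SKIPPED_EXTENSIONS = {".vcf"}


def should_fetch(url, skipped=SKIPPED_EXTENSIONS):
    """Return False when the URL path ends in an extension we want to skip."""
    # Only the path component is inspected; query strings are ignored.
    path = urlparse(url).path
    extension = posixpath.splitext(path)[1].lower()
    return extension not in skipped
```

A caller would check should_fetch(url) first and only hand the URL to getPage when it returns True, so no connection is opened for the skipped links.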
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:18.104.22.168) Gecko/20090824 Firefox/3.5.3
HTTP/1.x 200 OK
Date: Thu, 08 Oct 2009 21:14:37 GMT
P3P: CP="NOI ADM DEV PSAi COM NAV OUR OTRo STP IND DEM"
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-
Expires: Mon, 1 Jan 2001 00:00:00 GMT
Last-Modified: Thu, 08 Oct 2009 21:14:37 GMT
Content-Type: text/html; charset=utf-8