[Twisted-Python] HTTPClientFactory's deferred never finishing download on .vcf (vcard file) link

Steve Steiner (listsin) listsin at integrateddevcorp.com
Thu Oct 8 17:31:37 EDT 2009


I'm attempting to get some web pages using the following code, which I
did not write.  While it seems to work (except for the problem described
below, so far), I have no idea whether this is a reasonable way to fetch
simple web pages at all:

from twisted.internet import reactor
from twisted.web.client import HTTPClientFactory

# parse_url is a helper from the surrounding code base (not shown here).

def getPage(url, contextFactory=None, *args, **kwargs):
    """
    Download a web page as a string.

    Download a page. Return a deferred, which will callback with a
    page (as a string) or errback with a description of the error.

    See HTTPClientFactory to see what extra args can be passed.
    """
    scheme, host, port, path = parse_url(url)
    factory = HTTPClientFactory(url, *args, **kwargs)
    if scheme == 'https':
        # Import lazily so plain HTTP works without SSL support installed.
        from twisted.internet import ssl
        if contextFactory is None:
            contextFactory = ssl.ClientContextFactory()
        reactor.connectSSL(host, port, factory, contextFactory)
    else:
        reactor.connectTCP(host, port, factory)

    # Fires with the page body on success, or errbacks on failure.
    return factory.deferred

The code then adds a bunch of callbacks to the returned deferred to do
various things to the data, and everything's swell.

That is, until the URL shown below comes up.  Then the deferred never
calls any of its callbacks and just never seems to finish.

I haven't found any way to dump the actual headers from within Twisted
as this occurs, so the header values shown below are from Firefox
requesting the same URL.  I will put tcpdump in the way if that's what
it takes to figure this out, but I'm thinking this is something simple
(or something wrong with the method used in the code above).

Can anyone tell me what it is about this particular transaction that's
preventing the deferred from firing its callbacks?  I presume it's
because it never finishes receiving the response it's waiting for.  This
particular URL returns a .vcf file.

Also, what is the proper intervention?  I'd like not to download  
the .vcf as it's completely useless for my purpose but I'm not  
familiar enough with twisted.web to know where to intervene.
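
One possible place to intervene, sketched here only as an assumption, is
before the request is ever made: filter the URL on its query string.  The
helper name looks_like_vcard_url is hypothetical, the check assumes that
vCard links on this site can be recognised by task=vcard, and the sketch
is written for Python 3 (urllib.parse) rather than the Python of the code
above.

```python
from urllib.parse import urlsplit, parse_qs


def looks_like_vcard_url(url):
    """Return True if the query string asks for a vCard (task=vcard).

    Hypothetical helper: it assumes vCard links are identifiable by
    their 'task' query parameter, as in the URL from this post.
    """
    query = parse_qs(urlsplit(url).query)
    return "vcard" in query.get("task", [])


# The URL from this post, split across two string literals for width.
vcard_url = ("http://www.integrateddevcorp.com/index.php?option=com_contact"
             "&task=vcard&contact_id=1&format=raw&tmpl=component")
```

A caller could test each URL with looks_like_vcard_url() and simply not
pass matching ones to getPage().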


Thanks,

S

http://www.integrateddevcorp.com/index.php?option=com_contact&task=vcard&contact_id=1&format=raw&tmpl=component

GET /index.php?option=com_contact&task=vcard&contact_id=1&format=raw&tmpl=component HTTP/1.1
Host: www.integrateddevcorp.com
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

HTTP/1.x 200 OK
Date: Thu, 08 Oct 2009 21:14:37 GMT
Server: Apache
X-Powered-By: PHP/5.2.8
Set-Cookie: ff70eb7218d444fa639af7ae7e66e82f=488606e54b7fdd9affb0b0725a2a6607; path=/
P3P: CP="NOI ADM DEV PSAi COM NAV OUR OTRo STP IND DEM"
Content-Disposition: attachment; filename=Integrated_Development_Corporation.vcf
Content-Length: 1020
Connection: close
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Expires: Mon, 1 Jan 2001 00:00:00 GMT
Last-Modified: Thu, 08 Oct 2009 21:14:37 GMT
Content-Type: text/html; charset=utf-8
----------------------------------------------------------
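
Note that the response above says Content-Type: text/html, so the content
type alone would not identify the vCard; the Content-Disposition header
does.  A minimal sketch of a check based on that header follows.  The
function name should_skip and the plain-dict shape of its input are
hypothetical, and how to get at the response headers from inside Twisted
is not shown here.  It uses only the Python 3 standard library.

```python
from email.message import Message


def should_skip(headers, unwanted_suffixes=(".vcf",)):
    """Return True if the response is an attachment with an unwanted suffix.

    `headers` is assumed to be a plain dict of header name -> value.
    """
    msg = Message()
    for name, value in headers.items():
        msg[name] = value
    # get_content_disposition() returns None when the header is absent.
    if msg.get_content_disposition() != "attachment":
        return False
    filename = msg.get_filename() or ""
    return filename.lower().endswith(unwanted_suffixes)


# Header values copied from the response shown above.
vcard_headers = {
    "Content-Type": "text/html; charset=utf-8",
    "Content-Disposition":
        "attachment; filename=Integrated_Development_Corporation.vcf",
    "Content-Length": "1020",
}
```

With the headers from this response, should_skip(vcard_headers) is true,
while a response carrying only the text/html Content-Type is not skipped.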



