[Twisted-Python] Problem fetching page with getPage

Terry Jones terry at jon.es
Sat Jan 2 09:34:09 EST 2010


I've run into a problem fetching an HTTP page with t.w.client.getPage. It's
not simple to make standalone code showing what's going wrong, but the
following summarizes where I am and why I find this puzzling.

After some setup, I have a host, a URL path, and some headers I want to
send. A summary:

    host = 'ec2.amazon.com'
    port = 443
    path = '/?some=params&are=here&etc=etc'
    method = 'GET'
    data = ''
    headers = { 'some' : 'headers', 'Content-Length' : '0' }
    url = 'https://%s:%d%s' % (host, port, path)

The actual details don't matter right now, I don't think.  When I call

    d = getPage(url, headers=headers)

d's errback fires with a twisted.web.error.Error with a 403 status. So
you'd think I had something wrong in my headers, or was trying to access a
forbidden resource, etc.

But when I drop this code in instead of the call to getPage:

    import httplib
    cx = httplib.HTTPSConnection(host, port)
    cx.request(method, path, data, headers)
    response = cx.getresponse()
    print 'response status:', response.status
    body = response.read()
    print 'body:', body

I get a 200 status, and the body is exactly as expected.

BTW, the path above does start with a slash. I've tried using
HTTPClientFactory and reactor.connectSSL directly.  I've tried with and
without the '' postdata and Content-Length header. I've tried with Twisted
8.2.0 and 9.0.0.  And of course I've checked many times that the URL and
its query params requested by httplib and getPage are identical (apart from
the time-sensitive signature).
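
To make that check repeatable rather than eyeballing it, a comparison along
these lines could be used. This is only a sketch: diff_urls is a made-up
helper name, and 'Signature' is an assumed name for the time-sensitive
parameter, so substitute whatever the real signed parameter is called.

```python
try:
    from urlparse import urlsplit, parse_qsl  # Python 2.6+
except ImportError:
    from urllib.parse import urlsplit, parse_qsl  # Python 3


def diff_urls(url_a, url_b, ignore=('Signature',)):
    """Return (component, value_a, value_b) tuples for every difference.

    Query parameters named in `ignore` are skipped. 'Signature' is an
    assumed name for the time-sensitive parameter, not a confirmed one.
    """
    a, b = urlsplit(url_a), urlsplit(url_b)
    diffs = []

    # Compare the parts of the URL that precede the query string.
    for name in ('scheme', 'hostname', 'port', 'path'):
        va, vb = getattr(a, name), getattr(b, name)
        if va != vb:
            diffs.append((name, va, vb))

    # Compare query parameters by name. Note that dict() keeps only the
    # last value of a repeated key and that parameter order is not checked.
    qa = dict(parse_qsl(a.query, keep_blank_values=True))
    qb = dict(parse_qsl(b.query, keep_blank_values=True))
    for key in sorted(set(qa) | set(qb)):
        if key in ignore:
            continue
        if qa.get(key) != qb.get(key):
            diffs.append(('query:' + key, qa.get(key), qb.get(key)))
    return diffs
```

An empty list means the two URLs agree on everything except the ignored
parameters. It says nothing about the headers or the request line that each
client actually puts on the wire.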

The reason it's not easy to provide a simple example is that the URL and
headers have signed components, based in part on a timestamp, and based in
part on Amazon secret keys, etc. It's not easy to separate all that, and if
I did I'd be posting at least 100 lines of code that would only run if you
had your Amazon AWS details provided etc.

In any case, it looks like the problem is not in the setup of the request.
Can anyone offer a reason why httplib might be able to fetch the page
whereas getPage receives an error?  I'm stumped.
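
One way to narrow this down would be to look at the exact bytes each client
sends. The sketch below is an assumption-laden illustration, not something
taken from the real setup: it listens on a throwaway local port over plain
HTTP (not SSL), captures the raw request head that httplib produces, and
returns it. The helper names are made up. Pointing getPage at the same kind
of listener and comparing the two captures would show any difference in the
request line or headers, though the Host header will naturally name the
local listener rather than the real endpoint.

```python
import socket
import threading

try:
    import httplib  # Python 2
except ImportError:
    import http.client as httplib  # Python 3


def capture_one_request(server_sock, captured):
    """Accept one connection, record the raw request head, reply 200."""
    conn, _ = server_sock.accept()
    data = b''
    # Read until the blank line that terminates the request headers.
    while b'\r\n\r\n' not in data:
        chunk = conn.recv(4096)
        if not chunk:
            break
        data += chunk
    captured.append(data)
    conn.sendall(b'HTTP/1.0 200 OK\r\nContent-Length: 0\r\n\r\n')
    conn.close()


def raw_request_from_httplib(method, path, body, headers):
    """Return the raw bytes httplib sends for the given request."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(('127.0.0.1', 0))  # port 0: let the OS pick a free port
    server_sock.listen(1)
    port = server_sock.getsockname()[1]

    captured = []
    listener = threading.Thread(target=capture_one_request,
                                args=(server_sock, captured))
    listener.daemon = True
    listener.start()

    # Plain HTTP here, unlike the HTTPSConnection used against the real host.
    cx = httplib.HTTPConnection('127.0.0.1', port)
    cx.request(method, path, body, headers)
    cx.getresponse().read()
    cx.close()

    listener.join()
    server_sock.close()
    return captured[0]
```

Calling raw_request_from_httplib(method, path, data, headers) with the values
from the summary above returns the request line and headers exactly as
httplib formats them, which can then be compared line by line with what the
Twisted client sends to a similar listener.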

Terry
