[Twisted-Python] Problem fetching page with getPage

Terry Jones terry at jon.es
Sat Jan 2 14:14:12 MST 2010


>>>>> "Steve" == ssteinerX at gmail com <ssteinerx at gmail.com> writes:
Steve> On Jan 2, 2010, at 9:34 AM, Terry Jones wrote:
>> In any case, it looks like the problem is not in the setup of the request.
>> Can anyone offer a reason why httplib might be able to fetch the page
>> whereas getPage receives an error?  I'm stumped.
Steve> 
Steve> I've had to debug things like this recently and I have two suggestions:

Hi Steve

Thanks for the helpful reply - I can now make the call successfully.  The
difference turned out to be that httplib puts a Host: hostname:port header
into its calls, whereas getPage uses just Host: hostname. On its own that
would not have mattered, but some other code I'm using calculates a
signature based on host:port, so the two requests ended up signed
differently.
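
To illustrate why the Host value matters once a signature is involved, here
is a minimal sketch. The sign() function, the secret and example.com are all
made up for illustration; the real signing code is not shown in this thread
and may well work differently. The only point is that any signature computed
over the host string changes when the port is included.

```python
import hashlib
import hmac


def host_header(hostname, port, include_port):
    """Return the Host header value, with or without an explicit port."""
    if include_port:
        return "%s:%d" % (hostname, port)
    return hostname


def sign(secret, host_value, path):
    """Hypothetical signature: HMAC-SHA256 over the Host value and the path."""
    message = ("%s\n%s" % (host_value, path)).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


secret = b"not-a-real-key"  # placeholder, not a real credential

# Same request, signed once with "host:port" and once with "host" only.
with_port = sign(secret, host_header("example.com", 443, True), "/")
without_port = sign(secret, host_header("example.com", 443, False), "/")

# False: the two Host forms yield different signatures.
signatures_match = (with_port == without_port)
```

So if the signing side uses host:port while the header on the wire carries
only the hostname (or the reverse), the server will compute a different
signature from the one that was sent.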

Steve> 1> Recreate the headers and make it work with curl.  Curl won't add
Steve>    anything to your headers and such and you'll be sure that you're
Steve>    getting the result you want with completely stripped down case.

At least on my machine (curl 7.18.0 on Linux Ubuntu/Hardy) it adds a
User-Agent header, an Accept: */*, and also the Host header.

Steve> 2> Get Charles http://www.charlesproxy.com/ if you're on OS X.  It
Steve>    rocks.  Otherwise, get one of the Windows tools (sorry, no recos
Steve>    from me on that), and watch exactly what goes by.

Charles is available for Linux & Windows too. I tried it, but couldn't get
it working fully when sending requests from the command line (with SSL,
spoofing DNS, etc). So in the end I just ran netcat -l -p 443 and switched
the request to plain HTTP to see what was being sent. I wouldn't have
thought of doing that without your suggestion, so thanks a lot for the tip.
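
For anyone without netcat to hand, roughly the same trick can be sketched in
Python with the standard socket module. The function names open_listener and
capture_one_request are my own, and this is only a rough stand-in for what
netcat does: it accepts one plain (non-SSL) connection and returns the first
chunk of bytes the client sends.

```python
import socket


def open_listener(port=0, host="127.0.0.1"):
    """Open a listening TCP socket; port 0 lets the OS pick a free port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(1)
    return listener


def capture_one_request(listener, bufsize=65536):
    """Accept one connection and return the first chunk of raw bytes sent."""
    conn, _addr = listener.accept()
    try:
        # A single recv is enough for a small request line plus headers;
        # a large request could arrive in several chunks.
        return conn.recv(bufsize)
    finally:
        conn.close()
```

Open the listener, point the client at that port over plain HTTP, and print
the result of capture_one_request(listener) to see the request line and
headers exactly as sent, including the Host header. Binding to a port below
1024 such as 443 usually needs root.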

Terry



