[Twisted-Python] Problem fetching page with getPage
terry at jon.es
Sat Jan 2 16:14:12 EST 2010
>>>>> "Steve" == ssteinerX at gmail com <ssteinerx at gmail.com> writes:
Steve> On Jan 2, 2010, at 9:34 AM, Terry Jones wrote:
>> In any case, it looks like the problem is not in the setup of the request.
>> Can anyone offer a reason why httplib might be able to fetch the page
>> whereas getPage receives an error? I'm stumped.
Steve> I've had to debug things like this recently and I have two suggestions:
Thanks for the helpful reply - I can now make the call successfully. The
difference turned out to be that httplib puts a Host: hostname:port header
into its calls, whereas getPage uses just Host: hostname. That mattered
because some other code I'm using calculates a signature based on
host:port, so the two requests ended up with different signatures.
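
To illustrate why the Host value matters once a signature is involved, here
is a minimal sketch. The signing scheme (HMAC-SHA256 over method, host and
path), the helper names host_header() and sign(), the key, and example.com
are all made up for illustration; they are not the actual code or service
involved.

```python
import hashlib
import hmac


def host_header(hostname, port, include_port):
    """Return a Host header value, with or without an explicit port."""
    return "%s:%d" % (hostname, port) if include_port else hostname


def sign(secret, host_value, path):
    """Hypothetical scheme: HMAC-SHA256 over method, Host value and path."""
    message = "\n".join(["GET", host_value, path]).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


secret = b"not-a-real-key"  # placeholder key, for illustration only

# One side builds the string to sign from host:port ...
with_port = host_header("example.com", 443, include_port=True)
# ... while the request on the wire carries only the bare hostname.
without_port = host_header("example.com", 443, include_port=False)

signed_by_client = sign(secret, with_port, "/")
seen_by_server = sign(secret, without_port, "/")

# The two signatures differ, so a server recomputing the signature from
# the Host header it received would reject the request.
print(signed_by_client == seen_by_server)
```

Whichever form is used, the string that gets signed has to match the Host
header that is actually sent.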
Steve> 1> Recreate the headers and make it work with curl. Curl won't add
Steve> anything to your headers and such and you'll be sure that you're
Steve> getting the result you want with completely stripped down case.
At least on my machine (curl 7.18.0 on Ubuntu Linux, Hardy) it adds a
User-Agent header, an Accept: */* header, and also the Host header.
Steve> 2> Get Charles http://www.charlesproxy.com/ if you're on OS X. It
Steve> rocks. Otherwise, get one of the Windows tools (sorry, no recos
Steve> from me on that), and watch exactly what goes by.
Charles is available for Linux & Windows too. I tried it, but couldn't get
it to work fully when sending requests from the command line (with SSL,
spoofing DNS, etc). So in the end I just ran netcat -l -p 443 and switched
the request to plain HTTP to see what was being sent. I wouldn't have
thought of doing that without your suggestion, so thanks a lot for the tip.
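
For anyone without netcat handy, roughly the same trick can be sketched in
Python: listen on a port, accept one plain-HTTP connection, and look at the
raw request bytes. The function names make_listener() and capture_request()
are mine, not part of any library, and this only works for unencrypted HTTP.

```python
import socket


def make_listener(port=0, host="127.0.0.1"):
    """Return a listening TCP socket; port 0 lets the OS pick a free port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(1)
    return listener


def capture_request(listener, max_bytes=65536):
    """Accept one connection and return the raw bytes received, reading
    until the blank line that ends the HTTP headers (or until EOF)."""
    conn, _addr = listener.accept()
    try:
        data = b""
        while b"\r\n\r\n" not in data and len(data) < max_bytes:
            chunk = conn.recv(4096)
            if not chunk:  # client closed the connection
                break
            data += chunk
        return data
    finally:
        conn.close()
```

Calling capture_request(make_listener(8080)) and pointing the client at
http://localhost:8080/ returns the request line and headers exactly as
sent, including whatever Host header the client chose. Binding to a port
below 1024, such as 443, normally needs root privileges.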