[Twisted-Python] Problem fetching page with getPage

Terry Jones terry at jon.es
Sat Jan 2 16:28:14 EST 2010


Hi Glyph

Thanks for the reply. I just sent another mail in the thread.

>>>>> "Glyph" == Glyph Lefkowitz <glyph at twistedmatrix.com> writes:
Glyph> Well, I know this isn't terribly helpful, but "a bug in getPage" is
Glyph> really the only thing that comes to mind.  Or, some
Glyph> legal-but-unusual behavior in getPage which triggers a bug on the
Glyph> EC2 side of things.

The error came from a mismatch: I was signing a string that included
host:port but then sending only the host in the Host header. It turns out
you can resolve it either way: use the port in both places, or omit it
from both.
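
To make the mismatch concrete, here is a minimal sketch. The layout of the
string to sign, the key, the host and the port are all made up for
illustration and are not the exact scheme EC2 uses. The only point is that
the server rebuilds the signed string from the Host header it receives, so
the two sides must agree on whether the port is part of the host.

```python
import base64
import hashlib
import hmac


def sign(secret, method, host, path, query):
    # Hypothetical string-to-sign layout: the host (with or without a
    # port) is one of the signed fields.
    string_to_sign = "\n".join([method, host.lower(), path, query])
    digest = hmac.new(
        secret.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    return base64.b64encode(digest).decode("ascii")


secret = "not-a-real-key"  # placeholder credential
query = "Action=DescribeImages"  # placeholder query string

# Client side: the signature is computed over host:port.
client_sig = sign(secret, "GET", "example.com:8773", "/", query)

# Server side: the signature is recomputed from the Host header it saw.
server_sig_host_only = sign(secret, "GET", "example.com", "/", query)
server_sig_host_port = sign(secret, "GET", "example.com:8773", "/", query)

signatures_match_without_port = client_sig == server_sig_host_only
signatures_match_with_port = client_sig == server_sig_host_port
```

Signing "example.com" and sending "Host: example.com" works just as well;
what matters is that both places use the same value.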


BTW, in reading about the Host header, it seems like getPage (more
specifically HTTPPageGetter) should be sending a port number in the header,
at least when the port is not 80. I base that remark on these:

  http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.23
  http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.2.2

That's a 1.1 spec as you surely know, and http.py sends an HTTP/1.0 header,
so you could argue that sending the Host header is just a nicety and
there's no need for a port. But the Host header isn't described in the HTTP
1.0 RFC, so it seems to me that if you're going to send it at all, you may
as well conform to HTTP 1.1.
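
Here's a rough sketch of how I read those two sections: include the port in
the Host value unless it is the default for the scheme. This is not what
HTTPPageGetter does; host_header_value and DEFAULT_PORTS are names I made
up for the example.

```python
from urllib.parse import urlsplit

# Default ports per scheme; a Host value with no port implies these.
DEFAULT_PORTS = {"http": 80, "https": 443}


def host_header_value(url):
    """Return a Host header value for url, omitting a default port."""
    parts = urlsplit(url)
    host = parts.hostname
    if host is None:
        raise ValueError("URL has no host: %r" % (url,))
    if ":" in host:
        # urlsplit strips the brackets from IPv6 literals; restore them.
        host = "[%s]" % host
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(parts.scheme):
        return host
    return "%s:%d" % (host, port)
```

For example, host_header_value("http://example.com:8080/x") gives
"example.com:8080", while "http://example.com/" and
"http://example.com:80/" both give "example.com".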

But I guess that argument is incorrect somewhere. I say that because a
comment in some other code I'm looking at, which uses httplib, says that
prior to 2.6 httplib *used* to append a ":443" to the Host header of SSL
requests, but that it no longer does. I assume sending the port was dropped
from httplib for good reason, in which case HTTPPageGetter shouldn't add it
either. But I don't know.

I'm very far from being an expert on HTTP headers, though. Not as far as
I'd like to be :-)

Thanks again for the reply.

Terry


