Opened 11 years ago

Closed 9 years ago

#3857 defect closed fixed (fixed)

Twisted web client doesn't include port in the host header

Reported by: Vultaire Owned by: Jean-Paul Calderone
Priority: normal Milestone:
Component: web Keywords:
Cc: ivank, Jason Michalski Branch: branches/getpage-host-port-3857
branch-diff, diff-cov, branch-cov, buildbot
Author: armooo

Description (last modified by therve)

The host header must contain the port number if different from 80 or 443.

This one bit me when I relocated a WordPress blog from port 80 to port 8080 on a server, and was trying to use getPage to grab a page from the blog. (getPage indirectly calls HttpClient.)

The HttpClient protocol does not seem to properly specify the "host" header in HTTP requests, and because of doing so, is ending up in an infinite redirect loop.

Specifically, given the url "http://domu-XX-XX-XX-XX-XX-XX.compute-1.internal:8080/wordpress/", here's what happens:

sendCommand: GET /wordpress/ HTTP/1.0
sendHeader: Host: domu-XX-XX-XX-XX-XX-XX.compute-1.internal
sendHeader: User-Agent: Twisted PageGetter
endHeaders

lineReceived: HTTP/1.0 301 Moved Permanently
lineReceived: Date: Thu, 28 May 2009 04:47:19 GMT
lineReceived: Server: Apache/2.2.9 (Ubuntu) PHP/5.2.6-2ubuntu4.2 with Suhosin-Patch
lineReceived: X-Powered-By: PHP/5.2.6-2ubuntu4.2
lineReceived: X-Pingback: http://domU-XX-XX-XX-XX-XX-XX.compute-1.internal:8080/wordpress/xmlrpc.php
lineReceived: Location: http://domu-XX-XX-XX-XX-XX-XX.compute-1.internal:8080/wordpress/
lineReceived: Vary: Accept-Encoding
lineReceived: Content-Length: 0
lineReceived: Connection: close
lineReceived: Content-Type: text/html; charset=UTF-8
lineReceived: 

[...loops, raises exception, exits.]

Connecting via telnet to port 8080 and doing the same commands by hand generates the same response, of course.

However, if I tack ":8080" to the host header:

sendCommand: GET /wordpress/ HTTP/1.0
sendHeader: Host: domu-XX-XX-XX-XX-XX-XX.compute-1.internal:8080
sendHeader: User-Agent: Twisted PageGetter
endHeaders

lineReceived: HTTP/1.0 200 OK
lineReceived: Date: Thu, 28 May 2009 05:08:32 GMT
lineReceived: Server: Apache/2.2.9 (Ubuntu) PHP/5.2.6-2ubuntu4.2 with Suhosin-Patch
lineReceived: X-Powered-By: PHP/5.2.6-2ubuntu4.2
lineReceived: X-Pingback: http://domU-XX-XX-XX-XX-XX-XX.compute-1.internal:8080/wordpress/xmlrpc.php
lineReceived: Vary: Accept-Encoding
lineReceived: Connection: close
lineReceived: Content-Type: text/html; charset=UTF-8
lineReceived:

[output follows]

RFC 2616 does specify that the port should be specified on this header, else port 80 will be implied.

I know not all cases are affected; Apache seems to be pretty forgiving and I can retrieve static pages fine. But when I go for anything in WordPress, then I run into problems.

Attachments (1)

0001-Add-the-port-to-the-Host-header-if-it-is-not-the-def.patch (6.2 KB) - added by Jason Michalski 9 years ago.
HTTPPageGetter.connectionMade now sends the port in the host header if required.

Download all attachments as: .zip

Change History (18)

comment:1 Changed 11 years ago by Vultaire

Summary: twisted/web/http.py: HttpClient violates RFC 2616 when port != 80twisted/web/client.py: HttpPageGetter violates RFC 2616 when port != 80

I should clarify myself: The problem appears to be in the implementation of HttpPageGetter's connectionMade function.

Specifically, line 39 of client.py: self.sendHeader('Host', self.factory.headers.get("host", self.factory.host))

If host is not explicitly set, then it defaults to the host stored in the factory. However, if the port is not 80, it should also be attached in order to be RFC compliant.

comment:2 Changed 11 years ago by Jean-Paul Calderone

Component: coreweb
Owner: changed from Glyph to jknight

comment:3 Changed 11 years ago by ivank

If http:// and port 80, don't append :80. If https:// and port 443, don't append :443. Else, append :port

(I'm pretty sure this is what everyone else does.)

comment:4 in reply to:  3 Changed 11 years ago by Jean-Paul Calderone

Replying to ivank:

If http:// and port 80, don't append :80. If https:// and port 443, don't append :443. Else, append :port

(I'm pretty sure this is what everyone else does.)

Um? I guess you're describing a solution to whatever the problem here is. I don't really understand what the problem is though. Is it related to #3845 (another ticket I don't really understand)?

comment:5 Changed 11 years ago by ivank

Cc: ivank added

The problem is that twisted.web.client never appends a :port to the Host header, even when it is required to. It is required when the port is not the default port for the protocol.

Yes, this looks like a duplicate of #3845

comment:6 Changed 11 years ago by Jean-Paul Calderone

Resolution: duplicate
Status: newclosed

Okay, duplicate.

comment:7 Changed 9 years ago by <automation>

Owner: jknight deleted

comment:8 Changed 9 years ago by therve

Resolution: duplicate
Status: closedreopened

comment:9 Changed 9 years ago by therve

Description: modified (diff)
Summary: twisted/web/client.py: HttpPageGetter violates RFC 2616 when port != 80Twisted web client doesn't include port in the host header

comment:10 Changed 9 years ago by therve

Interestingly, the bug is not present in the Agent code.

Changed 9 years ago by Jason Michalski

HTTPPageGetter.connectionMade now sends the port in the host header if required.

comment:11 Changed 9 years ago by Jason Michalski

Cc: Jason Michalski added
Keywords: review added

comment:12 Changed 9 years ago by Jean-Paul Calderone

Owner: set to Jean-Paul Calderone
Status: reopenednew

comment:13 Changed 9 years ago by Jean-Paul Calderone

Author: exarkun
Branch: branches/getpage-host-port-3857

(In [30880]) Branching to 'getpage-host-port-3857'

comment:14 Changed 9 years ago by Jean-Paul Calderone

(In [30881]) Apply 0001-Add-the-port-to-the-Host-header-if-it-is-not-the-def.patch

refs #3857

comment:15 Changed 9 years ago by Jean-Paul Calderone

Author: exarkunarmooo
Keywords: review removed

Thanks for the patch! I want to make a few minor tweaks to it:

  1. Methods should be separated from each other by two blank lines
  2. Classes should be separated from each other by three blank lines
  3. There should be one more space in foo %(bar) - like foo % (bar)
  4. Rather than using FakeTransport, the new tests should use StringTransport. In fact, we should get rid of FakeTransport and change the one test method that uses it to use StringTransport instead (separate ticket). Better to have one good test double than lots of not very good ones.
  5. Instead of setting the transport attribute on a protocol and then calling its connectionMade method, you should call its makeConnection method with the transport and leave the rest up to it. This is what happens when a real connection is made.
  6. Indentation of subsequent lines of a multiline function call should line up with the opening ( of the call, or no args should be on the call line and all lines should be indented 4 spaces.

Nothing significant in this feedback, so I'll check in these changes and then merge the branch. Thanks again!

comment:16 Changed 9 years ago by Jean-Paul Calderone

(In [30882]) Tweaks, mostly to whitespace, but also add a docstring and use StringTransport and makeConnection

refs #3857

comment:17 Changed 9 years ago by Jean-Paul Calderone

Resolution: fixed
Status: newclosed

(In [30883]) Merge getpage-host-port-3857

Author: armooo Reviewer: exarkun Fixes: #3857

Make HTTPPageGetter send Host headers which include the port number of the server if it is not the default port.

Note: See TracTickets for help on using tickets.