[Twisted-web] internal vs. external hostname in Request.getHost9)

Glyph Lefkowitz glyph at twistedmatrix.com
Wed Mar 15 00:08:35 MDT 2017


> On Mar 14, 2017, at 3:00 PM, Ilya Skriblovsky <ilyaskriblovsky at gmail.com> wrote:
> 
> Tickets you have mentioned and forwarded-for-5807 branch are mostly about parsing X-Forwarded-For in order to obtain correct client IP. While it is valuable task, it is not what strikes me right now.

Sorry, I was pretty tired when I wrote that message, and I realize that I was getting server identification and client identification mixed up.

> I'm now more concerned with an absence of API for getting user-visible server's name, not client's ip.

Yes.  My mistake.  (Although totally fix those other bugs too. They're also bad. :))

> Look, I'm currently porting my app from Django to Klein and noticed strange behavior of Klein. For example:
> @app.route('/alias', alias=True)
> @app.route('/path')
> def path(request): return b'42'
> 
> When /alias is requested werkzeug generates a redirect to /path. But Klein is passing Request.getHost() to Werkzeug, so redirect gets internal hostname and exposes backend's internal hostname and port to the user. Seems like Klein is passing incorrect hostname to Werkzeug. But how can we fix that?
> 
> There are two methods in Request:
> • Request.getHost() — "Get my originally requesting transport's host" as doc says. Ok, seems like this method intentionally returns server's internal address.
> • Request.getRequestHostname() —doc says:
> >> "Get the hostname that the user passed in to the request. This will either use the Host: header (if it is available) or the host we are listening on if the header is unavailable."
> Cool, but why does this method only returns a hostname without a port? It intentionally strips out the port number from Host header. What is the point of such implementation?

Request is one of the oldest parts of Twisted, so the likely reason is "it looked like a good idea at the time".  Request predates the requirement for test coverage, documentation coverage, and, in many cases, the author (me) having any idea what they were doing :).  If you find something that looks bad, it's probably just bad, there is unlikely any deeper reason.

Long term, we need to overhaul the API to have fewer methods and be generally less confusing. See for example the infamous https://twistedmatrix.com/trac/ticket/288 ticket.  However, before we do that, we should make all the stuff that is there already behave correctly and be documented even in its weird shape; then we can transition to a new good thing confident in the knowledge that no old applications will break and that users can move over to the new APIs without massive disruption.

> This method is used only a couple of times inside Twisted itself, and in both places Twisted gets what getRequestHostname() returned and mixes it with request.getHost().port which is *definitely* incorrect, because the former is user-visible while latter is internal. So if my backend server is using different port than a fronend, it is impossible to use getRequestHostname() to build user-visible URL. I think current getRequestHostname() implementation is broken.
> 
> So I have two proposals:
> 
> Proposal #1 (fixing current behavior):
> • Variant #1: Change Request.getRequestHostname() to return b"hostname:port". I think this is the correct thing to do, but this is a backward-incompatible change.
> - or -
> • Variant #2: Change Klein to use Request.getHeader(b'Host') with fallback to Request.getHost()
> 
> Proposal #2 (adding new feature if Variant #1 is choosed):
> • Add useXForwardedHost=False argument to Request.getRequestHostname() and useXForwardedProto=False to Request.isSecure(). If True is passed, these methods will obey corresponding request headers that are de-facto standard for reverse proxies. Also add corresponding options to Klein app. This can simplify reverse proxy configuration a bit.

I have a third proposal.

Ideally if we want to know about the URL for the request, we could ask the request to just give us the URL.  And in fact the URL does have a method, URLPath(), which is both (A) unambiguously broken (the case could be made that getRequestHostname() is supposed to really just be a host, not for URL generation, and maybe there is even a case where that makes sense; origin comparisons perhaps) and (B) returning a data structure which could be fixed to be correct without concern for client compatibility.

In the long term, we should get rid of all these methods and have a single 'request.url()' method which cleanly and correctly returns a https://twistedmatrix.com/documents/17.1.0/api/twisted.python.url.URL.html object, which is better than a string or a URLPath (basically, it's what URLPath should have been if we had designed it carefully).  In the meanwhile, without adding a bunch of new API surface and abandoning existing methods, Request.URLPath() is the easiest place to put this fix.

getRequestHostname is, as you correctly called out, probably useless, but we should just adjust its docstring to direct users to the URLPath method instead.

Klein should then be changed to use Request.URLPath() to build any URLs.

What do you think of this proposal?  Does my reasoning make sense?

-glyph

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://twistedmatrix.com/pipermail/twisted-web/attachments/20170314/9de347e7/attachment-0001.html>


More information about the Twisted-web mailing list