Opened 10 years ago

Closed 10 years ago

#4786 release blocker: regression closed fixed (fixed)

twcgi duplicate header

Reported by: lvh Owned by: David Sturgis
Priority: normal Milestone:
Component: web Keywords:
Cc: Branch: branches/duplicate-headers-4786
branch-diff, diff-cov, branch-cov, buildbot
Author: tenth

Description

I'm using twcgi to serve a Fossil repository through CGI (it's that or xinetd). For some reason, Chrome wasn't rendering the CSS. Looking at the HTTP response headers for the GET request for the CSS file, I saw this:

HTTP/1.1 200 OK
Date: Sat, 01 Jan 2011 00:14:58 GMT
Content-Length: 13484
Expires: Sat, 8 Jan 2011 00:14:58 GMT
Content-Type: text/html
Content-Type: text/css; charset=utf-8
Server: TwistedWeb/10.2.0

I'm pretty sure duplicate Content-Type headers aren't supposed to happen.

After some digging, I'm convinced the Server, Date and the first Content-Type are the default values set in t.w.server [1]. Meanwhile, twcgi [2] users self.request.responseHeaders.addRawHeader, so the default header is never overwritten.

There is a related recent ticket, #4742, that actually broke this, but fixed a different way in which it was broken. That introduced the ability to add multiple headers by using addRawHeader. It's the right thing to do for cookie-setting headers, but it is possibly the wrong thing to do for Content-Type, since that should be in the response headers exactly once (probably other things should be in it exactly once as well).

Suggetions:

  1. Should a list of things that need setRawHeader be introduced?
  2. Should setRawHeader be the default, and instead have a list of things that need addRawHeader?
  3. Should that twcgi code use request.setHeader instead, like the t.w.server code?
  4. Should the default value setting in [1] be removed? (exarkun appears to feel this way and at first sight I agree, also this is not mutually exclusive with some of the earlier options)

If 4 applies, we can just keep using addRawHeader, so it's minimally invasive. It's by far the easiest, but not necessarily the best, option. Does anyone know why the text/html default is there in the first place?

If that's picked, either we:

  • remove the Server and/or Date defaults as well
  • document that CGI/WSGI things should not set Date/Server

[1] http://twistedmatrix.com/trac/browser/tags/releases/twisted-10.2.0/twisted/web/server.py#L118

[2] http://twistedmatrix.com/trac/browser/tags/releases/twisted-10.2.0/twisted/web/twcgi.py#L308

Change History (8)

comment:1 Changed 10 years ago by Michael Tharp

+1 on removing the default Content-Type. I've never seen any other CGI runner do that anyway.

Server and Date are mandatory so removing the defaults will produce "bad" responses with scripts that don't set them (that is, most scripts). Barring scripts from setting Date is also bad because it's important for caching. Instead just only set them if they're not already set by the script. That way the CGI can set however many of whatever headers it likes but there will never be a mix of default headers and script-supplied ones.

comment:2 Changed 10 years ago by Jean-Paul Calderone

Don't forget about CompatibilityPolicy.

comment:3 Changed 10 years ago by lvh

Duplicate Content-Type headers for perfectly ordinary CGI software sounds like a pretty obvious HTTP violation to me. The only other solution I can think of is make an exhaustive list of headers that aren't allowed to be duplicate, and filter them out right before the response headers are sent.

comment:4 Changed 10 years ago by Michael Tharp

Filtering is overkill and just means we now have a list of headers to maintain and argue over. A consistent policy of only providing defaults if no header was set is much simpler.

If we want to maintain backwards compatibility by continuing to set Content-Type, that's fine, the default-if-not-exists policy will still resolve the issue.

comment:5 Changed 10 years ago by David Sturgis

Author: tenth
Branch: branches/duplicate-headers-4786

(In [30465]) Branching to 'duplicate-headers-4786'

comment:6 Changed 10 years ago by David Sturgis

Keywords: review added
Owner: jknight deleted

With Itamar's help, we decided to add an option #5, to update twcgi to use the same behavior as wsgi, to remove any existing content-type entry before processing headers.

comment:7 Changed 10 years ago by Jean-Paul Calderone

Keywords: review removed
Owner: set to David Sturgis

Probably this part of the change should be reverted:

Property changes on: .
___________________________________________________________________
Added: svn:ignore
   + .pydevproject
.project

A couple other trivial things:

  • copyright date on modified files should be updated to include 2011
  • the bugfix news fragment should go in twisted/web/topfiles/

Merge when those are fixed!

comment:8 Changed 10 years ago by David Sturgis

Resolution: fixed
Status: newclosed

(In [30498]) Merged duplicate-headers-4786: CGI scripts no longer have extraneous content-type headers added.

Author: tenth Reviewer: exarkun Fixes: #4786

Updated twcgi.py to work similarly to wsgi.py, in that it now removes any existing content-type header before processing headers, and added a test to make sure that the "default" content-type header gets removed when the CGI script manually creates one.

Note: See TracTickets for help on using tickets.