[Twisted-web] twisted.web.template output encoding

exarkun at twistedmatrix.com
Sat Nov 26 11:52:10 EST 2011


Hello all,

I recently did some work on <http://twistedmatrix.com/trac/ticket/4896> 
("twisted.web.util.formatFailure vulnerable to XSS in rare cases"). 
Rather than apply the patch adding extra html.escape calls, I tried 
porting the function to twisted.web.template (presently it builds the 
html string manually).

Apart from various issues relating to the lack of patterns in
twisted.web.template, the main difficulty is handling non-ASCII
content in the traceback.  Even setting aside any unicode that may show
up in the source code being rendered (or, eventually, in the values of
variables to be rendered, though for now I do not plan to implement
this), the no-break space characters needed to indent traceback lines
properly mean that there is always some non-ASCII text in the output.
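As an illustration only (this is not the actual `formatFailure´ code, and the helper `nbsp_indent´ is hypothetical), the indentation trick amounts to something like the following Python 3 sketch.  Browsers collapse runs of ordinary spaces in HTML, so leading spaces are replaced with U+00A0 NO-BREAK SPACE, and the resulting line is no longer ASCII:

```python
# Hypothetical helper, for illustration; not taken from twisted.web.util.
NBSP = "\N{NO-BREAK SPACE}"  # U+00A0


def nbsp_indent(line):
    """Replace leading spaces with no-break spaces so that HTML
    whitespace collapsing does not destroy the indentation."""
    stripped = line.lstrip(" ")
    return NBSP * (len(line) - len(stripped)) + stripped


indented = nbsp_indent("    return x + 1")
print(repr(indented))       # '\xa0\xa0\xa0\xa0return x + 1'
print(indented.isascii())   # False
```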

twisted.web.template encodes its output using UTF-8, and this is not 
customizable.  Thus, using twisted.web.template, formatFailure's result 
will be a str containing UTF-8 encoded text.  Previously the result was 
a str containing only ASCII encoded text, with no-break space 
represented as `&nbsp;´.  Consequently, callers of `formatFailure´ will
probably mishandle the result.  The caller in `twisted.web.server´ does,
at least: it includes the bytes in a page with a content type of
"text/html".
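A small Python 3 sketch of the difference, using only the standard codecs (the two byte strings merely stand in for the old and new `formatFailure´ results).  If a client has to guess the page encoding and picks Latin-1, each UTF-8 encoded no-break space turns into two characters:

```python
text = "\N{NO-BREAK SPACE}\N{NO-BREAK SPACE}raise ValueError()"

# Old behaviour: ASCII-only bytes, no-break space written as an entity.
old = text.replace("\xa0", "&nbsp;").encode("ascii")

# New behaviour: the flattener emits UTF-8, so U+00A0 becomes two bytes.
new = text.encode("utf-8")

print(old)  # b'&nbsp;&nbsp;raise ValueError()'
print(new)  # b'\xc2\xa0\xc2\xa0raise ValueError()'

# A client that guesses Latin-1 sees an extra U+00C2 before each space.
print(repr(new.decode("latin-1")))  # 'Â\xa0Â\xa0raise ValueError()'
```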

The solutions that come to mind all aim to avoid this incompatible
change, so that `formatFailure´ can continue to return a str containing
ASCII-encoded text.

One solution is to add support for named entities or numeric character 
references to twisted.web.template.  Very likely this is a good idea 
regardless (Nevow supported these).
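A minimal sketch of what numeric character reference support could look like, assuming a hypothetical `CharRef´ marker type and a toy `flatten´ function (neither name is taken from twisted.web.template or Nevow):

```python
from html import escape


class CharRef:
    """Hypothetical marker: a character to be emitted as a numeric
    character reference rather than as a literal character."""

    def __init__(self, ordinal):
        self.ordinal = ordinal


def flatten(parts):
    """Toy flattener: escapes plain text and renders CharRef as &#N;."""
    out = []
    for part in parts:
        if isinstance(part, CharRef):
            out.append("&#%d;" % (part.ordinal,))
        else:
            out.append(escape(part, quote=False))
    return "".join(out).encode("utf-8")


nbsp = CharRef(0xA0)
print(flatten([nbsp, nbsp, "if a < b:"]))  # b'&#160;&#160;if a &lt; b:'
```

The output stays ASCII only as long as every non-ASCII character is spelled as a `CharRef´; a literal u'\xa0' in the text would still come out UTF-8 encoded, which is the same burden `nevow.entities´ places on template authors.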

Another solution is to use a different encoding in 
`twisted.web.template´ - ASCII, with xmlcharrefreplace as the error 
handler.  This is tempting since it avoids an obtrusive non-ASCII 
support API (the way Nevow supports these is via `nevow.entities´, which 
must be used rather than normal Python unicode objects).
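The ASCII/xmlcharrefreplace approach needs no new API, since Python's codec machinery does the work.  A sketch with a hypothetical `flatten_ascii´ helper; note that escaping has to happen before encoding, otherwise the `&´ of each generated character reference would itself be escaped:

```python
from html import escape


def flatten_ascii(parts):
    """Hypothetical flattener: escape each text chunk, then encode as
    ASCII, turning any non-ASCII character into a numeric reference."""
    text = "".join(escape(part, quote=False) for part in parts)
    return text.encode("ascii", "xmlcharrefreplace")


print(flatten_ascii(["\xa0\xa0x = 'caf\xe9' < y"]))
# b"&#160;&#160;x = 'caf&#233;' &lt; y"
```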

Perhaps another question is whether the encoding used by
`twisted.web.template´ should be a parameter.  A related question is
whether `twisted.web.template´ should encode to bytes at all, or
delegate that responsibility to code closer to the socket.
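One possible shape for the parameterized variant, sketched with two hypothetical functions: the flattener stops at escaped text, and the code nearer the socket chooses the encoding (and would then need to declare the same charset in the Content-Type header).  UTF-8 remains the default here only to match the current behaviour:

```python
from html import escape


def flatten_text(parts):
    """Hypothetical flattener that stops at text: escaped str, no bytes."""
    return "".join(escape(part, quote=False) for part in parts)


def encode_for_response(text, encoding="utf-8"):
    """Hypothetical step closer to the socket: the caller picks the
    encoding; characters it cannot represent fall back to numeric
    character references."""
    return text.encode(encoding, "xmlcharrefreplace")


page = flatten_text(["\xa0\xa0return a < b"])
print(encode_for_response(page))           # b'\xc2\xa0\xc2\xa0return a &lt; b'
print(encode_for_response(page, "ascii"))  # b'&#160;&#160;return a &lt; b'
```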

As a work-around in `formatFailure´ I can decode the output of the 
flattener using UTF-8 and then re-encode it to avoid non-ASCII, but it 
seems like this should be solved in `twisted.web.template´ rather than 
over and over again in application code.
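For reference, the work-around itself is short.  A sketch, assuming `flattened´ holds the UTF-8 bytes produced by the flattener (the helper name is hypothetical).  It is safe because the flattener has already escaped markup characters, so the only `&´ introduced by re-encoding begins a numeric character reference, which browsers treat the same as `&nbsp;´:

```python
def to_ascii_html(flattened):
    """Hypothetical helper: re-encode UTF-8 flattener output as
    ASCII-only HTML bytes, using numeric references for non-ASCII."""
    return flattened.decode("utf-8").encode("ascii", "xmlcharrefreplace")


utf8_output = b"\xc2\xa0\xc2\xa0raise ValueError()"
print(to_ascii_html(utf8_output))  # b'&#160;&#160;raise ValueError()'
```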

Jean-Paul


