[Twisted-web] twisted.web.template output encoding
exarkun at twistedmatrix.com
Sat Nov 26 11:52:10 EST 2011
Hello all,
I recently did some work on <http://twistedmatrix.com/trac/ticket/4896>
("twisted.web.util.formatFailure vulnerable to XSS in rare cases").
Rather than apply the patch adding extra html.escape calls, I tried
porting the function to twisted.web.template (presently it builds the
html string manually).
Apart from various issues relating to the lack of patterns in
twisted.web.template, the main difficulty is in handling non-ascii
contents in the traceback. Besides any unicode that may show up in
the source code being rendered (or, perhaps, eventually, in the values
of variables to be rendered - though for now I do not plan to implement
this), the no-break space characters needed to indent traceback lines
properly mean that there is always some non-ascii to include in the
output.
twisted.web.template encodes its output using UTF-8, and this is not
customizable. Thus, using twisted.web.template, formatFailure's result
will be a str containing UTF-8 encoded text. Previously the result was
a str containing only ASCII encoded text, with no-break space
represented as `&nbsp;´. Consequently, callers of `formatFailure´ will
probably mishandle the result - the caller in `twisted.web.server´ does,
at least, including the bytes in a page with a content type of
"text/html".
The solutions that come to mind are all about removing this incompatible
change and making it so `formatFailure´ can continue to return a str
with ASCII-encoded text.
One solution is to add support for named entities or numeric character
references to twisted.web.template. Very likely this is a good idea
regardless (Nevow supported these).
Another solution is to use a different encoding in
`twisted.web.template´ - ASCII, with xmlcharrefreplace as the error
handler. This is tempting since it avoids an obtrusive non-ASCII
support API (the way Nevow supports these is via `nevow.entities´, which
must be used rather than normal Python unicode objects).
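To make the second option concrete, here is a minimal sketch of what
"ASCII, with xmlcharrefreplace as the error handler" does to a unicode
string. It uses only Python's built-in codec machinery and is written
for Python 3, where `bytes´ plays the role of the `str´ discussed above.
The helper name `encode_ascii_with_charrefs´ is made up for
illustration; nothing here is existing `twisted.web.template´ API.

```python
def encode_ascii_with_charrefs(text):
    """Encode text as ASCII bytes, turning every non-ASCII character
    into a numeric character reference such as &#160;.

    Hypothetical helper for illustration only. It does not escape
    markup characters like '<' or '&'; that is a separate step.
    """
    return text.encode("ascii", "xmlcharrefreplace")


# Two no-break spaces (U+00A0) used to indent a traceback line.
indented = "\u00a0\u00a0return f(x)"
print(encode_ascii_with_charrefs(indented))
```

Ordinary unicode objects go in and ASCII-only bytes come out, so no
special entity objects are needed on the calling side.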
Perhaps another question is whether the encoding used by
`twisted.web.template´ should be a parameter. A related question is
whether `twisted.web.template´ should encode to bytes at all, or
delegate that responsibility to code closer to a socket.
As a work-around in `formatFailure´ I can decode the output of the
flattener using UTF-8 and then re-encode it to avoid non-ASCII, but it
seems like this should be solved in `twisted.web.template´ rather than
over and over again in application code.
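For reference, the work-around amounts to something like the sketch
below. It assumes the flattener output is already available as UTF-8
encoded bytes and is written for Python 3, where `bytes´ stands in for
the `str´ mentioned above. The function name `to_ascii_html´ is
hypothetical, and this is not the code actually in `formatFailure´.

```python
def to_ascii_html(flattened):
    """Re-encode UTF-8 flattener output so it contains only ASCII.

    Hypothetical helper: decodes the UTF-8 bytes back to text, then
    encodes to ASCII, replacing each non-ASCII character with a
    numeric character reference. Markup that is already ASCII passes
    through unchanged.
    """
    return flattened.decode("utf-8").encode("ascii", "xmlcharrefreplace")


# b"\xc2\xa0" is the UTF-8 encoding of a no-break space (U+00A0).
utf8_output = b"<span>\xc2\xa0\xc2\xa0x = 1</span>"
print(to_ascii_html(utf8_output))
```

Every caller that needs ASCII output would have to repeat this extra
decode and encode pass, which is the duplication I would like to avoid.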
Jean-Paul