Opened 9 years ago

Closed 9 years ago

Last modified 9 years ago

#6275 defect closed fixed (fixed)

twisted.web.template attributes have HTML injection vulnerability

Reported by: Shell Owned by: Glyph
Priority: highest Milestone:
Component: web Keywords:
Cc: jknight, zooko@… Branch: branches/template-sanitation-6275
branch-diff, diff-cov, branch-cov, buildbot
Author: exarkun

Description (last modified by Jean-Paul Calderone)

from twisted.web import template

s = ("""
<a xmlns:t="http://twistedmatrix.com/ns/twisted.web.template/0.1">
<t:attr name="href">" class="foobar</t:attr>
</a>
""")
class ExampleElement(template.Element):
    loader = template.XMLString(s)

def renderDone(output):
    print output

template.flattenString(
    None, ExampleElement()).addCallback(renderDone)

The result is:

<a href="" class="foobar"></a>

As you can see, the class attribute has been set through HTML injection in the href attribute. Dynamically inserted (through slots, Deferreds, or similar) attributes are also at risk.

The issue is that in twisted/web/_flatten.py, in the _flattenElement() function, there are several places where inAttribute is forced to False. The <t:attr> tag creates a transparent tag behind the scenes, and sets this tag as the relevant attribute. The contents of this tag are then rendered with inAttribute forced to False, as per line 150.

_flattenElement() should be fixed so that it can be called with any renderable items within an attributes. This includes never forcing inAttribute to false, and also sanitizing any statically created strings (such as '<' on line 153 and similar) when in an attribute. Doing so will cause it to be impossible to pass a malformed attribute through code error.

Change History (34)

comment:1 Changed 9 years ago by DefaultCC Plugin

Cc: jknight added

comment:2 Changed 9 years ago by Jean-Paul Calderone

Description: modified (diff)

Fixing the formatting.. maybe.

comment:3 Changed 9 years ago by Jean-Paul Calderone

Description: modified (diff)

This is a bit easier to read.

comment:4 Changed 9 years ago by Jean-Paul Calderone

Description: modified (diff)

This is easier still.

comment:5 Changed 9 years ago by Jean-Paul Calderone

Author: exarkun
Branch: branches/template-sanitation-6275

(In [36941]) Branching to 'template-sanitation-6275'

comment:6 Changed 9 years ago by Glyph

Build Results.

The twistedchecker failure is due to a bug in twistedchecker; the win32 one is intermittent, and the 3.3 and osx builders are dealing with configuration issues.

comment:7 Changed 9 years ago by Glyph

Keywords: review added

comment:8 Changed 9 years ago by Tom Prince

  • test_serializedAttributeWithTag says that re-parsing tags is the main use-case, but doesn't give any indication of why that is useful.
  • L{list}, etc. work

comment:9 in reply to:  8 Changed 9 years ago by Glyph

Replying to tom.prince:

  • test_serializedAttributeWithTag says that re-parsing tags is the main use-case, but doesn't give any indication of why that is useful.

It's useful because if the data isn't re-parseable in some way then it's just been corrupted or mangled?

  • L{list}, etc. work

Are you referring to a specific docstring that does C{list} or something?

comment:10 Changed 9 years ago by therve

Keywords: review removed
Owner: set to Jean-Paul Calderone
  1. I have some pydoctor failures in the test file if you care to fix them:
twisted.web.test.test_flatten.OrderedAttributes:28 invalid ref to Tag.attributes
twisted.web.test.test_flatten.OrderedAttributes:28 invalid ref to dict.items
twisted.web.test.test_flatten.OrderedAttributes.attributes:0 invalid ref to dict.items
twisted.web.test.test_flatten.OrderedAttributes.iteritems:42 invalid ref to dict.iteritems
twisted.web.test.test_flatten.TestSerialization.test_serializedAttributeWithTag:120 invalid ref to Tag
twisted.web.test.test_flatten.FlattenerErrorTests.test_tag:396 invalid ref to Tag
twisted.web.test.test_flatten.FlattenerErrorTests.test_tagWithoutLocation:412 invalid ref to Tag

2.

             yield _flattenElement(request, result, slotData, renderFactory,
-                    False)
+                                  inAttribute)

This change is not tested.

  1. There is one assertEquals to be replaced by assertEqual in the tests.

The code is a bit hard to follow (good lord, 10 conditionals in a function?), but the change looks sane to me. Please merge what at least 2. is fixed.

comment:11 Changed 9 years ago by Jean-Paul Calderone

Two more things:

  • inAttribute needs to be renamed
  • This branch should really resolve all injection attacks, not only the one mentioned by the reporter. So, reviewers take note. :/

comment:12 in reply to:  10 Changed 9 years ago by Glyph

Replying to therve:

  1. I have some pydoctor failures in the test file if you care to fix them:

I fixed these, but there seems to be a bug here in pydoctor. If I import a private alias of a public name in an implementation module, it ought to refer to the public name regardless. Instead, it considers from _foo import Bar ... "L{Bar}" a reference error, if Bar has been "moved" by being declared in some other module's __all__.

2.

             yield _flattenElement(request, result, slotData, renderFactory,
-                    False)
+                                  inAttribute)

This change is not tested.

I took the approach of replacing the inAttribute (now, dataEscaper) with the wrong value, and writing tests for each possible value. Thanks to this review point I think I'm actually confident that we now cover all the cases.

  1. There is one assertEquals to be replaced by assertEqual in the tests.

Done.

The code is a bit hard to follow (good lord, 10 conditionals in a function?),

Do you think it would be helpful to split up the function by some kind of dispatch, registering particular types for particular actions, rather than have one big function with a bunch of type-checks in it? I think that my dataEscaper and keepGoing changes make it quite a bit more readable, but it's definitely still a pretty nuanced piece of code.

but the change looks sane to me. Please merge what at least 2. is fixed.

exarkun's review seems to indicate that he thinks a re-review is necessary so I'll be resubmitting.

comment:13 in reply to:  11 Changed 9 years ago by Glyph

Replying to exarkun:

Two more things:

  • inAttribute needs to be renamed

I changed it to dataEncoder, changed its function to be more clear, wrote a TON of docstrings explaining why it's necessary and

  • This branch should really resolve all injection attacks, not only the one mentioned by the reporter. So, reviewers take note. :/

Test cases now exist for all the code paths through this mess which have different interactions with attribute serialization. I'm fairly sure that they are all both tested and working now.

comment:14 Changed 9 years ago by Glyph

Keywords: review added
Owner: Jean-Paul Calderone deleted
Priority: normalhighest

OK, there are now new build results up. Unfortunately the twistedchecker and pydoctor results that I'm getting for this branch are useless, and look wrong. Really we need to fix the buildbot to actually compare to the build for the revision of trunk where the branch was made so we don't keep getting all this irrelevant noise.

Other than that though I'm much more confident that this branch is really correct now. Somebody please take a look.

Since this is a potential security problem, I'm flagging it as "highest" priority, if anybody even looks at that field any more...

comment:15 Changed 9 years ago by Shell

I've taken a look at the code in branches/template-sanitation-6275 at glyph's request, although this isn't a formal review.

I've gone through several branches of code, both manually figuring out the code path and through an interpreter. I agree that this appears to fix all classes of XSS that don't involve injection into inline Javascript or CSS, although those are out of scope for this issue. It also doesn't appear to break backwards-compatibility. I'm happy to start using the refactored/fixed code ASAP.

Is there likely to be a minor release of Twisted 12.3 containing this security fix, or will it have to wait for the next full release?

comment:16 in reply to:  15 Changed 9 years ago by Glyph

Replying to Shell:

I've taken a look at the code in branches/template-sanitation-6275 at glyph's request, although this isn't a formal review.

I've gone through several branches of code, both manually figuring out the code path and through an interpreter. I agree that this appears to fix all classes of XSS that don't involve injection into inline Javascript or CSS, although those are out of scope for this issue. It also doesn't appear to break backwards-compatibility. I'm happy to start using the refactored/fixed code ASAP.

Thanks, we appreciate the feedback. This should at least give the official reviewer a little more peace of mind ;).

Is there likely to be a minor release of Twisted 12.3 containing this security fix, or will it have to wait for the next full release?

Well, let's not get ahead of ourselves - it needs to be merged to trunk first :-).

Personally I would like to see a 12.3.1 that fixes both this and the dns.Name regression. However, this depends on (A) whether any other users would like to see this and (B) whether anyone decides to step up and actually do the maintenance release :).

comment:17 Changed 9 years ago by Jean-Paul Calderone

Keywords: review removed
Owner: set to Glyph

Thanks for tackling this surprise issue.

  1. twisted/web/_flatten.py
    1. The flattenWithAttributeEscaping docstring talks about an example involving a tag y containing a tag x. It says that the metacharacters for C{y} should be passed through unchanged.
      1. I think metacharacters means the "<" and ">" which are used to represent the tag y, but I'm not sure this is standard vocabulary. If not, please be more explicit.
      2. I think what the string actually means to say is that the metacharacters for x should be passed through unchanged - since x is the tag which is contained by another tag. If this isn't the case, then I'm not sure I understand what this part of the docstring is trying to convey. This apparent error is repeated in the following paragraph as well.
    2. _flattenElement docstring still says C{str} instead of L{bytes} in the param docs for root
    3. It also says C{str} in its own return documentation
    4. C{str} also in _flattenTree docstring (3 times, I think)
    5. ... and flatten docstring
  2. twisted/web/test/test_flatten.py
    1. There's a bunch of C{builtin} that seems like they should be L{} instead
    2. I think that tests written like test_serializedAttributeWithSanitization are better written so that all variations call a helper function and so that the helper function is not itself a test method.
    3. test_serializedAttributeWithTransparentTag looks like it could be written in terms of the same helper test_serializedDeferredAttributeWithSanitization is written in terms of
    4. Nested imports in test_serializedAttributeWithTransparentTagWithRenderer should come out
    5. Single-quote docstrings in the class/methods defined within that test method look gross and violate the coding standard. Do these docstrings need to exist? I usually think no, as nothing can ever find them. Comments work just as well for conveying the information to a future maintainer. Either is fine with me, but I think normally formatted, multi-line, triple-quoted strings are needed if this information stays as docstrings.
    6. Also, twisted.web.template.TagLoader does all of the boringest parts of what the Loader defined in test_serializedAttributeWithTransparentTagWithRenderer does.
    7. Same comment about test/helper factoring for test_serializedAttributeWithTag
    8. in test_serializedAttributeWithTag, use of successResultOf will make backporting a bit more work. Here's a suggestion for an alternate approach for implementing this test without gross old-style Deferred hacking. Change assertFlattensImmediately (in _util.py) to return the result of flattening (aiui that would be the result local variable inside that method). Then use assertFlattensImmediately instead of successResultOf.
    9. test_serializedAttributeWithTagWithAttribute might be clearer if rewritten using XML, similar to how test_serializedAttributeWithTag is written. Right now, this is the only test method that does stuff with str.replace to duplicate the correct quoting logic, and it feels a bit sad to me. Also, this test looks kinda redundant with test_serializedAttributeWithTag now? Is it still testing something unique at all? If so, it should at least lose its self-doubting comment.
  3. twisted/test/testutils.py
    1. missing copyright header
    2. missing module docstring
    3. imports are all jumbled up
    4. assertXMLEqual is wrong, of course (at least I'm pretty sure it is, I doubt toxml guarantees anything about attribute order in its output? otoh I haven't used xml.etree much so perhaps I am wrong). Regardless, this seems to work well enough for now and it's a private helper, so not a blocker for this branch.
  4. needs a news fragment

Overall, looks pretty good. Still a complicated pile of code, but not really any more complicated than it was before - maybe even slightly less so. And better tested and so hopefully and apparently more correct! I'm happy for this to be merged after the above points are addressed, given an as-green-as-is-practical buildbot run.

comment:18 Changed 9 years ago by Glyph

(In [37000]) Re-word a few things to deal with review point 1.1. refs #6275

comment:19 Changed 9 years ago by Glyph

(In [37001]) Address review point 1.3.

refs #6275

comment:20 Changed 9 years ago by zooko

Cc: zooko@… added

comment:21 Changed 9 years ago by Glyph

(In [37002]) Address review point 1.4.

refs #6275

comment:22 Changed 9 years ago by Glyph

(In [37003]) Review point 1.5.

refs #6275

comment:23 Changed 9 years ago by Glyph

(In [37004]) Review point 2.1, as much as I can right now.

re #6275

comment:24 Changed 9 years ago by Glyph

(In [37005]) Review point 2.2.

refs #6275

comment:25 Changed 9 years ago by Glyph

(In [37007]) Address 2.5 and 2.6, and refactor even more stuff to use test helpers.

refs #6275

comment:26 Changed 9 years ago by Glyph

(In [37008]) Prepare to handle 2.8; enhance assertFlattensImmediately to have a return value.

refs #6275

comment:27 Changed 9 years ago by Glyph

(In [37009]) Review point 2.8.

refs #6275

comment:28 Changed 9 years ago by Glyph

(In [37010]) Address review point 2.9.

refs #6275

comment:29 Changed 9 years ago by Glyph

(In [37013]) NEWS fragment. Review point 3.4. Refs #6275

comment:30 Changed 9 years ago by Glyph

Replying to exarkun:

Thanks for tackling this surprise issue.

No problem. Oddly, it was a stimulating intellectual challenge. Very satisfying to write all the tests cases to make sure it was correct.

  1. twisted/web/_flatten.py
    1. The flattenWithAttributeEscaping docstring talks about an example involving a tag y containing a tag x. It says that the metacharacters for C{y} should be passed through unchanged.
      1. I think metacharacters means the "<" and ">" which are used to represent the tag y, but I'm not sure this is standard vocabulary. If not, please be more explicit.

I am pretty sure this is standard terminology. Also, escapeForContent, earlier in the file, already explains what is meant by this term.

  1. I think what the string actually means to say is that the metacharacters for x should be passed through unchanged - since x is the tag which is contained by another tag. If this isn't the case, then I'm not sure I understand what this part of the docstring is trying to convey. This apparent error is repeated in the following paragraph as well.

OK; adjusted.

  1. _flattenElement docstring still says C{str} instead of L{bytes} in the param docs for root

Fixed.

  1. It also says C{str} in its own return documentation

Fixed, and some more detail added, since really its @return is an @rtype.

  1. C{str} also in _flattenTree docstring (3 times, I think)

I didn't really change _flattenTree at all, but fine :-).

  1. ... and flatten docstring

Done.

  1. twisted/web/test/test_flatten.py
    1. There's a bunch of C{builtin} that seems like they should be L{} instead

Fixed, insofar as I can without introducing more pydoctor errors.

  1. I think that tests written like test_serializedAttributeWithSanitization are better written so that all variations call a helper function and so that the helper function is not itself a test method.

OK.

  1. test_serializedAttributeWithTransparentTag looks like it could be written in terms of the same helper test_serializedDeferredAttributeWithSanitization is written in terms of

Dealt with along with 2.2.

  1. Nested imports in test_serializedAttributeWithTransparentTagWithRenderer should come out

Done.

  1. Single-quote docstrings in the class/methods defined within that test method look gross and violate the coding standard. Do these docstrings need to exist? I usually think no, as nothing can ever find them. Comments work just as well for conveying the information to a future maintainer. Either is fine with me, but I think normally formatted, multi-line, triple-quoted strings are needed if this information stays as docstrings.

This was an attempt - misguided, I think - to pacify twistedchecker. I'll just live with the additional warnings, and maybe file a bug, since I think that the implementation is pretty obvious and inner functions and classes shouldn't need docstrings.

  1. Also, twisted.web.template.TagLoader does all of the boringest parts of what the Loader defined in test_serializedAttributeWithTransparentTagWithRenderer does.

Oh, I had forgotten that made it to trunk. Done.

  1. Same comment about test/helper factoring for test_serializedAttributeWithTag

Also done.

  1. in test_serializedAttributeWithTag, use of successResultOf will make backporting a bit more work. Here's a suggestion for an alternate approach for implementing this test without gross old-style Deferred hacking. Change assertFlattensImmediately (in _util.py) to return the result of flattening (aiui that would be the result local variable inside that method). Then use assertFlattensImmediately instead of successResultOf.

Good call, and thanks for bringing this up; I started off thinking this might not be important enough to backport, but after messing with it for a while I'm actually composing an email about that, and the need for that test utility was bugging me.

  1. test_serializedAttributeWithTagWithAttribute might be clearer if rewritten using XML, similar to how test_serializedAttributeWithTag is written. Right now, this is the only test method that does stuff with str.replace to duplicate the correct quoting logic, and it feels a bit sad to me.

Yeah, the redundant check on the confusing result string is a better way to deal with this than computing the confusing result string with a confusing replacement algorithm. Done.

Also, this test looks kinda redundant with test_serializedAttributeWithTag now? Is it still testing something unique at all? If so, it should at least lose its self-doubting comment.

It's testing the nesting of attribute->tag->attribute quoting, which is sensitive to changes in the implementation; test_serializedAttributeWithTag is testing just attribute->tag->body. I deleted the comment, though, the behavior's definitely right.

  1. twisted/test/testutils.py
    1. missing copyright header

Done.

  1. missing module docstring

Done; and, I added a helpful maintenance note in there, too.

  1. imports are all jumbled up

Done.

  1. assertXMLEqual is wrong, of course (at least I'm pretty sure it is, I doubt toxml guarantees anything about attribute order in its output?

I don't know if it guarantees anything about order, but the order looks pretty intentional to me:

>>> dom.parseString("<hello ca='4' cb='5' c='3' zzzzzzzzzz='last' a='1'         
b='2'/>").toxml()                                                               
u'<?xml version="1.0" ?><hello a="1" b="2" c="3" ca="4" cb="5"                  
zzzzzzzzzz="last"/>'                                                            

otoh I haven't used xml.etree much so perhaps I am wrong).

FWIW at least cElementTree's serialization looks similarly intentional:

>>> tostring(XML("<hello ca='4' cb='5' c='3' zzzzzzzzzz='last' a='1' b='2'/>")) 
'<hello a="1" b="2" c="3" ca="4" cb="5" zzzzzzzzzz="last" />'                   

And it's not just an accident with the data structure, because that enumerates in a different order:

>>> XML("<hello ca='4' cb='5' c='3' zzzzzzzzzz='last' a='1' b='2'/>").attrib    
{'a': '1', 'c': '3', 'b': '2', 'cb': '5', 'ca': '4', 'zzzzzzzzzz': 'last'}      

Regardless, this seems to work well enough for now and it's a private helper, so not a blocker for this branch.

It's a private helper which was previously independently implemented in the exact same way in two different test suites; I think that this change is a substantial improvement :). And, given that it seems to be intentionally sorting its attributes and producing deterministic output, I don't see much of a reason to change it - although I do worry a little bit that it might need to be using a more properly constructed parser, like the one twisted.lore.tree.parseFileAndReport builds. I think you might know that particular universe of awfulness a bit better than I do; can you take a look at it and figure out if there should be such a ticket? (parseFileAndReport needs to move out of lore anyway, since twisted.web.template._flatsaxParse totally duplicates it except not quite as nicely because _ToStan doesn't seem to quite have all the same error-reporting niceties).

  1. needs a news fragment

Done.

Overall, looks pretty good. Still a complicated pile of code, but not really any more complicated than it was before - maybe even slightly less so. And better tested and so hopefully and apparently more correct! I'm happy for this to be merged after the above points are addressed, given an as-green-as-is-practical buildbot run.

OK. I'm pretty comfortable with all the changes I just made, so I'll kick off some builds now and see if the buildbot tells me I missed anything. This final pass mostly just made the test suite more consistent and easier to read, but that will be important for future maintenance (which this needs - hello, template precompilation!) so it's time well spent.

Should I open a new issue for the backport? Do we have any process documentation on how those are supposed to work?

comment:31 Changed 9 years ago by davidsarah

Am I correct in thinking that this only affects templates with "t:attr" in them (where t refers to the "http://twistedmatrix.com/ns/twisted.web.template/0.1" namespace)?

comment:32 Changed 9 years ago by Glyph

Resolution: fixed
Status: newclosed

(In [37018]) Merge template-sanitation-6275

Author: glyph

Reviewer: therve, exarkun

Fixes: #6275

Fix a bug where attribute values are not quoted properly in twisted.web.template. Significantly improve test coverage for many previously- uncovered cases at the same time, as well as making the implementation somewhat easier to read and better documented for maintainers (if not necessarily making it much simpler...)

comment:33 in reply to:  31 Changed 9 years ago by Glyph

Replying to davidsarah:

Am I correct in thinking that this only affects templates with "t:attr" in them (where t refers to the "http://twistedmatrix.com/ns/twisted.web.template/0.1" namespace)?

That's the main practical concern. If you did something in your renderer like tag(something=a(href="something")) you might also get into trouble.

comment:34 Changed 9 years ago by Jean-Paul Calderone

Should I open a new issue for the backport? Do we have any process documentation on how those are supposed to work?

We don't have any process documentation for that. I guess you should think a little bit, write something down, and then begin executing it. Or if you want, maybe I or someone else can start on that.

Note: See TracTickets for help on using tickets.