Opened 7 years ago

Last modified 5 years ago

#4984 enhancement new

twisted.web.template needs passthrough technique.

Reported by: Stephen Thorne
Owned by: Stephen Thorne
Priority: normal
Milestone:
Component: web
Keywords:
Cc: jknight
Branch:


It is sometimes useful to do something along the lines of:

blob_of_rendered_html = someFunction()
return tags.html(tags.body(blob_of_rendered_html))

But it is not possible to get a str, or some kind of subclass of str, through _flattenString() or _flattenTree() unmodified: any markup in the string is escaped on the way out.

Change History (7)

comment:1 Changed 7 years ago by DefaultCC Plugin

Cc: jknight added

comment:2 Changed 7 years ago by Jean-Paul Calderone

Owner: set to Stephen Thorne

Passing through arbitrary strings is a good way to generate invalid output, potentially with security issues.

If you really need to put some HTML (or XML or what have you) into the result, you can parse it into tags and then put the tag tree into the output and it will be reflattened for you.

I think we should consider this feature carefully and not just assume it's something to include.

comment:3 in reply to:  2 Changed 7 years ago by Glyph

Replying to exarkun:

If you really need to put some HTML (or XML or what have you) into the result, you can parse it into tags and then put the tag tree into the output and it will be reflattened for you.

Also, if we implement template precompilation, you only need to pay this parsing cost once (assuming you cache the result). So perhaps we should do that first, then revisit this if there are still use-cases for it unsatisfied by that performance improvement?

comment:4 Changed 5 years ago by Glyph

I've had a couple of discussions in the last few days about this, and it has come up on the mailing list again, so I find myself looking at it here.

One thing that occurs to me is that whether or not we add a way to pass through arbitrary garbage into the output document (which appears to be what exarkun is reacting to in his comment), we should have a way to pass through some HTML generated by another library or loaded from an external resource. This may involve hitting the slow path rather than dumping some bytes in the output, but right now there isn't a clear, documented way to get from random HTML content (trusted or not) to some twisted.web.template-renderable objects.

comment:5 Changed 5 years ago by Glyph

Here's a sample implementation of such a thing which uses html5lib (untested):

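from html5lib import parseFragment
# xml.etree.cElementTree was removed in Python 3.9; ElementTree replaces it.
from xml.etree.ElementTree import tostring
from twisted.web.template import XMLString

def html2stan(htmlBytes):
    """
    Convert HTML5 bytes to L{twisted.web.template} renderable objects.
    """
    # [0] keeps only the first element of the parsed fragment.
    return XMLString(
        tostring(parseFragment(htmlBytes, treebuilder="etree")[0])
    ).load()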

comment:6 Changed 5 years ago by Glyph

Or, if you're willing to sacrifice a little parsing correctness in edge cases in exchange for fewer dependencies, you could use microdom:

from twisted.web.template import XMLString
from twisted.web.microdom import parseString

def html2stan(htmlBytes):
    # Lenient parse, then re-serialize as XML that XMLString can load.
    return XMLString(
        parseString(htmlBytes, beExtremelyLenient=True).toxml()
    ).load()

comment:7 Changed 5 years ago by Glyph

If you really, really want to corrupt your output stream (or you are really, really sure your input is valid), you also have this option:

from twisted.web.template import Element, renderer, flatten, XMLString as XML

class ElementWithRawJunk(Element):
    loader = XML("""
    <html xmlns:t="http://twistedmatrix.com/ns/twisted.web.template/0.1">
        <body><div><span>Hi.</span><div t:render="renderme"/></div></body>
    </html>
    """)

    def __init__(self, write):
        super(ElementWithRawJunk, self).__init__()
        self.write = write

    @renderer
    def renderme(self, request, tag):
        # Write straight to the output, bypassing the flattener's escaping.
        self.write(b"\n>>>invalid stuff<<<\n")
        return ''

def flattenWithRawStuff(request, write):
    root = ElementWithRawJunk(write)
    flatten(request, root, write)

import sys

# flatten() emits bytes, so on Python 3 use the binary stdout stream.
flattenWithRawStuff(None, sys.stdout.buffer.write)