[Twisted-Python] t.w.microdom and empty HTML elements

Alex Levy mesozoic at polynode.com
Fri Jun 27 14:28:50 MDT 2003


While looking into a problem where my pages were showing up blank on IE,
Moonfallen pointed me to the NEVERSINGLETON list in t.w.microdom.toxml().
It is a list of elements which are never rendered as <node/>, and are
instead always written out as <node></node>.

Upon looking through the specs for XHTML at w3.org, however, I found that
the _only_ XHTML elements which are allowed as singletons are those whose
content models are "EMPTY" -- <br/>, <hr/>, and so on.  The list of EMPTY
elements is not huge.

I understand the issue that microdom cannot get bogged down with HTML
tweaks; it is, after all, supposed to produce XML.  But it seems like
t.w.microdom shouldn't be spitting out invalid XHTML, even if (for now) it
doesn't look like it breaks anyone's browser.

As I understand it, there is no real difference in XML between <node/> and
<node></node>.  Given that, wouldn't it be better for toxml() to use an
ALLOWSINGLETON list instead of NEVERSINGLETON?  That is, unless an element
is explicitly allowed to be <foo/>, it would be written as <foo></foo>.
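
To illustrate that equivalence, here is a small check.  It is only a
sketch: it uses the standard library's xml.dom.minidom rather than
microdom, and the helper name child_count is made up for the example.
Both spellings should parse to a root element with no child nodes.

```python
from xml.dom.minidom import parseString


def child_count(xml_text):
    """Parse xml_text and return how many child nodes its root element has."""
    return len(parseString(xml_text).documentElement.childNodes)


# The minimized and the expanded form of an empty element.
short_form = child_count("<node/>")
long_form = child_count("<node></node>")

# An XML parser sees the same thing either way: an element with no content.
print(short_form == long_form)
```

So the choice between the two forms only matters to consumers that are
not real XML parsers, such as browsers treating the page as HTML.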

If this is of interest to anyone, below is a diff that implements this; the
list of elements is from the XHTML 1.0 DTD.  I haven't tested it beyond my
own page, where it works, so someone else will have to test it properly.


Index: twisted/web/microdom.py
===================================================================
RCS file: /cvs/Twisted/twisted/web/microdom.py,v
retrieving revision 1.93
diff -u -r1.93 microdom.py
--- twisted/web/microdom.py	27 Jun 2003 19:10:35 -0000	1.93
+++ twisted/web/microdom.py	27 Jun 2003 20:16:17 -0000
@@ -449,8 +449,9 @@
 
     def writexml(self, stream, indent='', addindent='', newl='', strip=0):
         # write beginning
-        NEVERSINGLETON = ('a', 'li', 'div', 'span', 'title', 'script',
-                          'textarea', 'select', 'style')
+        ALLOWSINGLETON = ('img', 'br', 'hr', 'base', 'meta', 'link', 'param',
+                          'area', 'input', 'col', 'basefont', 'isindex',
+                          'frame')
         # this should never be necessary unless people start
         # changing .tagName on the fly(?)
         if not self.preserveCase:
@@ -468,7 +469,7 @@
             for child in self.childNodes:
                 child.writexml(stream, newindent, addindent, newl, strip)
             w(j((newl, indent, "</", self.endTagName, '>')))
-        elif self.tagName.lower() in NEVERSINGLETON:
+        elif self.tagName.lower() not in ALLOWSINGLETON:
             w(j(('></', self.endTagName, '>')))
         else:
             w(" />")
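
The rule the patch applies can also be sketched as a standalone function,
outside of microdom.  This is an illustration and not the microdom code:
render_childless is a hypothetical helper that ignores attributes and
indentation, and only the ALLOWSINGLETON tuple is taken from the patch.

```python
# Elements whose content model is EMPTY in the XHTML 1.0 DTDs
# (the same tuple as in the patch above).
ALLOWSINGLETON = ('img', 'br', 'hr', 'base', 'meta', 'link', 'param',
                  'area', 'input', 'col', 'basefont', 'isindex',
                  'frame')


def render_childless(tag_name):
    """Serialize an element that has no attributes and no children.

    Hypothetical helper for illustration; microdom's writexml() also
    handles attributes, case preservation and indentation.
    """
    if tag_name.lower() in ALLOWSINGLETON:
        # Explicitly allowed to be minimized.
        return "<%s />" % tag_name
    # Everything else gets an explicit end tag.
    return "<%s></%s>" % (tag_name, tag_name)


print(render_childless("br"))
print(render_childless("script"))
```

With an allow list, any element not named in it (including ones nobody
thought of) falls back to the <foo></foo> form.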


-- 
Alex Levy
WWW: http://mesozoic.geecs.org

"Never let your sense of morals prevent you from doing what is right."
 -- Salvor Hardin, Isaac Asimov's _Foundation_



