[Twisted-Python] regarding xml elements

glyph at divmod.com
Sun Mar 30 01:08:15 EDT 2008


On 12:56 am, phil at bubblehouse.org wrote:
>On Mar 28, 2008, at 7:33 PM, Jean-Paul Calderone wrote:
>>On Fri, 28 Mar 2008 22:59:21 -0000, glyph at divmod.com wrote:
>>>On 02:55 pm, exarkun at divmod.com wrote:
>>>>On Fri, 28 Mar 2008 10:51:10 -0400, Phil Christensen 
>>>><phil at bubblehouse.org> wrote:

>>    >>> from twisted.web.microdom import parseString
>>    >>> s = '<div><span>hello</span> <span>world</span></div>'
>>    >>> parseString(s).toxml()
>>    '<?xml version="1.0"?><div><span>hello</span><span>world</span></div>'
>>    >>>
>>So if you need such advanced XML features as correct whitespace
>>handling, steer clear. ;)

>I have to say, I don't find this to be that big an issue. I think if 
>you're using XML as a data interchange format (as I know the original 
>poster was), whitespace is generally syntactically meaningless.
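For contrast with the Microdom session quoted above, here is a minimal sketch using the standard library's xml.dom.minidom (not something discussed in the thread). It keeps the whitespace-only text between the two <span> elements as its own Text node, which is the pass-everything-through behaviour the XML spec asks of a processor:

```python
# A parser that passes all character data through: xml.dom.minidom.
from xml.dom.minidom import parseString

s = '<div><span>hello</span> <span>world</span></div>'
doc = parseString(s)
div = doc.documentElement

# The single space between the <span> elements survives as a Text node.
print([node.nodeType == node.TEXT_NODE for node in div.childNodes])
# [False, True, False]
print(div.toxml())
# <div><span>hello</span> <span>world</span></div>
```

Whether that space matters is up to the application; the point is only that a compliant parser reports it and lets the application decide.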

Like many things in Microdom, whitespace handling does not strive to be 
particularly spec-compliant (the spec does say "An XML processor MUST 
always pass all characters in a document that are not markup through to 
the application."), but to be useful for simple cases and stable enough 
that your code won't break.  If you want whitespace you can probably 
cram it in there.  For example, it has a creative misinterpretation of 
the "xml:space" attribute:
>>> from twisted.web.microdom import parseString
>>> s = '<div xml:space="preserve"><span>hello</span> <span>world</span></div>'
>>> parseString(s).toxml()
'<?xml version="1.0"?><div xml:space="preserve"><span>hello</span> <span>world</span></div>'
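For reference, XML 1.0 defines xml:space as applying to the element that carries it and to all of its descendants, unless another xml:space attribute overrides it further down. The sketch below shows that inheritance rule using the standard library's xml.etree.ElementTree rather than Microdom; space_modes is a hypothetical helper written for illustration, not part of any library:

```python
import xml.etree.ElementTree as ET

# ElementTree reports the reserved xml: prefix in Clark notation.
XML_SPACE = '{http://www.w3.org/XML/1998/namespace}space'

def space_modes(elem, inherited='default'):
    """Hypothetical helper: yield (tag, mode) for elem and its descendants.

    An element's own xml:space attribute wins; otherwise it inherits the
    mode in effect on its parent ('default' at the root).
    """
    mode = elem.get(XML_SPACE, inherited)
    yield elem.tag, mode
    for child in elem:
        yield from space_modes(child, mode)

s = ('<div xml:space="preserve"><span>hello</span> '
     '<p xml:space="default"><span>world</span></p></div>')
root = ET.fromstring(s)
print(list(space_modes(root)))
# [('div', 'preserve'), ('span', 'preserve'), ('p', 'default'), ('span', 'default')]
```

A parser honouring the attribute would keep whitespace-only text wherever the mode is 'preserve' and apply its own default handling elsewhere.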

It is also hard-coded to preserve space in <pre> tags, which is broken 
too: it doesn't really honor namespaces, so it has no idea whether your 
document is HTML, and it can't read DTDs, so it doesn't know whether your 
elements have this attribute set implicitly (and so on and so on).

This could be made into *slightly* less of a hack with a preserveSpace 
argument to parse*(), of course; the implementation would probably be 
very straightforward (cf. MicroDOMParser.shouldPreserveSpace).  Maybe 
someone who actually likes Microdom, such as Phil, will add one, since 
all I'm committing to here is not totally hating it ;).
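As a rough sketch of what such a flag might look like, written against the standard library's xml.etree.ElementTree rather than Microdom: parse_string and preserve_space are hypothetical names, not Microdom's actual parse*() signature, and the sketch ignores xml:space and <pre> entirely. It simply drops whitespace-only text and tails unless the caller asks to keep them:

```python
import xml.etree.ElementTree as ET

def parse_string(s, preserve_space=False):
    """Hypothetical parse wrapper with a global whitespace switch.

    Unless preserve_space is true, text and tails consisting only of
    whitespace are removed after parsing.
    """
    root = ET.fromstring(s)
    if not preserve_space:
        for elem in root.iter():
            if elem.text is not None and not elem.text.strip():
                elem.text = None
            if elem.tail is not None and not elem.tail.strip():
                elem.tail = None
    return root

s = '<div><span>hello</span> <span>world</span></div>'
print(ET.tostring(parse_string(s), encoding='unicode'))
# <div><span>hello</span><span>world</span></div>
print(ET.tostring(parse_string(s, preserve_space=True), encoding='unicode'))
# <div><span>hello</span> <span>world</span></div>
```

In Microdom itself the natural place for the switch would presumably be MicroDOMParser.shouldPreserveSpace, as noted above.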



