[Twisted-Python] Can anyone recommend a sensible XML parser for Python?

Glyph Lefkowitz glyph at twistedmatrix.com
Mon Sep 2 05:07:45 EDT 2002

So, I'm really pretty discouraged and disgusted with the state of XML tools
that ship with Python today.  Mainly, they do surprising and insecure things
when I try to parse XML, and I don't understand how to tell what will and won't
work between various versions of them.

I think my requirements of an XML parser are pretty simple.  Here are the
basics of what I want it to do:

    * adhere to a subset of both DOM and SAX APIs for both event-based and
      synchronous processing of XML data

    * allow creation of DOM trees from fragments of an XML stream so that
      discrete "packets" can be processed, a la Jabber "XML streams"

    * perform relatively well (optional)
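
To make the "packets" requirement concrete, here is a minimal sketch using
the expat and minidom modules from the present-day Python standard library.
The class name PacketStream and its on_packet callback are made up for
illustration; this is one possible shape for such an API, not an existing
Twisted or Jabber interface.

```python
from xml.dom import minidom
from xml.parsers import expat


class PacketStream:
    """Hypothetical helper: builds one DOM element per direct child of the
    stream's root element and hands it to a callback as soon as it closes."""

    def __init__(self, on_packet):
        self.on_packet = on_packet
        self.doc = minidom.Document()   # used only as a node factory
        self.stack = []                 # open elements inside the current packet
        self.depth = 0                  # nesting depth, counting the stream root
        self.parser = expat.ParserCreate()
        self.parser.buffer_text = True  # coalesce text within a single feed() call
        self.parser.StartElementHandler = self._start
        self.parser.EndElementHandler = self._end
        self.parser.CharacterDataHandler = self._text

    def feed(self, data):
        # False means "more data may follow", so the root can stay open.
        self.parser.Parse(data, False)

    def _start(self, name, attrs):
        self.depth += 1
        if self.depth == 1:
            return                      # the stream root itself is not a packet
        element = self.doc.createElement(name)
        for key, value in attrs.items():
            element.setAttribute(key, value)
        if self.stack:
            self.stack[-1].appendChild(element)
        self.stack.append(element)

    def _end(self, name):
        self.depth -= 1
        if self.stack:
            element = self.stack.pop()
            if not self.stack:          # a depth-1 child just closed
                self.on_packet(element)

    def _text(self, data):
        if self.stack:
            self.stack[-1].appendChild(self.doc.createTextNode(data))


# Usage: chunks may split tags anywhere, as they would arriving off a socket.
packets = []
stream = PacketStream(packets.append)
for chunk in ("<stream><msg to='a'><bo", "dy>hi\nthere</body></msg><pres", "ence/>"):
    stream.feed(chunk)
print([packet.toxml() for packet in packets])
```

Event-based handling comes from the expat callbacks and tree-based handling
from the minidom nodes they build. Text that happens to be split across two
feed() calls would still arrive as two Text nodes in this sketch.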

More importantly, here are the things I *don't* want an XML parser to do:

    * validate in any way, ever, at all

    * fetch DTDs or otherwise do helpful things like eval()ing python code
      found in random attributes in the node tree

    * break necessary extensions to SAX/DOM, or subtle points of API
      compatibility, between versions, forcing my code to do lots of checks

    * look for Unicode flag characters

    * pay attention to !DOCTYPE and ?xml directives

    * split Text nodes into multiple pieces on newlines or whitespace

    * pay special attention to any attribute, like "xmlns"

    * dump core
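
Several of the items above can be approximated with feature flags on the SAX
reader in the present-day standard library. The sketch below assumes the
default expat-backed reader; the names parse_conservatively, Collect and
SAMPLE are invented for the example, and it does not address every item in
the list.

```python
import io
import xml.sax
from xml.sax import handler


class Collect(handler.ContentHandler):
    """Records (element name, attribute dict) pairs in document order."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def startElement(self, name, attrs):
        self.elements.append((name, dict(attrs.items())))


def parse_conservatively(data):
    """Hypothetical helper: parse bytes with the 'helpful' features off."""
    reader = xml.sax.make_parser()
    # Never validate.
    reader.setFeature(handler.feature_validation, False)
    # Never fetch external general or parameter entities (including DTDs).
    reader.setFeature(handler.feature_external_ges, False)
    reader.setFeature(handler.feature_external_pes, False)
    # No namespace processing: "xmlns" is reported like any other attribute.
    reader.setFeature(handler.feature_namespaces, False)
    collector = Collect()
    reader.setContentHandler(collector)
    reader.parse(io.BytesIO(data))
    return collector.elements


# The DTD named here does not exist; with the flags above it is never opened.
SAMPLE = (b'<!DOCTYPE root SYSTEM "no-such-file.dtd">'
          b'<root xmlns="urn:example" a="1"><child/></root>')
print(parse_conservatively(SAMPLE))
```

Features must be set before parsing starts. The expat-backed reader rejects
attempts to turn validation on, so the first call only states the intent.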

Does anybody know of an XML parser that meets these requirements, or am I
going to have to write my own?

 |    <`'>    |  Glyph Lefkowitz: Travelling Sorcerer  |
 |   < _/ >   |  Lead Developer,  the Twisted project  |
 |  < ___/ >  |      http://www.twistedmatrix.com      |