[Twisted-Python] Can anyone recommend a sensible XML parser for Python?

Eron Lloyd elloyd at lancaster.lib.pa.us
Thu Sep 5 15:53:09 MDT 2002


Are you referring to PyXML? I know xml.* in the Standard Library is
still pretty weak (but getting better!). PyXML, on the other hand,
currently supports at least two pretty powerful parsers: Expat ("the"
parser for many projects, including Mozilla) and xmlproc (a robust
pure-Python parser that does validation). In fact, I believe Fred Drake
of PythonLabs is the maintainer of Expat, so Python will always have
strong Expat support. Also, I know Daniel Veillard is very interested in
"guaranteeing" Python wrappers for the GNOME libxml/libxslt C libraries
(http://www.xmlsoft.org/python.html). There are many more options I just
can't think of right now. All in all, there *is* a wealth of parsers
available to you; you just have to decide what you need. Check PyXML
(http://pyxml.sf.net) and contact the Python XML-SIG for help. Have
faith, Python is quickly shaping up to be a powerful XML platform.
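
To give a feel for the Expat route, here is a minimal sketch, assuming
the Expat binding that ships in the Standard Library as
xml.parsers.expat. Expat is a non-validating parser, and it can be fed
a document in pieces, which seems close to the "XML stream" use case
described below. The function name collect_events and the sample
chunks are made up for illustration; treat this as a starting point
rather than a recommended design.

```python
import xml.parsers.expat


def collect_events(chunks):
    """Feed XML text to Expat piece by piece and record SAX-style events.

    collect_events is a hypothetical helper written for this example;
    it is not part of any library.
    """
    events = []

    def on_start(name, attrs):
        # attrs arrives as a plain dict of attribute name -> value.
        events.append(("start", name, attrs))

    def on_end(name):
        events.append(("end", name))

    def on_text(data):
        # Expat may deliver one run of text in several callbacks, so
        # glue adjacent pieces together instead of recording fragments.
        if events and events[-1][0] == "text":
            events[-1] = ("text", events[-1][1] + data)
        else:
            events.append(("text", data))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_text

    for chunk in chunks:
        parser.Parse(chunk, False)  # False: more data may follow
    parser.Parse("", True)          # True: this is the end of the document
    return events


if __name__ == "__main__":
    # The chunk boundaries deliberately fall mid-text and mid-tag.
    pieces = ['<stream><message to="a">hel', 'lo</message></str', 'eam>']
    for event in collect_events(pieces):
        print(event)
```

Merging the text pieces in the handler is a choice made for this
sketch, so that callers never see a text run split across callbacks.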

Cheers, 

Eron 

On Mon, 2002-09-02 at 05:07, Glyph Lefkowitz wrote: 
> 
> So, I'm really pretty discouraged and disgusted with the state of XML tools
> that ship with Python today.  Mainly, they do surprising and insecure things
> when I try to parse XML, and I don't understand how to tell what will and won't
> work between various versions of them.
> 
> I think my requirements of an XML parser are pretty simple.  Here are the
> basics of what I want it to do:
> 
> 
>     * adhere to a subset of both DOM and SAX APIs for both event-based and
>       synchronous processing of XML data
> 
>     * allow creation of DOM trees from fragments of an XML stream so that
>       discrete "packets" can be processed, a-la jabber "xml streams"
> 
>     * perform relatively well (optional)
> 
> More importantly, here are the things I *don't* want an XML parser to do:
> 
>     * validate in any way, ever, at all
> 
>     * fetch DTDs or otherwise do helpful things like eval()ing python code
>       found in random attributes in the node tree
> 
>     * break necessary extensions to SAX/DOM and subtleties of API compatibility
>       between versions, making my code do lots of checks
> 
>     * look for Unicode flag characters
> 
>     * pay attention to !DOCTYPE and ?xml directives
> 
>     * split Text nodes into multiple pieces on newlines or whitespace
> 
>     * pay attention specially to any attribute, like "xmlns"
> 
>     * dump core
> 
> Does anybody know of an XML parser that meets these requirements or am I going
> to have to write my own?
> 
> -- 
>  |    <`'>    |  Glyph Lefkowitz: Travelling Sorcerer  |
>  |   < _/ >   |  Lead Developer,  the Twisted project  |
>  |  < ___/ >  |      http://www.twistedmatrix.com      |
-- 
Eron Lloyd
Technology Coordinator
Lancaster County Library
elloyd at lancaster.lib.pa.us
Phone: 717-239-2116
Fax: 717-394-3083
