[Twisted-Python] Re: Helps if I could type ... there is no reportlacb.com of course.

Glyph Lefkowitz glyph at twistedmatrix.com
Fri Sep 6 17:06:11 MDT 2002


Slimming the recipient list here, since anyone interested in continuing
discussion should be on the twisted list anyway :)

On Fri, 6 Sep 2002 23:51:14 +0100, "Andy Robinson" <andy at reportlab.com> wrote:
> > The one thing I don't see PyRXP doing is this:

> > >     * adhere to a subset of both DOM and SAX APIs for both event-based
> > >     and synchronous processing of XML data

> Absolutely true.  Our own need was to read in complete XML documents.
> ... AFAICT pyexpat does that pretty well for anyone who does not want
> validation.  Or am I missing something?

I don't have a repeatable test case yet, but expat (at least pyexpat) seems to
segfault in some situations.  I want to use XML as a data-transfer technique in
part as a stopgap measure because Python does not support robust parsing and
secure execution of Python code.  If you can segfault the XML parser then it's
not much help :).

Also, turning off some of the features in pyexpat would sometimes hang or crash
the parser.  Again, haven't hunted these down; but they were different in every
release of pyxml, and some are hard to repeat.  Not worth the effort
considering how easy writing a new parser is.

> Our own feelings are to aim for simple Pythonic APIs rather than full SAX and
> DOM, which "feel like Java" to me and tend to be verbose.

Interesting idea.  I may take this advice in my own XML-related endeavors.
I'll make sure to look at the idioms that pyRXP provides and see if I can
maintain some compatibility with them.

> > I want to do parsing and routing of XML such as jabber's "XML
> > streams", as a network protocol and not as a document parser.

> Our take on that was to make a small 'lazy cursor' using getattr to step into
> the tuple tree, so I can do expressions like
> "xml.invoice.customerDetails.addressLine1" and let it drill down for me.

That looks very cool!  Taking this approach may solve some of the concerns I
had about XML-based persistence in Python. Thanks for the example :-).  It does
look like you'd need validation in order to use a trick like that, though.
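
A minimal sketch of the "lazy cursor" idea quoted above. It assumes the tree is
built from pyRXP-style 4-tuples of (tagName, attributes, children, spare). The
class name LazyCursor and the sample invoice data are hypothetical and are not
taken from pyRXP or ReportLab code.

```python
class LazyCursor:
    """Hypothetical wrapper: attribute access steps into child elements."""

    def __init__(self, node):
        # node is assumed to be a (tagName, attributes, children, spare) tuple
        self._node = node

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. for child tag names.
        if name.startswith("_"):
            raise AttributeError(name)
        for child in self._node[2] or []:
            if isinstance(child, tuple) and child[0] == name:
                return LazyCursor(child)
        raise AttributeError(
            "no <%s> element under <%s>" % (name, self._node[0]))

    def __str__(self):
        # Join the text children directly under this element.
        return "".join(c for c in (self._node[2] or []) if isinstance(c, str))


# Hand-written sample tree (illustrative data only).
tree = ("invoice", None, [
    ("customerDetails", None, [
        ("addressLine1", None, ["12 Example Street"], None),
    ], None),
], None)

# Synthetic root, so the expression starts at the document element's name.
xml = LazyCursor(("", None, [tree], None))
print(xml.invoice.customerDetails.addressLine1)
```

In this sketch a missing element only shows up as an AttributeError at the
point of access, which is one way to read the remark about needing validation
before relying on such paths.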

> ...  under the hood the SAX one would be working on a fully parsed tree
> structure.

Yeah, that's a real problem for me, unfortunately.

> We would love to do more on pyRXP but are very short of time and have already
> met all our own requirements.  Is this a candidate to become part of
> Twisted's XML toolkit, or specifically for Jabber?  Is anyone else prepared
> to do a little on the pyRXP code with us and share some work?

I actually took an afternoon and wrote a Python XML parser that will probably
be included in the next Twisted release (I'm currently slapping on some ad-hoc
minidom-esque data structures to get existing twisted/xml utilities working).
The main interest I have is network protocols that speak XML (and determining
completeness of full XML documents received over a network connection), so it
looks like pyRXP is just not suitable for my requirements.
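
As a rough illustration of "determining completeness" for a document that
arrives in pieces, the sketch below counts element depth while data is fed in
chunks. It uses the standard library's expat binding purely for illustration;
it is not the parser described in this message, and the name
CompletenessTracker is hypothetical.

```python
from xml.parsers import expat


class CompletenessTracker:
    """Hypothetical helper: reports when the root element has been closed."""

    def __init__(self):
        self.depth = 0
        self.complete = False
        self._parser = expat.ParserCreate()
        self._parser.StartElementHandler = self._start
        self._parser.EndElementHandler = self._end

    def _start(self, name, attrs):
        self.depth += 1

    def _end(self, name):
        self.depth -= 1
        if self.depth == 0:
            # The document element was just closed.
            self.complete = True

    def feed(self, data):
        # isfinal=False: more data may still arrive on the connection.
        self._parser.Parse(data, False)
        return self.complete


tracker = CompletenessTracker()
# Chunks deliberately split in the middle of a tag, as a socket might.
chunks = ["<invoice><cust", "omer>x</customer>", "</invoice>"]
results = [tracker.feed(chunk) for chunk in chunks]
print(results)
```

For a long-lived stream such as Jabber's, where the root element stays open,
the same counter could instead flag each time the depth returns to 1, on the
assumption that each child of the root is one protocol message.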

Again, considering how easy it is to write a new XML parser, and the end-user
unpleasantness associated with garnering new dependencies (pyRXP does not
appear to be packaged in even the latest Debian, for instance) I don't think
using pyRXP is worthwhile at the moment.  Originally, I was hoping to avoid
writing my own parser.  Now, I don't think that there's enough of a consensus
in the community about what's "good" for an XML parser to do that I can avoid
it :).  Also, having done it, I'm much less interested in avoiding it.

When I am looking into more seriously high-performance applications involving
XML I think having RXP available as a backend would be really useful for some
operations, though, so I will eventually read about the tuple representation it
uses and try to provide compatibility at some layer.

-- 
 |    <`'>    |  Glyph Lefkowitz: Traveling Sorcerer   |
 |   < _/ >   |  Lead Developer,  the Twisted project  |
 |  < ___/ >  |      http://www.twistedmatrix.com      |