[Twisted-Python] Re: Helps if I could type ... there is no reportlacb.com of course.
Glyph Lefkowitz
glyph at twistedmatrix.com
Fri Sep 6 19:06:11 EDT 2002
Slimming the recipient list here, since anyone interested in continuing
discussion should be on the twisted list anyway :)
On Fri, 6 Sep 2002 23:51:14 +0100, "Andy Robinson" <andy at reportlab.com> wrote:
> > The one thing I don't see PyRXP doing is this:
> > > * adhere to a subset of both DOM and SAX APIs for both event-based
> > > and synchronous processing of XML data
> Absolutely true. Our own need was to read in complete XML documents.
> ... AFAICT pyexpat does that pretty well for anyone who does not want
> validation. Or am I missing something?
I don't have a repeatable test case yet, but expat (at least pyexpat) seems to
segfault in some situations. I want to use XML as a data-transfer technique in
part as a stopgap measure because Python does not support robust parsing and
secure execution of Python code. If you can segfault the XML parser then it's
not much help :).
Also, turning off some of the features in pyexpat would sometimes hang or crash
the parser. Again, haven't hunted these down; but they were different in every
release of pyxml, and some are hard to repeat. Not worth the effort
considering how easy writing a new parser is.
> Our own feelings are to aim for simple Pythonic APIs rather than full SAX and
> DOM, which "feel like Java" to me and tend to be verbose.
Interesting idea. I may take this advice in my own XML-related endeavors.
I'll make sure to look at the idioms that pyRXP provides and see if I can
maintain some compatibility with them.
> > I want to do parsing and routing of XML such as jabber's "XML
> > streams", as a network protocol and not as a document parser.
> Our take on that was to make a small 'lazy cursor' using getattr to step into
> the tuple tree, so I can do expressions like
> "xml.invoice.customerDetails.addressLine1" and let it drill down for me.
That looks very cool! Taking this approach may solve some of the concerns I
had about XML-based persistence in Python. Thanks for the example :-). It does
look like you'd need validation in order to use a trick like that, though.
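
For anyone following along, here is a minimal sketch of what such a 'lazy cursor' might look like. It is not ReportLab's actual code. It assumes the pyRXP-style tuple tree where each element is (tagName, attributes, children, spare) and children are either text strings or nested tuples. The class name LazyCursor and the wrapping "document" node are hypothetical, made up for illustration.

```python
class LazyCursor:
    """Attribute-style navigation over a (name, attrs, children, spare) tuple tree."""

    def __init__(self, node):
        self._node = node

    def __getattr__(self, tag):
        # __getattr__ only runs when normal lookup fails; refuse private
        # names so a missing _node can never cause infinite recursion.
        if tag.startswith("_"):
            raise AttributeError(tag)
        for child in self._node[2] or []:
            # Children are either text strings or nested element tuples.
            if isinstance(child, tuple) and child[0] == tag:
                return LazyCursor(child)  # step into the first match
        raise AttributeError(tag)

    def __str__(self):
        # Concatenate only the direct text children of this element.
        return "".join(c for c in (self._node[2] or []) if isinstance(c, str))


# Hand-built sample tree (hypothetical data, in the assumed tuple layout).
invoice = ("invoice", None, [
    ("customerDetails", None, [
        ("addressLine1", None, ["1 Example Street"], None),
    ], None),
], None)

# Wrap the root element in a synthetic node so "xml.invoice" works.
xml = LazyCursor(("document", None, [invoice], None))
address = str(xml.invoice.customerDetails.addressLine1)
```

Stepping into an element that is absent raises AttributeError in this sketch, which is one reason validation would help: with a validated document you know in advance which names are safe to drill into.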
> ... under the hood the SAX one would be working on a fully parsed tree
> structure.
Yeah, that's a real problem for me, unfortunately.
> We would love to do more on pyRXP but are very short of time and have already
> met all our own requirements. Is this a candidate to become part of
> Twisted's XML toolkit, or specifically for Jabber? Is anyone else prepared
> to do a little on the pyRXP code with us and share some work?
I actually took an afternoon and wrote a Python XML parser that will probably
be included in the next Twisted release (I'm currently slapping on some ad-hoc
minidom-esque data structures to get existing twisted/xml utilities working).
The main interest I have is network protocols that speak XML (and determining
completeness of full XML documents received over a network connection), so it
looks like pyRXP is just not suitable for my requirements.
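
To make the "completeness" requirement concrete, here is a rough sketch of the idea: feed bytes to an incremental parser as they arrive and track element depth, so you know the moment the top-level element closes. This is not the parser described above. Purely for illustration it uses the standard library's xml.parsers.expat, and the class name DocumentFramer is hypothetical.

```python
import xml.parsers.expat


class DocumentFramer:
    """Report when a full top-level element has arrived over a byte stream."""

    def __init__(self):
        self._parser = xml.parsers.expat.ParserCreate()
        self._parser.StartElementHandler = self._start
        self._parser.EndElementHandler = self._end
        self._depth = 0
        self.complete = False

    def _start(self, name, attrs):
        self._depth += 1

    def _end(self, name):
        self._depth -= 1
        if self._depth == 0:
            # The root element just closed: the document is complete.
            self.complete = True

    def feed(self, data):
        # isfinal=False: more data may follow, so do not finalize the parse.
        self._parser.Parse(data, False)
        return self.complete


framer = DocumentFramer()
# Chunks as they might arrive from a network connection.
results = [framer.feed(chunk) for chunk in (b"<a><b>", b"text</b>", b"</a>")]
```

In a protocol, feed() would be called from the data-received callback. For a Jabber-style stream the same depth counter could instead signal each time depth returns to 1, since the stream's root element stays open for the life of the connection.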
Again, considering how easy it is to write a new XML parser, and the end-user
unpleasantness associated with garnering new dependencies (pyRXP does not
appear to be packaged in even the latest Debian, for instance) I don't think
using pyRXP is worthwhile at the moment. Originally, I was hoping to avoid
writing my own parser. Now, I don't think that there's enough of a consensus
in the community about what's "good" for an XML parser to do that I can avoid
it :). Also, having done it, I'm much less interested in avoiding it.
When I am looking into more seriously high-performance applications involving
XML I think having RXP available as a backend would be really useful for some
operations, though, so I will eventually read about the tuple representation it
uses and try to provide compatibility at some layer.
--
| <`'> | Glyph Lefkowitz: Traveling Sorcerer |
| < _/ > | Lead Developer, the Twisted project |
| < ___/ > | http://www.twistedmatrix.com |