[Twisted-Python] XML parsing on twisted

exarkun at twistedmatrix.com
Fri Oct 2 08:40:32 MDT 2009


On 1 Oct, 05:53 pm, burslem2001 at yahoo.com wrote:
>Hello,
>
>Probably a pretty standard question. However what are recommended 
>mechanics of parsing XML on twisted? I have a humongous string that 
>needs to be parsed and pushed into a database in the right columns.

Depending on how big the strings are, you may just want to parse them in 
the obvious way, synchronously in the reactor thread, and then deal with 
the results.  If the strings are really epically big, then you have a 
few options.
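
For the simple case, a minimal sketch of "the obvious way" might look 
like the following.  The element names (record, name, email), the people 
table and the load_records helper are all made up for illustration, and 
sqlite3 stands in for whatever database is actually in use.

```python
import sqlite3
import xml.etree.ElementTree as ET


def load_records(xml_text, conn):
    """Parse the whole string at once and insert one row per <record>.

    The <record>/<name>/<email> layout and the "people" table are
    assumptions made for this example only.
    """
    root = ET.fromstring(xml_text)
    rows = [
        (rec.findtext("name"), rec.findtext("email"))
        for rec in root.iter("record")
    ]
    conn.executemany("INSERT INTO people (name, email) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)
```

Note that the database calls here block as well.  In a Twisted 
application, twisted.enterprise.adbapi is one way to keep those off the 
reactor thread.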

You can handle them in another thread in the usual way. 
twisted.internet.threads.deferToThread gives you easy access to a 
threadpool which you can use for tasks like this.
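
A rough sketch of the threaded approach, assuming the document consists 
of <row> elements whose attributes map to columns (that layout, and the 
names extract_rows and parse_in_thread, are hypothetical):

```python
import xml.etree.ElementTree as ET


def extract_rows(xml_text):
    """Blocking parse; intended to run in a worker thread."""
    root = ET.fromstring(xml_text)
    # Assumed layout: each <row> carries its column values as attributes.
    return [dict(row.attrib) for row in root.iter("row")]


def parse_in_thread(xml_text):
    """Return a Deferred that fires with the parsed rows."""
    # Imported here so extract_rows can be tried out without Twisted.
    from twisted.internet.threads import deferToThread

    return deferToThread(extract_rows, xml_text)
```

The caller would then add a callback to the returned Deferred to deal 
with the rows once the reactor is running.  The function handed to 
deferToThread should not touch reactor APIs itself, since it runs outside 
the reactor thread.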

You can hand them off to another process and deal with them there. 
Twisted has child process control built in, via reactor.spawnProcess. 
You may also find the Ampoule library (not part of Twisted) handy for 
this.
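
As an illustration of the child process route, here is a sketch built on 
reactor.spawnProcess.  The child program, the <row> layout, and the names 
CHILD_SOURCE, XMLParserProcess and parse_in_child are assumptions for 
this example, not anything Twisted provides.

```python
import json
import os
import sys

# Hypothetical child program: reads XML on stdin and writes the
# attributes of every <row> element to stdout as a JSON list.
CHILD_SOURCE = """
import json, sys
import xml.etree.ElementTree as ET
root = ET.fromstring(sys.stdin.buffer.read())
json.dump([dict(row.attrib) for row in root.iter("row")], sys.stdout)
"""


def parse_in_child(reactor, xml_bytes):
    """Parse xml_bytes in a child process; return a Deferred of rows."""
    # Imported here so CHILD_SOURCE can be tried out without Twisted.
    from twisted.internet import defer, error, protocol

    class XMLParserProcess(protocol.ProcessProtocol):
        def __init__(self):
            self.finished = defer.Deferred()
            self.chunks = []

        def connectionMade(self):
            # Send the document to the child's stdin, then signal EOF.
            self.transport.write(xml_bytes)
            self.transport.closeStdin()

        def outReceived(self, data):
            # Child stdout may arrive in several pieces.
            self.chunks.append(data)

        def processEnded(self, reason):
            if reason.check(error.ProcessDone):
                self.finished.callback(json.loads(b"".join(self.chunks)))
            else:
                self.finished.errback(reason)

    proto = XMLParserProcess()
    reactor.spawnProcess(
        proto,
        sys.executable,
        [sys.executable, "-c", CHILD_SOURCE],
        env=dict(os.environ),
    )
    return proto.finished
```

Passing the data as JSON over stdout is just one choice.  Anything the 
parent can decode from a byte stream would do.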

You can also do the XML parsing incrementally.  The Python standard 
library includes a SAX parser which you might want to use for this.  I 
think the newer APIs (e.g. etree) also support some forms of incremental 
parsing.  This should let you spread out the task of handling the XML 
over a longer period of time, thus avoiding blocking the reactor thread 
for unreasonable amounts of time.
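
One way the incremental approach could be sketched is with the standard 
library SAX parser fed one chunk at a time, and 
twisted.internet.task.cooperate scheduling the chunks so the reactor gets 
control back in between.  The <row> layout, the chunk size, and the names 
RowCollector, feed_in_chunks and parse_cooperatively are assumptions for 
this example.

```python
import xml.sax


class RowCollector(xml.sax.ContentHandler):
    """Collects the attributes of every <row> element (assumed layout)."""

    def __init__(self):
        super().__init__()
        self.rows = []

    def startElement(self, name, attrs):
        if name == "row":
            self.rows.append(dict(attrs.items()))


def feed_in_chunks(xml_bytes, handler, chunk_size=64 * 1024):
    """Generator that feeds the parser one chunk per iteration."""
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    for start in range(0, len(xml_bytes), chunk_size):
        parser.feed(xml_bytes[start:start + chunk_size])
        yield  # a scheduler can run other work between chunks
    parser.close()


def parse_cooperatively(xml_bytes):
    """Return a Deferred that fires with the rows once parsing is done."""
    # Imported here so the helpers above can be tried out without Twisted.
    from twisted.internet.task import cooperate

    handler = RowCollector()
    done = cooperate(feed_in_chunks(xml_bytes, handler)).whenDone()
    done.addCallback(lambda _: handler.rows)
    return done
```

xml.etree.ElementTree.XMLPullParser offers a similar feed-based 
interface if you would rather work with elements than SAX events.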

Jean-Paul



