[Twisted-Python] Should I use asynchronous programming in my own modules?

Thu Oct 18 08:41:38 EDT 2007

Hello,

I'm rather new to twisted and asynchronous programming in general. 
Overall, I think I've understood the asynchronous programming model and 
its implications quite well. Nevertheless, there are some remaining 
questions.

To give an example, I'd like to develop my own simplified document 
format in XML and a corresponding parser. The output of the parser (a 
specialized document object model) will be traversed and translated into 
HTML afterwards. This module could be useful outside any twisted 
application, of course. Instead of generating HTML one could develop a 
generator that produces LaTeX, for example. But it could also be used to 
render HTML pages in a twisted web application. The question is this: 
since parsing and generating large documents could block the reactor in 
a twisted app, should I use any of twisted's asynchronous programming 
features in this module (for better integration with twisted) or should 
I rather develop it in a traditional way and run it in a thread?

The question came to my mind because I read somewhere that long-running 
operations in third-party modules should be called in a thread. This is 
clear. I also read that if one has the opportunity to develop an 
application from scratch, one should rather use twisted's 
asynchronous programming features and divide long-running operations 
into small chunks. In principle, this approach is clear to me, but does 
it also apply to modules which are entirely independent of twisted 
networking code? And if so, is there any way to decouple them from the 
twisted library for reuse in other applications?
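
To make the decoupling question concrete, here is a minimal sketch of 
what I imagine, assuming the translation step can be written as a plain 
Python generator that hands back one small chunk of output at a time. 
The names render_chunks, render_all and chunk_size are hypothetical and 
only for illustration; nothing in the module imports twisted.

```python
def render_chunks(nodes, render, chunk_size=100):
    """Yield lists of rendered fragments, at most chunk_size per list.

    `nodes` is any iterable of document nodes and `render` is a callable
    that turns one node into a string (both are illustrative assumptions).
    """
    chunk = []
    for node in nodes:
        chunk.append(render(node))
        if len(chunk) >= chunk_size:
            # Pause point: the caller decides when to ask for more.
            yield chunk
            chunk = []
    if chunk:
        # Hand back whatever is left over at the end.
        yield chunk


def render_all(nodes, render, chunk_size=100):
    """Blocking convenience wrapper for callers without an event loop."""
    parts = []
    for chunk in render_chunks(nodes, render, chunk_size):
        parts.extend(chunk)
    return "".join(parts)
```

A non-twisted program would simply call render_all(). In a twisted 
application I assume the same generator could be handed to something 
like twisted.internet.task.coiterate() or cooperate(), so that the 
reactor runs between chunks, but I have not verified that this is the 
recommended way.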

The last question is what criteria I could use to divide long-running 
operations into chunks. In almost all books about asynchronous 
programming I only read that if they're too big, they could block the 
event loop. Of course, but how big is too big? And what's the measure 
for it? Milliseconds, number of operations, number of lines of code, or 
what? Doesn't it depend entirely on the application at hand and how 
responsive it has to be? It also depends on the hardware: a Pentium II 
can process fewer chunks in the same amount of time than an Athlon 64, 
for example. And couldn't chunks also be too small, so that more time 
than necessary is spent putting them into the reactor's queue, then 
maybe sorting them and then calling them? If the overhead involved in 
scheduling a chunk is bigger than the processing time of the chunk 
itself, the chunks are too small, aren't they?
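
To illustrate what a time-based measure might look like, here is a rough 
sketch that drives an iterator of small work steps and pauses whenever a 
time budget is used up, rather than after a fixed number of operations. 
The name run_with_budget, the clock parameter and the 10 ms default are 
my own assumptions for illustration, not a recommendation.

```python
import time


def run_with_budget(work, budget=0.01, clock=time.monotonic):
    """Consume `work` step by step, pausing once per elapsed time budget.

    Yields the number of steps completed in each time slice. `budget` is
    in the same unit that `clock` returns (seconds by default).
    """
    steps = 0
    deadline = clock() + budget
    for _ in work:
        steps += 1
        if clock() >= deadline:
            # Budget used up: give control back to the caller.
            yield steps
            steps = 0
            deadline = clock() + budget
    if steps:
        # Report the final, partially filled slice.
        yield steps
```

With a budget like this, the number of steps per chunk would adapt to 
the hardware: a faster machine simply gets more steps done per slice. 
The price is one clock call per step, which is exactly the kind of 
scheduling overhead I am unsure about.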

Thanks in advance for any answers,
Jürgen