[Twisted-Python] Should I use asynchronous programming in my own modules?

Thu Oct 18 10:50:42 EDT 2007

Itamar Shtull-Trauring wrote:
> [...]
> What you mean by "traditional" is actually a pull parser. Parsing APIs
> can be pull or push (i.e. asynchronous). Well-designed parsers are
> always push, because push parsers can be trivially converted to blocking
> pull parsers, but not vice-versa. Some examples of push/asynch parsers:
> twisted's Protocol class, or the SAX API.
Sorry, I think my example was somewhat misleading, and it has also
become clear to me that I didn't use the word "asynchronous" correctly.
I hadn't considered that one can also register callbacks with a parser,
for example, and call this style of programming asynchronous. (The
principle "Don't call us, we'll call you" applies here too, of course.)

No, what I really meant by "traditional" was to write parsers and
generators which traverse the whole document in one large step, without
giving the Twisted reactor a chance to process any other events. Let's
assume I got a DOM tree from Python's XML parser. First, I'd traverse
that tree and build up another tree consisting of element objects. Each
element object is an instance of a class corresponding to a tag; for
the tag <chapter>, for example, I'd create a class "chapter". This is
necessary because there isn't always a one-to-one correspondence
between tags and my document elements, and because I want to attach
additional attributes to such elements later, for example automatically
generated chapter numbers. I'd then use the generator to traverse that
element tree, calling "render_element" methods along the way. For a
chapter element with a title attribute I'd call
"render_chapter( node )", which then generates
"<h1>chapter_title</h1>".
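
A minimal sketch of this two-step approach might look like the
following. It assumes the standard library's xml.dom.minidom as the
parser; the names Chapter, ELEMENT_CLASSES, build_elements and
HtmlGenerator are hypothetical and only serve the illustration.

```python
from xml.dom import minidom


class Chapter:
    """Document element for a <chapter> tag (hypothetical class)."""

    def __init__(self, title):
        self.title = title
        self.number = None  # assigned later, e.g. automatic numbering
        self.children = []


# Hypothetical mapping from tag name to element class.
ELEMENT_CLASSES = {"chapter": Chapter}


def build_elements(dom_node):
    """Turn the DOM children of dom_node into a list of element objects."""
    elements = []
    for child in dom_node.childNodes:
        if child.nodeType != child.ELEMENT_NODE:
            continue  # skip text nodes, comments, whitespace
        cls = ELEMENT_CLASSES.get(child.tagName)
        if cls is None:
            continue  # tags without a matching class are ignored here
        element = cls(child.getAttribute("title"))
        element.children = build_elements(child)
        elements.append(element)
    return elements


class HtmlGenerator:
    def render(self, element):
        # Dispatch on the class name: Chapter -> render_chapter
        method = getattr(self, "render_" + type(element).__name__.lower())
        return method(element)

    def render_chapter(self, node):
        return "<h1>%s</h1>" % node.title


dom = minidom.parseString(
    '<doc><chapter title="Intro"/><chapter title="Usage"/></doc>'
)
chapters = build_elements(dom.documentElement)
for number, chapter in enumerate(chapters, 1):
    chapter.number = number  # the "additional attribute" added later
html = [HtmlGenerator().render(c) for c in chapters]
```

Everything here runs in one go, which is exactly the "one large step"
behaviour described above: nothing else gets a turn until the last
element has been rendered.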

Let's assume I had some element with child elements. Without knowing 
about twisted at all, I'd have created a foreach loop to process each 
child like this:

for child_node in root_elem.children:
   if child_node.type == chapter:
      processChapter( child_node )

My idea now is that, depending on the number of child elements, looping
could take some time. So instead I'd use Twisted's reactor, specifically
its callLater method, like this (it's only pseudocode!):

from twisted.internet import defer, reactor

class Generator:

   def generate_html( self ):
      self.d = defer.Deferred()
      self.startProcessing()
      return self.d

   def startProcessing( self ):
      self.current_element = root_elem
      self.processNextElement()

   def processNextElement( self ):
      if more elements to process:
         if self.current_element.type == chapter:
            reactor.callLater( 0, self.processChapter, self.current_element )
      .....
      else:
         self.d.callback( "finished" )

In this way any Twisted user could get a Deferred from the generate_html
method and be called back once the Generator has generated all the HTML.
The problem with this is that I could never use such code without also
installing Twisted, of course.

It's more or less clear to me how to divide the traversal of such a DOM
tree into discrete steps, but it's not so clear how to call
processNextElement with reactor.callLater from the outside. Although,
after reading the other answers, it seems to me I'm not far from a
solution. I think I could also create two classes: the Generator class,
which would provide a processNextElement method and wouldn't need to
depend on the Twisted framework, and a TwistedGenerator class, which
would do exactly the same as the code above and repeatedly call
processNextElement with reactor.callLater. But the internal bookkeeping
of which element to process next could be more difficult than with the
solution above, couldn't it? Instead of separate methods like
"processChapter", "processList", etc., I'd only have one method to call
from the outside, "processNextElement" (plus something like
"moreElementsToProcess"). The TwistedGenerator wrapper shouldn't know
about the internal state of the Generator, I think.
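
One possible way to keep that bookkeeping simple is to let a Python
generator function remember the position in the tree. The following is
only a sketch under that assumption: Element, Chapter, _walk,
run_in_steps and fake_call_later are hypothetical names, and the
scheduler is passed in as a parameter so that the Generator class
itself never imports Twisted.

```python
class Element:
    def __init__(self, children=()):
        self.children = list(children)


class Chapter(Element):
    def __init__(self, title, children=()):
        super().__init__(children)
        self.title = title


class Generator:
    """Framework-independent: knows nothing about Twisted."""

    def __init__(self, root_elem):
        self.output = []
        self._steps = self._walk(root_elem)

    def _walk(self, element):
        # Suspends at every yield, so the current position in the
        # tree is remembered without any explicit state variables.
        for child in element.children:
            if isinstance(child, Chapter):
                self.processChapter(child)
            yield
            yield from self._walk(child)

    def processChapter(self, node):
        self.output.append("<h1>%s</h1>" % node.title)

    def processNextElement(self):
        """Process one element; return False once nothing is left."""
        try:
            next(self._steps)
        except StopIteration:
            return False
        return True


def run_in_steps(generator, call_later, on_finished):
    """Drive the generator, one element per scheduled call."""
    def step():
        if generator.processNextElement():
            call_later(0, step)
        else:
            on_finished("".join(generator.output))
    call_later(0, step)


# Stand-in scheduler so the sketch runs without Twisted installed.
pending = []

def fake_call_later(delay, func, *args):
    pending.append((func, args))

results = []
root = Element([Chapter("One", [Chapter("One.A")]), Chapter("Two")])
run_in_steps(Generator(root), fake_call_later, results.append)
while pending:
    func, args = pending.pop(0)
    func(*args)
```

With Twisted, the wrapper would presumably pass reactor.callLater as
call_later and the callback method of a defer.Deferred as on_finished,
then return that Deferred to the caller. The wrapper only ever sees
processNextElement and its True/False result, so it does not need to
know anything about the Generator's internal state.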

Many greetings,
Jürgen