[Twisted-Python] Application Design help - Concurrent but not Protocols based.

David Ripton dripton at ripton.net
Wed Jun 3 16:13:04 EDT 2009


On 2009.06.03 21:55:27 +0530, Senthil Kumaran wrote:
> 1) I need to constantly monitor a particular directory for new files.
> 2) Whenever a new file is dropped, I read that file to find out where
> to collect data from: a) another machine, b) a second machine using a
> different method, or c) a database.
> 3) I collect data from those machines and store it.
> 
> The data is huge and I need the three processes a, b, c to be
> non-blocking, and I can just do a function call like do_a(), do_b(),
> do_c() to perform them.
> 
> For 1) to constantly monitor a particular directory for new files, I
> am doing something like this:
> My Question: Can this be designed in a way that looking for new files is
> also an asynchronous activity?

If your OS has a way to let you register your interest in particular
directories and then notify you when new files appear there, then yes.

If you're using Linux then check out the Twisted inotify wrapper that's
in dialtone's sandbox.  http://twistedmatrix.com/trac/changeset/25717

If you're using something else then it probably has a similar API but
it'll be more work because AFAIK nobody's already written the Twisted
wrapper for you.

Or maybe you can get away with just periodically calling os.listdir from
a subthread, using deferToThread.  Not technically asynchronous but
probably good enough.

> Now, after reading the contents, I will have to do a non-blocking call
> to fetch data, either using fun_a, fun_b or fun_c. How should I
> associate this requirement to deferred/callback pattern?

Depends.

If it's just a simple cheap Python function that doesn't block then you
can just do:

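from twisted.internet import reactor, task

# callLater returns an IDelayedCall, not a Deferred; deferLater returns a Deferred.
deferred1 = task.deferLater(reactor, 0, fun_a)
deferred1.addCallback(fun_a_callback)
deferred1.addErrback(fun_a_errback)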

If it's a simple function that blocks and can't be changed to not block
but doesn't use too much CPU then you can use deferToThread.

If it's a piggy enough function that you really want it in a separate
process so it can use another CPU core, then write a little Python
script that wraps it, and call it using the Twisted process APIs:
http://twistedmatrix.com/projects/core/documentation/howto/process.html

But just because you can do this in Twisted doesn't mean you necessarily
should.  If you need an asynchronous main loop then Twisted has really
good APIs for dealing with asynchronous main loops.  (If you're on Linux
and can use inotify then it qualifies.)  But if you end up polling the
filesystem with os.listdir in one thread, and running your fun_x in
other threads, and you're not really doing anything asynchronous, then
IMO Twisted won't really add any value.  In that case I'd just use
Python's threading and Queue modules.
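
A minimal sketch of that non-Twisted alternative, assuming a fixed pool
of worker threads fed from a queue.  worker, run_jobs and handler are
hypothetical names; in the real program the jobs would be the file names
found by polling os.listdir.

```python
import queue  # this module is named Queue on Python 2
import threading


def worker(jobs, results, handler):
    # Pull jobs until the None sentinel arrives.
    while True:
        job = jobs.get()
        if job is None:
            break
        try:
            outcome = handler(job)
        except Exception as exc:
            outcome = exc  # hand the error back instead of killing the thread
        results.put((job, outcome))


def run_jobs(job_list, handler, num_workers=3):
    jobs, results = queue.Queue(), queue.Queue()
    pool = [
        threading.Thread(target=worker, args=(jobs, results, handler))
        for _ in range(num_workers)
    ]
    for t in pool:
        t.start()
    for job in job_list:
        jobs.put(job)
    for _ in pool:
        jobs.put(None)  # one sentinel per worker
    for t in pool:
        t.join()
    # Every job produced exactly one result, so these gets cannot block.
    return dict(results.get() for _ in job_list)
```

Here job_list is expected to be a list of distinct jobs, since the
results are keyed by job.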

-- 
David Ripton    dripton at ripton.net



