[Twisted-Python] newb: Twisted-Goodies asynch cluster and TaskQueue
general at eepatents.com
Thu Apr 26 15:26:56 EDT 2007
Ram Peters wrote:
> I have several log files to parse every hour. I am thinking of
> using Twisted-Goodies asynch cluster and TaskQueue. What I want to
> do is put part of a file, or a single file, in the task queue and
> have different clients (nodes) do the parsing. The number and size
> of the log files may increase in the future.
> My question is, are these the right tools to use for this task? Are
> there any examples that use asynch cluster and TaskQueue?
Well, I think it would be, but then I'm a bit biased, as the author!
My guess is that, unless the log files take more than a few minutes each
to parse, you'd be best off assigning the parsing of each file to its
own job. Each job goes into the queue for dispatching to the nodes as
they become available. You certainly could split the parsing of a file
into separate jobs if that makes sense, though.
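As a rough illustration of the one-job-per-file idea, here is a minimal
sketch that uses only the Python standard library. It is not asynCluster
or TaskQueue code: a local thread pool stands in for the queue and the
nodes, and parse_log, parse_all, and the "ERROR" matching rule are
hypothetical names invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def parse_log(path):
    """Hypothetical parser: count the lines that contain 'ERROR'."""
    errors = 0
    with open(path, "r", errors="replace") as f:
        for line in f:
            if "ERROR" in line:
                errors += 1
    return errors


def parse_all(paths, workers=4):
    """Submit one job per file; workers pick jobs up as they become free."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One queued job per log file (stand-in for a real task queue).
        futures = {pool.submit(parse_log, p): p for p in paths}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return results
```

Calling parse_all(["a.log", "b.log"]) would return a dict mapping each
path to its count. Splitting a single file into several jobs would mean
submitting (path, offset, length) tuples instead of whole paths.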
To avoid having to read each file's full contents into memory just to
pass them along with the job, you might want to include a
chunked-download PB referenceable object as an argument to the job call.
The node can do remote calls on the
referenceable that it receives to get the file data on a non-queued
"back channel" of sorts. (The same PB & TCP connection would be used,
but it would be independent of the task queue.)
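A minimal sketch of what such a chunked reader could look like follows.
It is an assumption-laden illustration, not the referenceable mentioned
in this message: the class name ChunkedFileReader, the method name
remote_next_chunk, and the 64 KiB default chunk size are all made up
here. The only PB conventions relied on are that methods prefixed with
remote_ on a pb.Referenceable are callable from the other side through
callRemote, and the import falls back to a plain object so the sketch
also runs without Twisted installed.

```python
try:
    from twisted.spread import pb
    Base = pb.Referenceable
except ImportError:
    # Fallback so the sketch can be tried without Twisted installed.
    Base = object


class ChunkedFileReader(Base):
    """Hypothetical referenceable that hands out a file one chunk at a time."""

    def __init__(self, path, chunk_size=64 * 1024):
        self._file = open(path, "rb")
        self._chunk_size = chunk_size

    def remote_next_chunk(self):
        """Return the next chunk of bytes, or b"" once the file is exhausted."""
        if self._file.closed:
            return b""
        chunk = self._file.read(self._chunk_size)
        if not chunk:
            # End of file: release the handle on the sending side.
            self._file.close()
        return chunk

# On the node, with `reader` being the remote reference received as a
# job argument, each chunk would be requested with something like:
#     d = reader.callRemote("next_chunk")
# and the node would keep asking until it receives an empty chunk.
```

Only one chunk is held in memory at a time on the sending side, which is
the point of passing a reference rather than the file contents.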
I'm interested in seeing how others use asynCluster to distribute work
among different nodes, and would be willing to give you some help via
private email if you like. I have a chunked-download PB referenceable
that you could use, too. (It's pretty basic, just slightly modified from
what PB itself provides.)