[Twisted-Python] One big smile ...
radix at twistedmatrix.com
Fri Jul 11 21:12:13 EDT 2003
On Sat, Jul 12, 2003 at 02:56:20AM +0200, Thomas Weholt wrote:
> I got a list of RSS/RDF URLs and I iterate through them in sequence, feeding
> each one to Mark Pilgrim's Ultra Liberal Parser. The call to that
> parse method is actually what's being used in deferToThread. Then I add a
> callback method which processes the result of the RSS/RDF parsing. It's a
> plain dictionary. This dictionary is processed into a simpler list. That's
> A lot of the things I'll be doing are like this: fetch a file from the net
> (here the parser fetches it itself), process the data returned from that
> call, and update some object or stuff the result into a database. The last
> option most of the time, probably, and the database will most likely be
> SQLite, which is not very thread-friendly either. Other times it can be an
> XML-RPC/SOAP call which might take time to complete, and the result of the
> call might have to be processed. The last thing I can think of is scanning
> folders and generating checksums. Probably the worst job of them all. Not
> done often, but very time- and resource-consuming when it occurs.
> Most often I'll have to update something in the main thread, some state or
> value, so if there's any way to use mutex/locks/whatever or ways to avoid
> the whole thing I'm all ears (one of the world's most moronic statements,
> BTW).
The way to avoid the problems is to _avoid threads whenever you
can_. For database access, Twisted already has an asynchronous
interface in twisted.enterprise. It uses threads, but in an isolated
way. Fetching files from the net and parsing the data can already be
done asynchronously (see twisted.web.client). Twisted has an
asynchronous client interface to XMLRPC, and a SOAP one should be easy
enough to implement (see how the XMLRPC client interface was done). The
file-walking stuff could probably be done well just by breaking up the
process into steps, allowing the reactor to do the work it needs to do
in between steps (e.g., call reactor.callLater(0, doSomeWork) and, at
the end of doSomeWork, reschedule doSomeWork until all work is done).
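To make the step-by-step idea concrete, here is a rough sketch of the
folder-scanning case. The names iter_files, checksum_in_steps, and the
batch parameter are made up for illustration, and SHA-256 is an
arbitrary choice of checksum. The scheduler is passed in as call_later
so the logic does not depend on a running reactor; in a real program
you would pass reactor.callLater.

```python
import hashlib
import os


def iter_files(root):
    """Lazily yield every file path under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def checksum_in_steps(paths, call_later, on_done, batch=10):
    """Checksum at most `batch` files per step, then reschedule.

    call_later is assumed to have the shape of
    reactor.callLater(delay, fn, *args).
    """
    results = {}
    paths = iter(paths)

    def do_some_work():
        for _ in range(batch):
            try:
                path = next(paths)
            except StopIteration:
                # No files left: hand the results over and stop rescheduling.
                on_done(results)
                return
            with open(path, "rb") as f:
                results[path] = hashlib.sha256(f.read()).hexdigest()
        # Yield to the reactor, then continue with the next batch.
        call_later(0, do_some_work)

    call_later(0, do_some_work)


def main(root):
    # Imported here so the functions above can be used without Twisted.
    from twisted.internet import reactor

    def finished(results):
        print(len(results), "files checksummed")
        reactor.stop()

    checksum_in_steps(iter_files(root), reactor.callLater, finished)
    reactor.run()
```

Each step still blocks for as long as it takes to read its batch, so a
very large file would need to be read in chunks across several steps
as well.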
Does your RSS parser actually try to download data itself? If it does,
I would recommend trying to figure out how to get at the lower level
parsing bits; download the data with twisted.web.client, and then pass
the data to the parsing bits. All asynchronous, no threads (with their
data corruption and deadlocks) to worry about.
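As a rough sketch of that split between downloading and parsing: the
code below uses Agent and readBody from twisted.web.client to fetch
the bytes, then hands them to a parsing function. parse_titles is only
a stand-in built on the standard library, not the Ultra Liberal
Parser, and it assumes a plain RSS feed without XML namespaces;
fetch_and_parse is likewise a made-up name.

```python
import xml.etree.ElementTree as ET


def parse_titles(data):
    """Stand-in for the lower-level parsing bits: bytes in, titles out."""
    root = ET.fromstring(data)
    return [el.text for el in root.iter("title")]


def fetch_and_parse(reactor, url):
    """Download url (a bytes object) without blocking, then parse the body.

    Returns a Deferred that fires with the list of titles.
    """
    # Imported here so parse_titles can be used without Twisted.
    from twisted.web.client import Agent, readBody

    d = Agent(reactor).request(b"GET", url)
    d.addCallback(readBody)      # Deferred that fires with the body bytes
    d.addCallback(parse_titles)  # runs in the reactor thread; no threads used
    return d
```

The parsing step itself still runs synchronously in the reactor
thread, which is fine as long as a single feed parses quickly.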
The big idea I'm trying to get across is that you should not be
defaulting to threads. Most of your use cases can be handled
without them; the only time that threads are really required is when
you are trying to use a blocking interface that you don't want to or
can't plausibly rework.
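For the case where a blocking interface really can't be reworked, a
minimal sketch of keeping the thread isolated might look like this.
Only the blocking call runs in a worker thread; the callbacks added to
the Deferred returned by deferToThread run in the reactor thread, so
shared state is only ever touched there and needs no lock. The
dictionary layout, FeedStore, simplify, and parse_in_thread are all
hypothetical.

```python
def simplify(feed):
    """Reduce the parser's dictionary to a list of (title, link) pairs.

    The {"items": [{"title": ..., "link": ...}]} layout is an assumption.
    """
    return [(item.get("title"), item.get("link"))
            for item in feed.get("items", [])]


class FeedStore:
    """State owned by the reactor thread only."""

    def __init__(self):
        self.entries = []

    def update(self, pairs):
        self.entries.extend(pairs)
        return pairs


def parse_in_thread(blocking_parse, url, store):
    """Run blocking_parse(url) in the thread pool; update state in callbacks."""
    # Imported here so the helpers above can be used without Twisted.
    from twisted.internet.threads import deferToThread

    d = deferToThread(blocking_parse, url)  # worker thread
    d.addCallback(simplify)                 # reactor thread
    d.addCallback(store.update)             # reactor thread
    return d
```

The function passed to deferToThread should not touch the store or any
other shared object itself; it should only return its result.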
Twisted | Christopher Armstrong: International Man of Twistery
Radix | Release Manager, Twisted Project