[Twisted-Python] Storing site-wide information and scheduling tasks

Thu Jun 26 07:31:18 EDT 2003

----- Original Message -----
From: "Andrew Bennetts" <andrew-twisted at puzzling.org>
To: <twisted-python at twistedmatrix.com>
Sent: Thursday, June 26, 2003 10:26 AM
Subject: Re: [Twisted-Python] Storing site-wide information and scheduling tasks

> On Thu, Jun 26, 2003 at 09:08:30AM +0200, Thomas Weholt ( PRIVAT ) wrote:
> > I want to start a webserver which has an object (or whatever) available in
> > all Resource instances running in that server, holding common information
> > for the entire site. What's the appropriate module to use for this:
> > Application, Factory?
>
> I'll let someone else answer this, but I think you want a Service.

I'll look at the API reference, but I think Service is somewhat missing from
the docs. Or am I wrong?

>
> > In the same server I need to start tasks (i.e. run functions) at different
> > intervals. These function calls can take some time (they can fetch files
> > from the net, scan the local filesystem, etc.), so I guess I need threads,
> > Deferreds or something similar. These function calls must return a result
> > and update the persistent object mentioned above.
>
> (Btw, you seem to think Deferreds make blocking code magically
> non-blocking.  They don't -- they're actually very straightforward and
> non-magical.)
>
> For the "fetch files from the net", you don't need threads.  Just do
> something like (warning -- untested code):
>
>     from twisted.web.client import getPage
>     from twisted.internet import reactor
>     from twisted.python import log
>
>     refreshInterval = 30
>
>     def periodicFileFetch(url):
>         d = getPage(url)
>         # Process the page, and "update the persistent object"
>         d.addCallback(processPage).addCallback(updateObject)
>
>         # Log any errors in downloading or processing
>         d.addErrback(log.err)
>
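>         # Reschedule this function.  addBoth passes the previous result as
>         # the first argument, so wrap callLater in a lambda that drops it.
>         d.addBoth(lambda _: reactor.callLater(refreshInterval,
>                                               periodicFileFetch, url))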
>
>     def processPage(x):
>         "Your code goes here!"
>     def updateObject(x):
>         "Your code goes here!"
>
>     reactor.callLater(refreshInterval, periodicFileFetch, url)
>

Can I call reactor.callLater anywhere in my code, inside a running
webserver? Actually, I'll have a list of objects, each having an execute
method. About once a minute I'll iterate through the list and call each
object's execute method, and the object will decide whether it is configured
to run once a minute or once an hour. The number of functions I want to call
isn't fixed: I want users to be able to put a module into a folder; the
folder will be scanned when the webserver starts, and the classes in that
module will be instantiated and added to that list.
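
A minimal sketch of the module-folder idea described above, using only the
standard library. The names load_tasks and tick, and the convention that each
task's execute method receives the current time and decides for itself
whether it is due, are assumptions made for illustration; none of this is
part of Twisted.

```python
import importlib.util
import pathlib


def load_tasks(folder):
    """Import every *.py file in `folder` and instantiate its task classes.

    A "task class" here is any class defined in the module itself that has
    a callable `execute` attribute (a hypothetical convention).
    """
    tasks = []
    for path in sorted(pathlib.Path(folder).glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        for obj in vars(module).values():
            # Skip classes the module merely imported from elsewhere.
            if (isinstance(obj, type)
                    and obj.__module__ == module.__name__
                    and callable(getattr(obj, "execute", None))):
                tasks.append(obj())
    return tasks


def tick(tasks, now):
    """Call execute(now) on every task; each task decides if it is due.

    An exception raised by one task is collected in the result list so
    that the remaining tasks still run.
    """
    results = []
    for task in tasks:
        try:
            results.append(task.execute(now))
        except Exception as exc:
            results.append(exc)
    return results
```

Inside the server, tick could be rescheduled once a minute with
reactor.callLater. An execute method that takes a long time would still
block the event loop, so slow work would have to return quickly or be handed
off to a thread.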

Another question about getPage: I'll probably have a list of URLs too. Will
getPage fetch them asynchronously or one after another?

> For scanning the local filesystem, you could treat it like one big blocking
> operation, or you could break it into small chunks (i.e. one directory at a
> time), and process each chunk with callLater(0, processNextChunk).  For the
> sake of discussion, I'm going to choose a thread :)

Again, I'll have to look into callLater in the docs, but this will also just
be one of many possible tasks the user has defined.
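
The chunked approach mentioned above (one directory at a time, rescheduled
with a zero delay) could look roughly like this. It is only a sketch:
scan_in_chunks and on_done are hypothetical names, and the scheduler is
passed in as call_later so the example does not depend on Twisted;
reactor.callLater takes the same (delay, callable, *args) arguments.

```python
import os


def scan_in_chunks(root, call_later, on_done):
    """Walk `root` one directory per scheduled call, not in one blocking pass.

    `call_later(delay, func, *args)` is the scheduler to use, and
    `on_done(paths)` is called once with the list of files found.
    """
    walker = os.walk(root)  # lazy: each next() reads a single directory
    found = []

    def process_next_chunk():
        try:
            dirpath, _dirnames, filenames = next(walker)
        except StopIteration:
            on_done(found)
            return
        found.extend(os.path.join(dirpath, name) for name in filenames)
        # Hand control back to the event loop before the next directory.
        call_later(0, process_next_chunk)

    call_later(0, process_next_chunk)
```

Each chunk still blocks for as long as it takes to list one directory, so
this keeps the server responsive only if individual directories are small.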

>
> The trick with threads is to avoid interacting directly with *any* of your
> existing objects that your main event loop uses.  Disregarding this advice
> will lead to race conditions, and thus horrid, horrid bugs.  So, we scan for
> files in a thread, but to make it safe we send the instruction to do work to
> the thread via a Queue.Queue, and make sure it returns the results to the
> deferred via reactor.callFromThread.
>
>     # WARNING: More completely untested code.
>
>     from twisted.python import log, failure
>     from twisted.internet import reactor, defer
>     import Queue, threading
>
>     refreshInterval = 30
>
>     def processEvents(queue):
>         """A thread that processes events.
>
>         It receives (deferred, function) 2-tuples from a Queue.Queue, runs
>         the function, and fires the deferred with the result.
>         """
>         while 1:
>             try:
>                 deferred, func = queue.get()
>             except:
>                 log.err()
>                 continue
>             try:
>                 reactor.callFromThread(deferred.callback, func())
>             except Exception, e:
>                 reactor.callFromThread(deferred.errback,
>                                        failure.Failure(e))
>
>     def periodicFileScanner(queue, path):
>         # Tell the thread it's time to do some work
>         d = defer.Deferred()
>         queue.put((d, lambda: scanFiles(path)))
>
>         # Arrange for the result/error to be dealt with
>         d.addCallback(updateObject)
>         d.addErrback(log.err)
>
>         # Schedule this fun merry-go-round to happen again.  The lambda
>         # discards the result that addBoth passes as the first argument.
>         d.addBoth(lambda _: reactor.callLater(refreshInterval,
>                                               periodicFileScanner,
>                                               queue, path))
>
>     def initFileScanning(path):
>         q = Queue.Queue()
>         t = threading.Thread(target=processEvents, args=(q,))
>         # Daemon thread, so the endless worker loop doesn't block exit
>         t.setDaemon(True)
>         t.start()
>         reactor.callLater(refreshInterval, periodicFileScanner, q, path)
>
>     def scanFiles(path):
>         "Your code goes here!"
>     def updateObject(x):
>         "Your code goes here!"
>
>     initFileScanning('/path')
>
> This is actually more-or-less what twisted.internet.threads.deferToThread
> does (once you dig deep enough), so you probably want to use it rather than
> my completely untested code.  I've written it out explicitly in the hope
> that you'll have a better understanding of how it all works.
>
> Note also how Deferreds are just messengers -- they don't do any interesting
> work beyond calling callbacks when they're told to.
>
> > Can anybody show me a very basic example on how to do this?
>
> I hope my example code is basic enough that it makes sense to you --
> let us know if you're still uncertain about anything.

Thanks for the code. Even though I've commented on it a bit already, I'll try
to take a better look at it and test some things later tonight. Hopefully
some of my comments have cleared things up too.

If anybody's interested, this is what I'm trying to develop: a webserver for
blogging, image galleries, messaging (perhaps with interfaces to IRC and
Jabber, hopefully ICQ too), news aggregation/syndication and distributed/p2p
content exchange (not just filesharing, more like creating a virtual
cluster of webservers and getting some communication going between them so
they can discover new nodes and query nodes for information). It's a
crossover between Radio.Userland/MovableType and Gnutella. (I won't use the
Gnutella protocol; the server will use FOAF files, which can contain info
about other nodes, which in turn can provide FOAF files of their own, and
node discovery will be based on the information in those files.) HEP
Messaging Server is somewhat similar and has been one source of inspiration
so far, both in concept and code.

One of the tasks I have to do is fetch RDF/RSS files for syndication. But I
want to make a dynamic system where a user can just subclass/implement a
specified class/interface and put his module into a specified folder and it
will automatically be imported and run from the server.

Ok, this was a terrible mess of info, but hopefully it makes some sense. Any
comments, hints, ideas etc. would be much appreciated.

Thomas