[Twisted-Python] Storing site-wide information and scheduling tasks

Andrew Bennetts andrew-twisted at puzzling.org
Thu Jun 26 08:43:31 EDT 2003


On Thu, Jun 26, 2003 at 01:31:18PM +0200, Thomas Weholt ( PRIVAT ) wrote:
> Andrew wrote:
> > On Thu, Jun 26, 2003 at 09:08:30AM +0200, Thomas Weholt ( PRIVAT ) wrote:

> > > In the same server I need to start tasks ( ie. run functions ) at
> > > different intervals. These function-calls can take some time ( they
> > > can fetch files from the net, scan the local filesystem etc. ) so I
> > > guess I need threads, deferred or something similar. These
> > > function-calls must return a result and update the persistent object
> > > mentioned above.
[...]
> >
> > For the "fetch files from the net", you don't need threads.  Just do
> > something like (warning -- untested code):
> >
[..snip code..]
> >
> >     reactor.callLater(refreshInterval, periodicFileFetch, url)
> >
> 
> Can I call reactor.callLater anywhere in my code, inside a running
> webserver? Actually, I'll have a list of objects, each having an execute
> method, and iterate through the list, for instance once a minute, calling
> each object's execute method; it will decide whether it is configured to
> run once a minute or once an hour. It's not a fixed number of functions I
> want to call. I want users to be able to put a module into a folder; the
> folder will be scanned at startup of the webserver, and the classes in that
> module will be instantiated and put in the mentioned list.

You can call reactor.callLater whenever you like.  It simply schedules a
function to be run by the reactor.  See:
    http://twistedmatrix.com/documents/howto/time

> Another question about getPage; I'll probably have a list of urls too. Will
> getPage work asynch or in serial-mode?

getPage works asynchronously; that's why it returns a Deferred.

If it were "serial", i.e. blocking, there wouldn't be much point in returning
a Deferred -- it could just return the result immediately.

However, if the result of a function might take a while, but the function
wants to be asynchronous (so it doesn't hold up the rest of Twisted from
doing its stuff), it will return a Deferred -- i.e. a result that hasn't
arrived yet (hence the word "deferred" :).  It's really just a placeholder
for a result that's still to come.
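
To make the "placeholder" idea concrete, here is a toy sketch in plain
Python.  It is not Twisted's Deferred (it has no errbacks, for a start), and
ToyDeferred, toy_get_page, the pending list and the example.com URL are all
made up for illustration.  It only shows that the caller gets an object back
straight away and the callbacks fire once the result turns up:

```python
class ToyDeferred:
    """Toy stand-in for a Deferred: holds callbacks until a result arrives."""

    def __init__(self):
        self.callbacks = []
        self.called = False
        self.result = None

    def addCallback(self, fn, *args, **kwargs):
        self.callbacks.append((fn, args, kwargs))
        if self.called:
            # The result is already here, so run the new callback now.
            self._run()
        return self

    def callback(self, result):
        """Deliver the result and run every callback registered so far."""
        if self.called:
            raise RuntimeError("result already delivered")
        self.called = True
        self.result = result
        self._run()

    def _run(self):
        # Each callback receives the return value of the previous one.
        while self.callbacks:
            fn, args, kwargs = self.callbacks.pop(0)
            self.result = fn(self.result, *args, **kwargs)


def toy_get_page(url, pending):
    """Hypothetical non-blocking fetch: returns at once, result comes later."""
    d = ToyDeferred()
    pending.append((url, d))  # something else will deliver the page later
    return d


pending = []
seen = []

d = toy_get_page("http://example.com/feed.rss", pending)
d.addCallback(len)          # first callback: turn the page into its length
d.addCallback(seen.append)  # second callback: record that length

# Nothing has run yet; the "download" has not finished.
# Later, whatever handles the network delivers the page:
url, waiting = pending.pop()
waiting.callback("<rss>...</rss>")
print(seen)
```

With several URLs you would simply call the fetch function once per URL and
get one placeholder back for each; none of the calls waits for the others.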

> > For scanning the local filesystem, you could treat it like one big
> > blocking operation, or you could break it into small chunks (i.e. one
> > directory at a time), and process each chunk with callLater(0,
> > processNextChunk).  For the sake of discussion, I'm going to choose a
> > thread :)
> 
> Again, I'll have to look into callLater in the docs, but this will also
> just be one of many possible tasks the user has defined.

Note that callLater isn't very special.  It simply says to the reactor
"in N seconds' time, I want you to run this function".  The reactor will then
run that function in around N seconds' time (possibly a little later, but not
sooner, iirc).  As with other events the reactor processes, such as network
traffic, this all happens in the main thread.  So you can think of it as
being the same as if e.g. a dataReceived handler got triggered at the
appropriate time, and it called your function.  Anyway, read the document I
gave the URL for, and it should all become clear :)
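
If it helps, the scheduling idea can be sketched with a toy scheduler in
plain Python.  This is not the real reactor: ToyReactor, its hand-cranked
clock (advance) and run_periodically are invented for illustration, and the
intervals are arbitrary.  It just shows delayed calls being queued and run
in one thread, never early but sometimes a bit late, with a function
rescheduling itself to become periodic:

```python
import heapq
import itertools


class ToyReactor:
    """Toy single-threaded scheduler with a manually advanced clock."""

    def __init__(self):
        self.now = 0.0
        self._queue = []
        self._order = itertools.count()  # tie-breaker keeps FIFO order

    def callLater(self, delay, fn, *args, **kwargs):
        """Queue fn(*args, **kwargs) to run no sooner than `delay` from now."""
        due = self.now + delay
        heapq.heappush(self._queue, (due, next(self._order), fn, args, kwargs))

    def advance(self, seconds):
        """Move the clock forward and run everything that is now due."""
        self.now += seconds
        while self._queue and self._queue[0][0] <= self.now:
            _, _, fn, args, kwargs = heapq.heappop(self._queue)
            fn(*args, **kwargs)


def run_periodically(reactor, interval, name, log):
    """Do the 'work' (here: just log it), then schedule the next run."""
    log.append((reactor.now, name))
    reactor.callLater(interval, run_periodically, reactor, interval, name, log)


reactor = ToyReactor()
log = []
reactor.callLater(60, run_periodically, reactor, 60, "every-minute", log)
reactor.callLater(150, run_periodically, reactor, 150, "every-150s", log)

# Pretend three minutes pass, one minute at a time.
for _ in range(3):
    reactor.advance(60)

for when, name in log:
    print(when, name)
```

Because this toy clock only moves in 60-second steps, the 150-second task
first runs at 180 seconds: late, but never early.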

Are you intending to pre-define all the tasks the user might want to use,
or are you expecting users to want to plug in arbitrary code?

> >     # WARNING: More completely untested code.
[...]
> >
> > This is actually more-or-less what twisted.internet.threads.deferToThread
> > does (once you dig deep enough), so you probably want to use it rather
> > than my completely untested code.  I've written it out explicitly in the
> > hope that you'll have a better understanding of how it all works.

(I just want to re-emphasise that using deferToThread rather than my code is
probably a good idea)
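
For anyone who wants the general shape of the idea without the snipped code,
here is a rough stdlib-only sketch.  It is neither my earlier code nor what
deferToThread actually does internally; run_in_thread, process_one_result,
scan and the file names are hypothetical.  The blocking work runs in a
worker thread, and the result is handed back through a queue so the callback
runs in the main thread (in Twisted the hand-off back to the main thread is
what reactor.callFromThread is for):

```python
import queue
import threading

results = queue.Queue()  # thread-safe hand-off back to the main thread


def run_in_thread(fn, on_done, *args):
    """Run blocking fn(*args) in a worker thread.

    The outcome is queued so that on_done can be called from the main thread.
    """
    def worker():
        try:
            outcome = (True, fn(*args))
        except Exception as exc:
            outcome = (False, exc)
        results.put((on_done, outcome))

    threading.Thread(target=worker, daemon=True).start()


def process_one_result(timeout=5):
    """Call from the main thread: deliver one finished result to its callback."""
    on_done, (ok, value) = results.get(timeout=timeout)
    on_done(ok, value)


def scan(names):
    """Stand-in for a slow filesystem scan: pick out the Python files."""
    return sorted(n for n in names if n.endswith(".py"))


done = []
run_in_thread(scan, lambda ok, value: done.append((ok, value)),
              ["b.py", "a.txt", "a.py"])
process_one_result()  # the main thread picks up the result and runs the callback
print(done)
```

A real event loop would not block waiting on the queue as
process_one_result does here; that is one more reason to prefer
deferToThread.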

> Thanks for the code, and even though I've commented it a bit already I'll
> try to get a better look at it and test some things later tonight.
> Hopefully some of my comments have cleared things up too.
> 
> If anybody's interested this is what I'm trying to develop; a webserver
[..snip the usual omni-uber-server-that-will-take-over-the-world thing that
   Twisted does so well ;) ..]
> 
> One of the tasks I have to do is fetch RDF/RSS files for syndication. But
> I want to make a dynamic system where a user can just subclass/implement a
> specified class/interface and put his module into a specified folder and
> it will automatically be imported and run from the server.

Ok.  A couple of things...

First, Twisted already has a system for dealing with plugins (it's how mktap
works!), so you possibly want to re-use that :)   See twisted.python.plugins

Second, the two snippets of example code I've given you could easily be
adapted into a standard interface:

    from twisted.python.components import Interface

    class IUberServerPlugin(Interface):
        def startPlugin(self):
            """Called to start the plugin.

            This will typically be called when the program starts, similarly
            to e.g. twisted.internet.protocol.Factory.startFactory, or
            twisted.internet.app.ApplicationService.startService.
            """

        def stopPlugin(self):
            """Called to stop the plugin.

            Typically this will be called on shutdown, e.g. like stopFactory
            or stopService.
            """

        # ...etc...

So, my earlier code for getting a page could become something like:

    from twisted.web.client import getPage
    from twisted.internet import reactor
    from twisted.python import log

    from uberserver.interfaces import IUberServerPlugin   # ;)

    class PageFetchPlugin:
        __implements__ = (IUberServerPlugin,)
        
        def __init__(self, uberServer, url, refreshInterval=30):
            self.uberServer = uberServer
            self.url = url
            self.refreshInterval = refreshInterval

        def startPlugin(self):
            reactor.callLater(self.refreshInterval, self.periodicFileFetch)

        def periodicFileFetch(self):
            d = getPage(self.url)

            # Process the page, (e.g. to extract URLs, or something)
            d.addCallback(self.processPage)

            # Update the central uberServer
            d.addCallback(self.uberServer.updateObject)

            # Log any errors in downloading or processing
            d.addErrback(log.err)

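            # Reschedule this function.  addBoth passes the Deferred's result
            # as the first argument, so wrap callLater in a lambda to drop it.
            d.addBoth(lambda _: reactor.callLater(self.refreshInterval,
                                                  self.periodicFileFetch))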

        def processPage(self, page):
            """Your code *still* goes here ;)

            Receives the downloaded page; whatever it returns is passed on
            to uberServer.updateObject.
            """


I hope I've given you some helpful ideas! :)

-Andrew.




