[Twisted-web] Re: Twisted-web Digest, Vol 15, Issue 20

Richard Meraz rfmeraz at gmail.com
Thu Jun 23 12:35:15 MDT 2005


Thanks Dave: very clear and easy-to-follow answer and example code. I
definitely appreciate your time.

Final question. Is there a convenient way to put an upper bound on
how long twisted.web.client.getPage is allowed to take to complete its
work? I know twisted.web.client.getPage takes a timeout parameter, but
this seems more like a socket timeout, which won't kill, for example, a
getPage waiting on a low-bandwidth server (is my understanding
correct?)
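The two kinds of limit contrasted here can be shown without any networking. The sketch below uses the standard library's asyncio rather than Twisted, purely so it runs on its own; `next_chunk` is a hypothetical stand-in for one slow read from a low-bandwidth server. A timeout applied to each read (an inactivity timeout) never fires while data keeps trickling in, whereas a single deadline around the whole transfer does. Which of the two getPage's own `timeout` implements is worth checking in the twisted.web.client source for the Twisted version in use; later Twisted releases also added `Deferred.addTimeout` for putting a deadline on any Deferred.

```python
import asyncio


async def next_chunk(delay):
    # Hypothetical stand-in for one slow read: data arrives after `delay` s.
    await asyncio.sleep(delay)
    return "x"


async def download(n_chunks, delay, idle_timeout):
    # Inactivity timeout: wait_for wraps each read separately, so it only
    # fires if a single read stalls for longer than idle_timeout.
    body = []
    for _ in range(n_chunks):
        body.append(await asyncio.wait_for(next_chunk(delay), idle_timeout))
    return "".join(body)


async def download_with_deadline(n_chunks, delay, idle_timeout, max_time):
    # Overall deadline: one wait_for around the whole transfer, cancelled
    # once max_time has elapsed no matter how steadily data is flowing.
    return await asyncio.wait_for(
        download(n_chunks, delay, idle_timeout), max_time)


if __name__ == "__main__":
    # 10 chunks, 0.05 s apart: about 0.5 s in total.
    print(asyncio.run(download(10, 0.05, idle_timeout=1.0)))  # completes
    try:
        asyncio.run(download_with_deadline(10, 0.05, idle_timeout=1.0,
                                           max_time=0.2))
    except asyncio.TimeoutError:
        print("overall deadline hit")
```

The per-read limit bounds how long the server may go silent; only the outer limit bounds the total time, which is what a MAXTIME-style cutoff needs.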

For example, if I'm using the asyncore.py framework to manage I/O, I can
use the channel.timestamp attribute to see how long things have been
going and kill long-running I/O in a polling loop. Of course, with
asyncore I can manage my own polling loop, which I can't see an easy
way to do with the reactor (could someone comment on that? I'm
probably missing something).
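The timestamp technique doesn't actually require owning the loop; it only needs a check that runs periodically. In Twisted that periodic call would typically be scheduled with `twisted.internet.task.LoopingCall` (or a function that re-arms itself with `reactor.callLater`). The sketch below keeps the bookkeeping framework-free: the `Watchdog` class and its method names are hypothetical, `kill` is whatever aborts a transfer in your framework (for a Twisted connection, perhaps closing its transport), and the clock is injectable so the logic can be exercised without waiting.

```python
import time


class Watchdog:
    """Remember when each transfer started; kill those over a time limit.

    Hypothetical helper, not part of asyncore or Twisted.
    """

    def __init__(self, max_time, kill, clock=time.monotonic):
        self.max_time = max_time  # seconds a transfer may run in total
        self.kill = kill          # callable that aborts one transfer
        self.clock = clock        # injectable for testing
        self.started = {}         # transfer id -> start timestamp

    def register(self, conn_id):
        self.started[conn_id] = self.clock()

    def done(self, conn_id):
        # Call when a transfer finishes normally.
        self.started.pop(conn_id, None)

    def check(self):
        """One polling step: kill every transfer older than max_time."""
        now = self.clock()
        expired = [c for c, t in self.started.items()
                   if now - t > self.max_time]
        for conn_id in expired:
            del self.started[conn_id]
            self.kill(conn_id)
        return expired


if __name__ == "__main__":
    fake_now = [0.0]
    w = Watchdog(60, lambda c: print("killing", c), clock=lambda: fake_now[0])
    w.register("slow-server")
    fake_now[0] = 30.0
    w.register("fast-server")
    w.done("fast-server")
    fake_now[0] = 61.0
    w.check()  # kills only "slow-server"
```

In a Twisted program, `check` would be the function handed to LoopingCall and started with a short interval, so the reactor does the polling on your behalf.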

-Thanks again

-Richard Meraz


On 6/23/05, twisted-web-request at twistedmatrix.com
<twisted-web-request at twistedmatrix.com> wrote:
> 
> Message: 1
> Date: Thu, 23 Jun 2005 12:18:06 -0400
> From: Dave Gray <dgray at omniti.com>
> Subject: Re: [Twisted-web] Defers, the reactor, and idiomatic/proper
>         usage   -- new user needs some advice?
> To: Richard Meraz <rfmeraz at gmail.com>, "Discussion of twisted.web,
>         Nevow,  and Woven" <twisted-web at twistedmatrix.com>
> Message-ID: <42BAE0BE.1040209 at omniti.com>
> Content-Type: text/plain; charset="windows-1252"
> 
> I'm not familiar with feedlib, etc, but I'll answer what I can.
> 
> Richard Meraz wrote:
> > MAXTIME = 60 # Kill crawl after this time
> > TIMEOUT = 20 # Kill page retrieval after this time inactive
> > MAXDEPTH = 3 # Recurse this depth when crawling page.
> >
> > # Question: There seem to be many idioms to aggregate information from
> > # different deferred callback chains in Twisted. Since everything runs
> > # in a single thread, I just stuck my stuff in a global class, and everybody
> > # modifies the vars there as I pass it around to the callbacks that
> > # should see it. Seems okay for a small script like this?
> 
> That seems fine, yeah. I think I would pass around the StateVars
> instance as a context if I were coding this. Probably the same effect.
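Passing the state explicitly might look like the sketch below. The names are hypothetical; the point is that `links_checked` becomes an instance attribute, whereas the class-level `links_checked = {}` in StateVars is one dict shared by everything that touches the class, so two crawls could never keep separate records. Each callback simply takes the state as an extra argument, which fits the extra-arguments form of `addCallback` (for example `d.addCallback(process_link, uri, depth, state)`).

```python
class CrawlState:
    """State for one crawl, passed explicitly to every callback.

    Hypothetical replacement for the class-level StateVars: each instance
    gets its own dict instead of sharing one class attribute.
    """

    def __init__(self):
        self.connections = 1
        self.links_checked = {}  # url -> (is_feed, page_content)


def process_link(page, uri, depth, state):
    # Hypothetical callback: getPage's result arrives as `page`; the extra
    # arguments are the ones that were passed to addCallback.
    is_feed = "<rss" in page or "<feed" in page
    state.links_checked[uri] = (is_feed, page)
    return page


if __name__ == "__main__":
    first, second = CrawlState(), CrawlState()
    process_link("<rss version='2.0'/>", "http://example.com/feed", 0, first)
    print(len(first.links_checked), len(second.links_checked))  # 1 0
```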
> 
> > class StateVars:
> >     '''Keep Global state for starting/stopping feedfinding and a record
> >     of links we have checked and their status'''
> >     connections = 1
> >     links_checked = {} # Structure: {url: (is RSS/ATOM/RDF, page-content)}
> >
> > # Question: start_feed_crawl is where I set up my defers. getPage
> > # returns a defer and I attach my callback process_link.
> > # addCallbacks adds a callback/errback in parallel, so only one or the
> > # other is called? So I need to add the final errback to catch errors
> > # from the callback process_link?
> 
> Correct. Well, sort of. See below.
> 
> > def start_feed_crawl(uri, depth):
> >     '''Harvest feeds from a uri'''
> >     # Question: how do I time out this deferred chain if getPage is taking
> >     # too long to finish its work? What exactly does getPage's timeout
> >     # argument do: does it time out the socket after no response, or
> >     # does it put an upper bound on how long getPage has to finish its work?
> >
> >     getPage(uri, timeout=TIMEOUT).addCallbacks(
> >         callback=process_link, callbackArgs=(uri, depth, StateVars),
> >         errback=process_error, errbackArgs=(uri, StateVars)
> >     ).addErrback(process_error, uri, StateVars)
> 
> It seems clearer to me to write this as follows, but that's personal
> preference:
> 
>      d = getPage(...)
>      d.addCallbacks(...)
>      d.addErrback(...)
> 
> But since you're setting up the call to the same errback twice, you
> could simplify this to:
> 
>      d = getPage(...)
>      d.addCallback(process_link, uri, depth, StateVars)
>      d.addErrback(process_error, uri, StateVars)
> 
> <http://twistedmatrix.com/projects/core/documentation/howto/defer.html#auto4>
> has a nice visual explanation of what happens when.
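The difference between the two spellings can be seen with a deliberately tiny model of the callback chain. `ToyDeferred` below is not Twisted's Deferred (no Failure objects, no pausing on returned Deferreds, no protection against firing twice, and any Exception instance counts as an error); it only reproduces the rule that each `addCallbacks` call adds one (callback, errback) pair, exactly one side of each pair runs, and an exception moves processing to the errback side of the following pairs. `process_link` and `process_error` are hypothetical stand-ins.

```python
class ToyDeferred:
    """Toy model of a Deferred's callback chain (not Twisted's class)."""

    def __init__(self):
        self.chain = []  # list of (callback, errback) pairs

    def addCallbacks(self, callback, errback):
        self.chain.append((callback, errback))
        return self

    def addCallback(self, callback):
        # The errback side just passes the error along unchanged.
        return self.addCallbacks(callback, lambda err: err)

    def addErrback(self, errback):
        # The callback side just passes the result along unchanged.
        return self.addCallbacks(lambda result: result, errback)

    def fire(self, result):
        for callback, errback in self.chain:
            handler = errback if isinstance(result, Exception) else callback
            try:
                result = handler(result)
            except Exception as exc:
                # An exception sends processing down the errback side
                # of the following pairs.
                result = exc
        return result


def process_link(page):
    raise ValueError("could not parse page")  # hypothetical failure


def process_error(err):
    return "handled: %s" % err


if __name__ == "__main__":
    # Callback and errback as one pair: the errback is process_link's
    # sibling, so it never sees process_link's exception.
    d1 = ToyDeferred().addCallbacks(process_link, process_error)
    print(repr(d1.fire("<html>")))

    # addCallback then addErrback: the errback comes after process_link,
    # so it catches errors from process_link and from the fetch itself.
    d2 = ToyDeferred().addCallback(process_link).addErrback(process_error)
    print(d2.fire("<html>"))
    d3 = ToyDeferred().addCallback(process_link).addErrback(process_error)
    print(d3.fire(ConnectionError("connection refused")))
```

In real Twisted, an error still sitting at the end of a chain (like the first result here) is what eventually gets reported as an unhandled error in a Deferred.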
> 
> > # Question: since I'm starting up these defers in a callback, they are
> > # being created after I've called reactor.run(), since we call start_feed_crawl
> > # as we find new links that meet our criteria. Am I doing anything bad here?
> > # All the examples I've seen (e.g. pp. 548-552 of the Python Cookbook, a great example by V. Volonghi
> > # and P. Cogolo) have their data up front and therefore set up all the defers before calling
> > # reactor.run(). Here I'm discovering my data as I go along and setting up defers while
> > # the reactor is spinning. This is where my fundamental lack of understanding lies. While this script
> > # seems to run okay, is it okay to do this?
> 
> Yes, that's fine. I think the examples you've seen most often are the odd
> case: being able to set up all the Deferreds beforehand.
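Scheduling new work from inside a callback while the loop is already running is the normal case for event loops in general. As a framework-free illustration, the sketch below uses the standard library's `sched` module in place of the reactor (`scheduler.enter` playing roughly the role of `reactor.callLater`) and a hypothetical in-memory `LINKS` graph in place of fetched pages. One difference matters for the next question: `sched.scheduler.run()` returns on its own once no events remain, whereas `reactor.run()` keeps running until something calls `reactor.stop()`.

```python
import sched
import time

# Hypothetical link graph standing in for pages fetched over the network.
LINKS = {"/": ["/a", "/b"], "/a": ["/c"], "/b": ["/c"], "/c": []}


def crawl(scheduler, url, depth, max_depth, seen, order):
    """Visit url, then schedule visits to its links from inside the loop."""
    if url in seen or depth > max_depth:
        return
    seen.add(url)
    order.append(url)
    for link in LINKS.get(url, []):
        # New work discovered while the loop is already running, just as
        # start_feed_crawl is called from a callback after reactor.run().
        scheduler.enter(0, 1, crawl,
                        (scheduler, link, depth + 1, max_depth, seen, order))


def run_crawl(start, max_depth):
    s = sched.scheduler(time.monotonic, time.sleep)
    seen, order = set(), []
    s.enter(0, 1, crawl, (s, start, 0, max_depth, seen, order))
    s.run()  # returns once no scheduled events remain
    return order


if __name__ == "__main__":
    print(run_crawl("/", max_depth=5))
```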
> 
> >     # Question: Is this how I kill the reactor, i.e. using some sort of
> >     # state condition? Is there a better way? Should I try harder to
> >     # understand DeferredList? For example, a top-level DeferredList that
> >     # contains other DeferredLists, which get created to hold all the defers
> >     # (created by start_feed_crawl) for the links on a given page. Could this
> >     # DeferredList be told to stop the reactor when the other lists have
> >     # fired their callbacks (after the component defers have finished)?
> >     # (Sorry for the convoluted question here; I'm new at this.)
> 
> What you want to do is stop the reactor when everything is done
> processing. So have start_feed_crawl return the Deferred that getPage
> gives you, and after you call it the first time, add a callback to that
> Deferred which stops the reactor. The trick here is that when a callback
> in a Deferred's chain itself returns a Deferred, the outer chain pauses
> until the inner one fires. So if you put that first Deferred into a
> DeferredList and add the reactor-stopping callback to the DeferredList,
> the DeferredList won't call its callbacks until all of those nested
> operations complete. You'll be stacking up a whole bunch of Deferreds
> inside the first one, and the callback on the DeferredList that calls
> reactor.stop won't fire until some callback finally returns a plain value
> instead of a Deferred.
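The same "not done until everything nested inside it is done" structure can be sketched with asyncio, used here only as a self-contained stand-in for Twisted: awaiting the child crawls plays the part of a callback returning a Deferred, so the top-level call completes only after the whole tree of fetches has finished, which is the moment the Twisted version would call `reactor.stop()`. `LINKS` and `fetch_links` are hypothetical stand-ins for getPage plus link extraction.

```python
import asyncio

# Hypothetical link graph standing in for real pages.
LINKS = {"/": ["/a", "/b"], "/a": ["/c"], "/b": ["/c"], "/c": []}


async def fetch_links(url):
    await asyncio.sleep(0.01)  # pretend network latency
    return LINKS.get(url, [])


async def crawl(url, depth, max_depth, seen):
    if url in seen or depth > max_depth:
        return
    seen.add(url)  # mark before awaiting so no other crawl repeats it
    links = await fetch_links(url)
    # Like a callback that returns a Deferred: this crawl only finishes
    # once every child crawl it started has finished.
    await asyncio.gather(*(crawl(link, depth + 1, max_depth, seen)
                           for link in links))


def crawl_all(start, max_depth):
    seen = set()
    # asyncio.run returns when the top-level crawl, and everything nested
    # in it, is done; in the Twisted version this is where reactor.stop() goes.
    asyncio.run(crawl(start, 0, max_depth, seen))
    return seen


if __name__ == "__main__":
    print(sorted(crawl_all("/", max_depth=3)))
```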
> 
> There might be an easier way to do this, but this is the way I know
> (example attached). Someone please let me know if there's an easier way.
> To see the example, run it with 'twistd -noy fetchpage.tac' then do
> 'telnet localhost 9000' and send:
> 
> GET /?target=http://www.google.com/ HTTP/1.1
> Host: localhost
> 
> 
> 
> > Final question: occasionally I get errors that come from the http.py
> > code in Twisted. These get printed to the console, but don't necessarily
> > stop my program. Should my errbacks be catching these? How do I keep
> > errors from getting logged to the console (besides redirecting stderr)? I
> > can post an example of the errors I'm getting if necessary.
> 
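> When you create the DeferredList, pass in consumeErrors=1; the list then
> swallows failures from the Deferreds it wraps, so they no longer show up
> as unhandled errors. This will make debugging that much more annoying,
> though...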
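asyncio has no consumeErrors flag, but `asyncio.gather(..., return_exceptions=True)` makes a similar trade: a failed task's exception comes back as an ordinary result instead of propagating, so nothing is raised and it is up to you to look at it. The sketch below is again a stand-in rather than Twisted code; with a hypothetical `fetch` that fails for one URL, it keeps the failures in a dict so they are not simply lost, which softens the debugging cost mentioned above.

```python
import asyncio


async def fetch(url):
    # Hypothetical fetch that fails for one particular URL.
    await asyncio.sleep(0)
    if url == "/broken":
        raise ConnectionError("server hung up")
    return "<html>%s</html>" % url


async def fetch_all(urls):
    # return_exceptions=True: exceptions are returned in the results list
    # (in the same order as urls) instead of being raised.
    results = await asyncio.gather(*(fetch(u) for u in urls),
                                   return_exceptions=True)
    pages, errors = {}, {}
    for url, result in zip(urls, results):
        if isinstance(result, BaseException):
            errors[url] = result  # keep failures around for debugging
        else:
            pages[url] = result
    return pages, errors


if __name__ == "__main__":
    pages, errors = asyncio.run(fetch_all(["/", "/broken"]))
    print(sorted(pages), {u: str(e) for u, e in errors.items()})
```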
> 
> HTH,
> Dave
> -------------- next part --------------
> from twisted.web import server
> from twisted.web.resource import Resource
> from twisted.web.client import getPage
> 
> from twisted.internet import defer, reactor
> from twisted.python import log
> 
> class Foo(Resource):
>     counter = 0
>     isLeaf = True
> 
>     def render_GET(self, request):
>         # The URL to fetch comes from ?target=...; it is used as-is
>         # (HTML-escaping it would mangle URLs containing '&')
>         target = request.args['target'][0]
>         d = getPage(target)
>         # Pass the request to the callback rather than storing it on
>         # self, so concurrent requests can't overwrite each other
>         d.addCallback(self.print_page, request)
>         d.addErrback(log.err)
>         # The DeferredList fires only after d and every Deferred that
>         # print_page returns into d's chain have fired
>         dl = defer.DeferredList([d])
>         dl.addCallback(stopNow)
>         dl.addErrback(log.err)
>         return server.NOT_DONE_YET
> 
>     def print_page(self, html, request):
>         if Foo.counter < 5:
>             Foo.counter += 1
>             print('request ' + str(Foo.counter))
>             # Returning a Deferred pauses the outer chain until this
>             # one fires (here, one second later via callLater)
>             d = defer.Deferred()
>             d.addCallback(self.print_page, request)
>             d.addErrback(log.err)
>             reactor.callLater(1, d.callback, html)
>             return d
>         else:
>             print('now we can write stuff back')
>             request.write(str(len(html)) + ' ' + str(Foo.counter))
>             request.finish()
>             request.transport.loseConnection()
>             # No Deferred is returned, so the chain completes and
>             # stopNow fires
> 
> def stopNow(cbval):
>     # can't add reactor.stop as a callback directly
>     # because it doesn't know what to do with the extra
>     # argument being returned from the callback
>     print(cbval)
>     reactor.stop()
> 
> resource = Foo()
> site = server.Site(resource)
> 
> from twisted.application import service, internet
> application = service.Application("Foo")
> internet.TCPServer(9000, site).setServiceParent(application)
> 
> # vim: ai sts=4 sw=4 expandtab syntax=python :
> 


-- 
Never think there is anything impossible for the soul. It is the
greatest heresy to think so. If there is sin, this is the only sin –
to say that you are weak, or others are weak.

Swami Vivekananda

