[Twisted-Python] [ANN] Turtl 0.1.1 throttler released
Valentino Volonghi
dialtone at gmail.com
Tue Aug 16 15:35:04 EDT 2011
Hi,
Turtl is an HTTP proxy whose purpose is to throttle connections to
specific hostnames, to avoid breaking the terms of use of API
providers (like del.icio.us, Technorati and so on).
At the core of Turtl is a throttling deferred that works similarly to DeferredSemaphore(), except that it also enforces a rate (N calls every M seconds) at which the deferreds added to it are fired.
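To make the combination of the two limits concrete, here is a rough sketch that simulates when jobs would be allowed to start under a concurrency cap plus a rate cap. It is not Turtl's code and does not use Twisted: the function name `schedule`, the fixed per-job `duration`, and the sliding-window reading of "N calls every M seconds" are all assumptions made for illustration, and Turtl may count calls differently.

```python
import heapq
from collections import deque


def schedule(n_jobs, duration, concurrency, calls, interval):
    """Simulate start times for n_jobs identical jobs (hypothetical helper).

    Assumes every job takes `duration` seconds and that the rate limit is a
    sliding window: at most `calls` starts in any `interval` seconds.
    """
    running = []      # min-heap of finish times of in-flight jobs
    recent = deque()  # start times of the most recent `calls` jobs
    starts = []
    now = 0.0
    for _ in range(n_jobs):
        # Concurrency limit: wait until the earliest in-flight job finishes.
        if len(running) >= concurrency:
            now = max(now, heapq.heappop(running))
        # Rate limit: the start made `calls` jobs ago must be `interval` old.
        if len(recent) >= calls:
            now = max(now, recent.popleft() + interval)
        starts.append(now)
        recent.append(now)
        heapq.heappush(running, now + duration)
    return starts


if __name__ == "__main__":
    # Same limits as the example in this message: one job at a time,
    # at most 2 starts per second; each simulated job takes 0.25 s.
    for t in schedule(5, 0.25, concurrency=1, calls=2, interval=1):
        print(t)
```

With fast jobs the rate limit dominates (two starts per second), while with jobs slower than the interval the concurrency limit dominates and the rate limit never kicks in.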
Over the past few weeks it has been improved and a couple of obscure bugs have been ironed out. It has been running as a proxy for a couple of years, and recently we started using it as a crawler rate limiter.
The source code lives on Bitbucket: https://bitbucket.org/adroll/turtl/overview
Here's a small example of its usage:
import time

from twisted.internet import reactor, defer
from twisted.protocols.policies import WrappingFactory
from twisted.web import client, server, resource

from turtl import engine

throttle = engine.ThrottlingDeferred(concurrency=1, calls=2, interval=1)

class FakeResource(resource.Resource):
    isLeaf = True

    def render(self, request):
        return "hello"

def setupServer():
    site = server.Site(FakeResource())
    wrapper = WrappingFactory(site)
    port = reactor.listenTCP(0, wrapper, interface="127.0.0.1")
    portno = port.getHost().port
    return portno

def stop(_):
    return reactor.stop()

def makeUrl(port):
    return "http://localhost:%s/" % (port)

def prinl(page):
    print time.time(), page

port = setupServer()
url = makeUrl(port)
defer.DeferredList([throttle.run(client.getPage, url).addBoth(prinl) for i in xrange(1000)]).addBoth(stop)
reactor.run()
--
Valentino Volonghi
http://www.adroll.com