[Twisted-Python] [ANN] Turtl 0.1.1 throttler released

Valentino Volonghi dialtone at gmail.com
Tue Aug 16 13:35:04 MDT 2011


Hi,

Turtl is an HTTP proxy that throttles connections to specific
hostnames, so that clients do not violate the terms of use of the API
providers behind them (del.icio.us, Technorati and so on).

At the core of turtl is a throttling deferred that works much like DeferredSemaphore(), except that it also enforces a rate (N calls every M seconds) at which the deferreds added to it are fired.
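To make the "concurrency plus rate" idea concrete, here is a minimal standalone sketch. It is not turtl's code and does not use Twisted: it only simulates when tasks would be allowed to start. The function name simulate_start_times and its parameters are made up for illustration, and the rate is interpreted as a sliding window (at most `calls` starts in any `interval` seconds), which is an assumption about the semantics rather than a description of the real engine.

```python
import heapq
from collections import deque


def simulate_start_times(jobs, duration, concurrency, calls, interval):
    """Return the simulated start time of each of `jobs` identical tasks.

    Hypothetical helper, not part of turtl. A task may start only when
    (a) fewer than `concurrency` tasks are running, and
    (b) fewer than `calls` tasks have started in the last `interval` seconds.
    Every task runs for `duration` seconds.
    """
    running = []      # min-heap holding the finish time of each occupied slot
    recent = deque()  # start times of the most recent `calls` tasks
    now = 0.0
    starts = []
    for _ in range(jobs):
        # (a) Semaphore-style limit: wait until the earliest slot is free.
        if len(running) == concurrency:
            now = max(now, heapq.heappop(running))
        # (b) Rate limit: the oldest of the last `calls` starts must be
        # at least `interval` seconds in the past.
        if len(recent) == calls:
            now = max(now, recent.popleft() + interval)
        starts.append(now)
        recent.append(now)
        heapq.heappush(running, now + duration)
    return starts


if __name__ == "__main__":
    print(simulate_start_times(5, 0.25, concurrency=1, calls=2, interval=1))
```

With concurrency=1, calls=2, interval=1 and tasks that take 0.25 seconds, the sketch yields start times 0.0, 0.25, 1.0, 1.25, 2.0: the semaphore alone would let a task start every 0.25 seconds, and the rate limit is what pushes the third and fifth starts out to the next second.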

Over the past few weeks it has been improved and a couple of obscure bugs have been ironed out. It has been running as a proxy for a couple of years, and recently we started using it as a crawler rate limiter.

The source code lives on Bitbucket: https://bitbucket.org/adroll/turtl/overview

Here's a small example of its usage:

import time
from twisted.internet import reactor, defer
from twisted.protocols.policies import WrappingFactory
from twisted.web import client, server, resource
from turtl import engine

# At most 1 call in flight, and at most 2 calls per 1-second interval.
throttle = engine.ThrottlingDeferred(concurrency=1, calls=2, interval=1)

class FakeResource(resource.Resource):
    isLeaf = True
    def render(self, request):
        return "hello"

def setupServer():
    site = server.Site(FakeResource())
    wrapper = WrappingFactory(site)
    port = reactor.listenTCP(0, wrapper, interface="127.0.0.1")
    portno = port.getHost().port
    return portno

def stop(_):
    return reactor.stop()

def makeUrl(port):
    return "http://localhost:%s/" % (port)

def prinl(page):
    print time.time(), page

port = setupServer()
url = makeUrl(port)
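# Queue 1000 requests through the throttle; stop the reactor when all are done.
requests = [throttle.run(client.getPage, url).addBoth(prinl)
            for i in xrange(1000)]
defer.DeferredList(requests).addBoth(stop)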
reactor.run()


-- 
Valentino Volonghi
http://www.adroll.com




