[Twisted-web] Limit the simultaneous twisted.web.client.downloadPage requests
descentspb at gmail.com
Sat Oct 24 09:43:08 EDT 2009
exarkun at twistedmatrix.com wrote:
> On 01:20 pm, descentspb at gmail.com wrote:
>> I am new to Twisted, sorry if my question sounds awkward.
>> I have written a pretty simple recursive page downloader, which parses
>> an HTML page, extracts all the needed links from it, and starts
>> downloading them. The links point to video files, so they are pretty
>> large. The problem is that the downloader works TOO FAST :) I want to
>> set something like a global bandwidth limit or a maximum number of
>> concurrently downloading files.
>> I am using twisted.web.client.downloadPage to download the files,
>> working with the Deferred that it returns.
>> I can't work out how to make it still return a Deferred for that file
>> without starting the download right away, and instead start the
>> download on some kind of event (by making a manager-like wrapper for
>> that function).
>> So I want the code to still look simple like this:
>>     for link in links:
>>         d = downloadPage_limited(link, filename)
>> The wrapper (the function downloadPage_limited) will manage the number
>> of concurrent downloads, and will still return the Deferred that is
>> returned by twisted.web.client.downloadPage.
>> Is my idea about a "wrapper" practical, and what's the general way to
>> write it?
>> On which event is it better to decrement the counter of currently
>> downloading files?
> Yes, that's a good idea.
> You might be able to use twisted.internet.defer.DeferredSemaphore to
> handle all of the counting for you. For example,
>     from twisted.internet.defer import DeferredSemaphore
>     from twisted.web.client import downloadPage
>
>     class LimitedDownloader:
>         def __init__(self, howMany):
>             self._semaphore = DeferredSemaphore(howMany)
>
>         def downloadPage(self, *a, **kw):
>             return self._semaphore.run(downloadPage, *a, **kw)
>
>     downloader = LimitedDownloader(3)
> In this example, DeferredSemaphore.run will only let 3 downloadPage
> calls run concurrently. If a 4th is attempted before any earlier ones
> finish, it won't actually be called until one of the earlier ones does
> finish, and then it will be called.
Thanks for the quick and great help, Terry and Jean-Paul!