[Twisted-web] Limit the simultaneous twisted.web.client.downloadPage requests

Igor Katson descentspb at gmail.com
Sat Oct 24 09:43:08 EDT 2009


exarkun at twistedmatrix.com wrote:
> On 01:20 pm, descentspb at gmail.com wrote:
>> Hello!
>>
>> I am a newbie to Twisted, sorry if my question sounds awkward.
>>
>> I have written a pretty simple recursive page downloader, which parses
>> an HTML page, extracts all the needed links from it, and starts
>> downloading them. The links point to video files, so they are pretty
>> large. The problem is that the downloader works TOO FAST :) I want to
>> set something like a global bandwidth limit or a maximum number of
>> concurrently downloading files.
>>
>> I am using twisted.web.client.downloadPage to download the files, and
>> using the Deferred that it returns.
>> I can't understand how to make it still return a Deferred corresponding
>> to that file, but not start downloading right away, and instead start
>> downloading on some kind of event (make a manager-like wrapper for
>> that function).
>>
>> So I want the code to still look simple like this:
>>
>> for link in links:
>>    d = downloadPage_limited(link, filename)
>>
>> And the wrapper (function downloadPage_limited) will manage the number
>> of concurrent downloads, and will still return the Deferred that is
>> returned by twisted.web.client.downloadPage.
>>
>> Is my idea about a "wrapper" practical, and what's the general way to
>> write it?
>> On which event is it better to decrement the counter of currently
>> downloading files?
> 
> Yes, that's a good idea.
> 
> You might be able to use twisted.internet.defer.DeferredSemaphore to 
> handle all of the counting for you.  For example,
> 
>     from twisted.internet.defer import DeferredSemaphore
>     from twisted.web.client import downloadPage
> 
>     class LimitedDownloader:
>         def __init__(self, howMany):
>             self._semaphore = DeferredSemaphore(howMany)
> 
>         def downloadPage(self, *a, **kw):
>             return self._semaphore.run(downloadPage, *a, **kw)
> 
>     downloader = LimitedDownloader(3)
>     downloader.downloadPage(...)
> 
> In this example, DeferredSemaphore.run will only let 3 downloadPage 
> calls run concurrently.  If a 4th is attempted before any earlier ones 
> finish, it won't actually be called until one of the earlier ones does 
> finish, and then it will be called.

Thanks for the quick and great help, Terry and Jean-Paul!



