[Twisted-web] How to use HTTPClientFactory to connect once and get more than 1000 pages?

Cheney Lee ironpythonster at gmail.com
Wed Jan 23 03:46:50 EST 2008


Hi,
This is my first time using Twisted.
I want a function that takes a URL and fetches the web page.
The code looks like this:
# some code that calls getPage

while id <= 10000000:
    getPage("http://www.mywebsite.com/News.aspx?ID=" + str(id))
    id += 1
# ------------------------------------------

getPage is defined in twisted.web.client:

def getPage(url, contextFactory=None, *args, **kwargs):
    """Download a web page as a string.

    Download a page. Return a deferred, which will callback with a
    page (as a string) or errback with a description of the error.

    See HTTPClientFactory to see what extra args can be passed.
    """
    scheme, host, port, path = _parse(url)
    factory = HTTPClientFactory(url, *args, **kwargs)
    if scheme == 'https':
        from twisted.internet import ssl
        if contextFactory is None:
            contextFactory = ssl.ClientContextFactory()
        reactor.connectSSL(host, port, factory, contextFactory)
    else:
        reactor.connectTCP(host, port, factory)
    return factory.deferred
--------------------------------------------------------------------------------

Question:
If I use the getPage function to fetch 10000 pages, it will open and close
a connection 10000 times, which is very expensive.
Can anybody give me some advice? Should I create a class that inherits from
HTTPPageGetter (the protocol class) or from HTTPClientFactory?
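
To make the goal concrete, here is a minimal sketch of "connect once, fetch many pages". It is not a Twisted solution: it uses the blocking http.client module from the Python 3 standard library only to show HTTP/1.1 keep-alive, where one TCP connection carries a sequence of requests. The names fetch_pages and news_paths are hypothetical helpers made up for this illustration, and the sketch assumes the server keeps the connection open between requests.

```python
import http.client


def news_paths(first_id, last_id):
    """Yield request paths for IDs first_id..last_id (inclusive).

    Hypothetical helper; the path layout follows the URL in the question.
    """
    return ("/News.aspx?ID=%d" % i for i in range(first_id, last_id + 1))


def fetch_pages(host, port, paths):
    """Fetch every path in order over a single HTTPConnection object.

    Returns a list of (status, body) tuples. With an HTTP/1.1 server that
    honours keep-alive, all requests share one TCP connection. If the
    server answers with "Connection: close", http.client reconnects on
    the next request, so the code still works but without the saving.
    """
    conn = http.client.HTTPConnection(host, port, timeout=10)
    pages = []
    try:
        for path in paths:
            conn.request("GET", path)
            response = conn.getresponse()
            # The body has to be read completely before the next request
            # can be sent on the same connection.
            pages.append((response.status, response.read()))
    finally:
        conn.close()
    return pages
```

For example, fetch_pages("www.mywebsite.com", 80, news_paths(1, 100)) would request IDs 1 to 100 one after another. The requests are sequential, so this trades the per-request connection setup for waiting on each response in turn.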