[Twisted-web] web client

Lorenzo lorenzov at libero.it
Sun Feb 18 16:07:58 CST 2007


Hello, I'm using Twisted to develop a small crawler. I need to download a large set of HTML pages/RSS feeds and I'd like to use Twisted's web client.

I've found some examples on the net, and I'm mostly using the one attached at http://twistedmatrix.com/bugs/issue1079 , but I'm having some problems.

First of all, I need the reactor to halt after all the files in the queue have been downloaded. The sample basically uses a counter that it updates every time it adds a new page to the list:


d = getPage(address)
d.addBoth(self.decConn)
d.setTimeout(4, self.timeout)
self.connections += 1

Every time decConn or timeout fires, I update the counter:

self.connections -= 1

When it becomes <= 0, I stop the reactor. This approach doesn't work, though, because the two functions fire multiple times for some requests (I guess those are redirects...).
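For what it's worth, here's a plain-Python sketch of the bookkeeping I think I need: counting each request down exactly once, keyed by URL, so that a callback firing twice for the same request can't push the counter below zero. The class and its names are just made up for illustration, not anything from Twisted:

```python
class DownloadTracker:
    """Track outstanding requests by key and fire a completion
    callback exactly once, even if done() is called more than
    once for the same request."""

    def __init__(self, on_all_done):
        self.pending = set()        # keys of requests still in flight
        self.on_all_done = on_all_done
        self.finished = False

    def add(self, key):
        # Register a request (e.g. its URL) before starting it.
        self.pending.add(key)

    def done(self, key):
        # discard() is a no-op for keys already removed, so a
        # duplicate callback for the same request changes nothing.
        self.pending.discard(key)
        if not self.pending and not self.finished:
            self.finished = True
            self.on_all_done()  # in Twisted this would be reactor.stop()
```

In Twisted proper, I suppose collecting all the getPage Deferreds into a defer.DeferredList and attaching a callback that stops the reactor would do the same job more idiomatically, since a DeferredList fires once when all of its Deferreds have fired.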

So, how can I correctly track the number of completed requests and eventually stop the reactor?
Is there any sample of a small crawler (even one that doesn't use Twisted) available on the net?
thanks
LV







More information about the Twisted-web mailing list