[Twisted-web] Efficient server-side web-scraping?

Eric Mangold teratorn at world-net.net
Mon Apr 18 06:11:53 MDT 2005


On Mon, 18 Apr 2005 16:24:03 +0530, Tom Locke <tom at livelogix.com> wrote:

> Hi,
>
> I've developed a small Python CGI app which I'm porting to Twisted Web  
> in order to add some in-memory caching.
>
> The app (you can see the current version at etrays.net) puts a bunch
> of hits from the eBay advanced search into a box for folks to embed on
> their site. The server makes an HTTP request to the eBay search form,
> and scrapes the result using Beautiful Soup.
>
> Right now, I simply do all the work in a .rpy script. If I've understood  
> Twisted correctly, the whole server blocks while my render_GET method  
> runs, right? (Twisted is single-threaded.)
>
> So the search on eBay blocks Twisted (I just call urllib.urlopen), which
> is bad because it's pretty slow. Could anyone suggest a setup where the
> eBay search can take place in the background, leaving Twisted free to
> process other incoming requests? When the eBay results come back, the
> corresponding Twisted request would wake up, scrape the HTML and
> complete.
>
> I guess I need to use threads here? And have a Twisted callback  
> triggered when the thread completes?

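twisted.web.client.getPage is probably all you need. It fetches the page
without blocking and returns a Deferred that fires with the body, so you
don't need threads for the eBay request.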

-Eric


