[Twisted-web] Efficient server-side web-scraping?
Eric Mangold
teratorn at world-net.net
Mon Apr 18 06:11:53 MDT 2005
On Mon, 18 Apr 2005 16:24:03 +0530, Tom Locke <tom at livelogix.com> wrote:
> Hi,
>
> I've developed a small Python CGI app which I'm porting to Twisted Web
> in order to add some in-memory caching.
>
> The app (you can see the current version at etrays.net) sticks a bunch
> of hits from the eBay advanced search into a box for folks to stick on
> their site. The server makes an HTTP request to the eBay search form,
> and scrapes the result using Beautiful Soup.
>
> Right now, I simply do all the work in a .rpy script. If I've understood
> Twisted correctly, the whole server blocks while my render_GET method
> runs, right? (Twisted is single-threaded.)
>
> So the search on eBay blocks Twisted (I just call urllib.urlopen) which
> is bad because it's pretty slow. Could anyone suggest a setup where the
> eBay search can take place in the background, leaving Twisted free to
> process other incoming requests? When the eBay results come back, the
> corresponding Twisted request would wake up, scrape the HTML and
> complete.
>
> I guess I need to use threads here? And have a Twisted callback
> triggered when the thread completes?
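twisted.web.client.getPage is probably all you need. No threads are
required: it fetches the URL without blocking and returns a Deferred that
fires with the page body, so render_GET can return
twisted.web.server.NOT_DONE_YET and finish the request from a callback.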
-Eric