[Twisted-web] Efficient server-side web-scraping?
Tom Locke
tom at livelogix.com
Mon Apr 18 04:54:03 MDT 2005
Hi,
I've developed a small Python CGI app which I'm porting to Twisted Web
in order to add some in-memory caching.
The app (you can see the current version at etrays.net) takes a handful
of hits from the eBay advanced search and puts them in a box that people
can embed on their own sites. The server makes an HTTP request to the
eBay search form and scrapes the result using Beautiful Soup.
Right now, I simply do all the work in a .rpy script. If I've understood
Twisted correctly, the whole server blocks while my render_GET method
runs, right? (Twisted is single-threaded.)
So the search on eBay blocks Twisted (I just call urllib.urlopen), which
is bad because it's pretty slow. Could anyone suggest a setup where the
eBay search takes place in the background, leaving Twisted free to
process other incoming requests? When the eBay results come back, the
corresponding Twisted request would wake up, scrape the HTML and complete.
I guess I need to use threads here? And have a Twisted callback
triggered when the thread completes?
Thanks
Tom.