[Twisted-Python] Lots and lots and lots and lots... of deferreds

Steve Steiner (listsin) listsin at integrateddevcorp.com
Tue Oct 6 22:40:49 EDT 2009


So, I have a situation...

	I have an application whose basic function is, in simplified form:

	def main():
		get_web_page(main_page_from_params)

	def get_web_page(page_name):
		# Set up a page-getter deferred. One of its callbacks pulls the
		# links out of the page and sends them to get_them().

	def get_them(links):
		for l in links:
			if l is not in flight and hasn't already been gotten:
				deferred = get_web_page(l)

	In other words, I feed in the top level page, then recursively feed  
in any pages linked to by the current page, and they feed in all their  
links, until all pages are gotten.

	I understand the concurrency issues with multiple deferreds trying
to add pages to the "get list" -- it's properly handled in the code
(as far as I can tell, so far).

	So, here's the question...

	I have a "pages" list containing all of the pages.

	Each page is marked as either gotten or in-flight.

	In-flight means I have a deferred that's going to go get it (in  
get_web_page()).

	IOW, right now, if I don't already have the page, and I have a link  
to it, I just start a deferred to go get it.
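
	For illustration, the bookkeeping amounts to something like the
sketch below. This is not the actual code: the dict (rather than a
list), the PageState enum, and the claim()/mark_gotten() helpers are
hypothetical names made up for this example.

```python
from enum import Enum


class PageState(Enum):
    IN_FLIGHT = "in-flight"
    GOTTEN = "gotten"


# Hypothetical registry: page URL -> PageState.
pages = {}


def claim(url):
    """Mark url as in-flight; return True only for the first caller."""
    if url in pages:
        # Already in flight or already gotten: do not fetch it again.
        return False
    pages[url] = PageState.IN_FLIGHT
    return True


def mark_gotten(url):
    """Record that the fetch for url has finished."""
    pages[url] = PageState.GOTTEN
```

	A fetch is started only when claim(url) returns True, so each page
gets at most one deferred no matter how many pages link to it.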

	Should I limit the number of "in-flight" pages?
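
	If the answer is yes, a cap might look roughly like the sketch
below. To keep it runnable without Twisted installed, it uses the
standard library's asyncio.Semaphore as a stand-in; in Twisted,
twisted.internet.defer.DeferredSemaphore plays a similar role. SITE,
fetch_links(), crawl(), and MAX_IN_FLIGHT are hypothetical names, and
the "fetch" is faked from an in-memory dict.

```python
import asyncio

MAX_IN_FLIGHT = 10  # hypothetical cap; the right value is the open question

# Hypothetical in-memory "site": page name -> names of the pages it links to.
SITE = {
    "index": ["a", "b"],
    "a": ["b", "c"],
    "b": ["index"],
    "c": [],
}


async def fetch_links(page):
    """Stand-in for a real page fetch; returns the links found on the page."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return SITE.get(page, [])


async def crawl(start, max_in_flight=MAX_IN_FLIGHT):
    """Crawl from start; return (pages seen, peak concurrent fetches)."""
    seen = {start}
    sem = asyncio.Semaphore(max_in_flight)
    stats = {"now": 0, "peak": 0}

    async def get_web_page(page):
        # Only max_in_flight fetches run at once; the rest wait here.
        async with sem:
            stats["now"] += 1
            stats["peak"] = max(stats["peak"], stats["now"])
            try:
                links = await fetch_links(page)
            finally:
                stats["now"] -= 1
        # Claim unseen links before scheduling them, so no page is fetched twice.
        new_links = []
        for link in links:
            if link not in seen:
                seen.add(link)
                new_links.append(link)
        await asyncio.gather(*(get_web_page(link) for link in new_links))

    await get_web_page(start)
    return seen, stats["peak"]
```

	Calling asyncio.run(crawl("index", max_in_flight=2)) walks the whole
fake site while never holding more than two fetches open at once.
Pages beyond the cap still exist as waiting tasks, so this bounds open
requests rather than the number of pending objects.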

	Currently, I'm scanning sites that have upwards of 5000 pages, and it
seems that when I get too many deferreds in flight, the app
*appears* to crash.

	I'm not sure whether it's actually going out to lunch or just appears
that way. Before I go instrumenting the app to death, can anyone
tell me whether there is some practical limit at which the sheer
number of "in-flight" deferreds starts to cause issues?

	Thanks for any insight on this that anyone might offer.

	I expect the usual flurry of "you must post your exact code or we
can't help you at all, moron" posts, but...
	
	In spite of my not having posted specific code, could someone with  
some actual experience in this please give me a clue, within an order  
of magnitude, how many deferreds might start to cause real trouble?

Thanks,

S
