[Twisted-Python] Naive questions about how to do it twisted way.
Jia Liu
ayuer.python at gmail.com
Wed Jun 28 03:07:14 MDT 2006
Hi,
I have been following Twisted for about half a year now. I have read the docs
and the O'Reilly book, but I still can't do some simple work in the Twisted
way. Could someone give me a hint?
For example, today I wrote this little script to follow the friends list on my
MSN space, fetch each friend's friends list, and so on, so that eventually I can
build a social relationship graph with it. The code is really naive; here it
is. Sorry for the ugliness of my code, please help me make
it prettier :)
Thanks, and apologies for my poor English.
Jia Liu
import re, sys, socket
import urllib
from sets import Set
socket.setdefaulttimeout(10)  # call it; assigning to the name set no timeout
re_buddy_pre = re.compile("href=\"(?:http://)?([^\.\"]+)\.spaces.msn.com",
re.IGNORECASE) # http://nick.spaces.msn.com
re_buddy_post = re.compile("spaces.msn.com/members/([^\"\/]+)",
re.IGNORECASE) # http://spaces.msn.com/members/nick/
all_buddies = Set()
relation_dict = {}
def get_buddy(page, buddy=None):
    # Collect every nick linked from this page and record it for this buddy.
    #print page
    print buddy
    buddies_of_this_page = Set()
    for re_buddy in [re_buddy_pre, re_buddy_post]:
        for m in re_buddy.finditer(page):
            buddies_of_this_page.add(m.group(1))
    relation_dict[buddy] = buddies_of_this_page
    return buddies_of_this_page
def got(url, buddy=None):
    # Blocking fetch; returns the page body together with the buddy it belongs to.
    f = urllib.urlopen(url)
    return f.read(), buddy
URL = "http://ayueer.spaces.msn.com/"
(page, buddy) = got(URL, "ayueer")
buddies_of_this_page = get_buddy(page, buddy)
all_buddies.add("ayueer")
next_to_visit = buddies_of_this_page.difference(all_buddies)
try:
    # Stop when there is nobody new left to visit (was: while 1).
    while next_to_visit:
        #reactor.run()
        #multi_buddy_factory(buddies_of_this_page)
        for buddy in next_to_visit:
            url = "http://"+buddy+".spaces.msn.com/"
            try:
                (page, buddy) = got(url, buddy)
                buddies_of_this_page.update(get_buddy(page, buddy))
            except (IOError, socket.error):
                # Skip pages that fail to load; a bare except would also
                # swallow Ctrl-C here.
                pass
        all_buddies.update(next_to_visit)
        next_to_visit = buddies_of_this_page.difference(all_buddies)
except KeyboardInterrupt:
    pass
print all_buddies
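
One way the loop above might be tidied before moving it to Twisted is to separate the page fetching from the graph walking. What follows is only a minimal sketch under some assumptions: it uses Python 3 syntax, and the names extract_buddies, crawl, fetch and max_pages are hypothetical and do not appear in the script above. The fetch argument is any callable that takes a nick and returns the page text, so no network access is needed to try it.

```python
import re
from collections import deque

# The same two URL shapes as the patterns in the script, as raw strings.
RE_BUDDY_PRE = re.compile(r'href="(?:http://)?([^."]+)\.spaces\.msn\.com',
                          re.IGNORECASE)   # http://nick.spaces.msn.com
RE_BUDDY_POST = re.compile(r'spaces\.msn\.com/members/([^"/]+)',
                           re.IGNORECASE)  # http://spaces.msn.com/members/nick/


def extract_buddies(page):
    """Return the set of nicks linked from one page of HTML text."""
    found = set()
    for pattern in (RE_BUDDY_PRE, RE_BUDDY_POST):
        found.update(m.group(1) for m in pattern.finditer(page))
    return found


def crawl(start, fetch, max_pages=100):
    """Breadth-first walk starting at `start`.

    `fetch(nick)` is a hypothetical callable returning the page text for
    a nick; `max_pages` is an assumed safety limit on recorded pages.
    Returns a dict mapping each fetched nick to the set of its buddies.
    """
    relations = {}
    queue = deque([start])
    seen = {start}
    while queue and len(relations) < max_pages:
        nick = queue.popleft()
        try:
            page = fetch(nick)
        except OSError:
            continue  # skip pages that cannot be loaded
        buddies = extract_buddies(page)
        relations[nick] = buddies
        for buddy in buddies - seen:
            seen.add(buddy)
            queue.append(buddy)
    return relations
```

Because crawl only sees the fetch callable, the blocking download could later be swapped for a non-blocking one without touching the regular expressions or the bookkeeping of who has already been visited.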
--
Deep into the night she keeps playing the silver zither, afraid of the empty room and unwilling to go back.