<div>Hi,</div>
<div> I have been following Twisted for about half a year now. I have read the documentation and the O'Reilly book, but I still can't do some simple tasks the Twisted way. Could someone give me a hint?</div>
<div> For example, today I wrote this little script to walk the friends list on my MSN Space, fetch each friend's own friends list, and so on, so that eventually I can build a social relationship graph from it. The code is really naive; here it goes. Sorry for the ugliness of my code, and please help me make it prettier :)
</div>
<div> Thanks, and apologies for my poor English.</div>
<div>Jia Liu</div>
<div>
<p>import re, sys, socket<br>import urllib<br>from sets import Set</p>
<p>socket.setdefaulttimeout(10)<br>re_buddy_pre = re.compile("href=\"(?:http://)?([^\.\"]+)\.spaces.msn.com", re.IGNORECASE) # http://nick.spaces.msn.com<br>
re_buddy_post = re.compile("spaces.msn.com/members/([^\"\/]+)", re.IGNORECASE) # http://spaces.msn.com/members/nick/
</p>
<p>all_buddies = Set()<br>relation_dict = {}</p>
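<p>def get_buddy(page, buddy=None):<br>&nbsp;&nbsp;&nbsp;&nbsp;#print page<br>&nbsp;&nbsp;&nbsp;&nbsp;print buddy<br>&nbsp;&nbsp;&nbsp;&nbsp;global all_buddies<br>&nbsp;&nbsp;&nbsp;&nbsp;buddies_of_this_page = Set()<br>&nbsp;&nbsp;&nbsp;&nbsp;for re_buddy in [re_buddy_pre, re_buddy_post]:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;for m in re_buddy.finditer(page):
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;buddies_of_this_page.add(m.group(1))<br>&nbsp;&nbsp;&nbsp;&nbsp;relation_dict[buddy] = buddies_of_this_page<br>&nbsp;&nbsp;&nbsp;&nbsp;return buddies_of_this_page</p>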
<p>def got(url, buddy=None):<br>&nbsp;&nbsp;&nbsp;&nbsp;f = urllib.urlopen(url)<br>&nbsp;&nbsp;&nbsp;&nbsp;return f.read(), buddy<br> <br>URL = "http://ayueer.spaces.msn.com/"<br>(page, buddy) = got(URL, "ayueer")
</p>
<p>buddies_of_this_page = get_buddy(page, buddy)<br>all_buddies.add("ayueer")<br>next_to_visit = buddies_of_this_page.difference(all_buddies)</p>
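<p>try:<br>&nbsp;&nbsp;&nbsp;&nbsp;while 1:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;#reactor.run()<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;#multi_buddy_factory(buddies_of_this_page)<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;for buddy in next_to_visit:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;url = "http://"+buddy+".spaces.msn.com/"<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;try:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(page, buddy) = got(url, buddy)<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;buddies_of_this_page.update(get_buddy(page, buddy))<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;except:<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;pass<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;all_buddies.update(next_to_visit)<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;next_to_visit = buddies_of_this_page.difference(all_buddies)<br>except KeyboardInterrupt:<br>&nbsp;&nbsp;&nbsp;&nbsp;print all_buddies</p><br clear="all"><br>-- <br>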
Late into the night the silver zither is still played with devotion; afraid of the empty room, she cannot bear to go back. </div>
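<div>The loop in the script above is a breadth-first walk over friends lists. A minimal sketch of just that traversal follows, written for Python 3 and run against made-up in-memory pages rather than real MSN Spaces URLs. The names extract_buddies, crawl, fetch_page, max_pages, FAKE_PAGES and fake_fetch are hypothetical, and this is not Twisted code; it only separates the traversal from the fetch step, so the blocking urllib call could later be swapped for something else.</div>

```python
import re
from collections import deque

# Same idea as re_buddy_pre in the script: capture "nick" from nick.spaces.msn.com links.
RE_BUDDY = re.compile(r'href="(?:http://)?([^."]+)\.spaces\.msn\.com', re.IGNORECASE)


def extract_buddies(page):
    """Return the set of buddy names linked from one page of HTML text."""
    return set(m.group(1) for m in RE_BUDDY.finditer(page))


def crawl(start, fetch_page, max_pages=100):
    """Breadth-first walk from `start`.

    fetch_page(buddy) must return that buddy's page as a string or raise IOError.
    Returns {buddy: set of buddies linked from that buddy's page}.
    """
    relations = {}
    seen = {start}
    queue = deque([start])
    while queue and max_pages > len(relations):
        buddy = queue.popleft()
        try:
            page = fetch_page(buddy)
        except IOError:
            continue  # unreachable page: skip it, keep crawling
        friends = extract_buddies(page)
        relations[buddy] = friends
        for friend in sorted(friends):
            if friend not in seen:
                seen.add(friend)
                queue.append(friend)
    return relations


# Made-up pages standing in for real HTTP responses (assumption for illustration).
FAKE_PAGES = {
    "ayueer": 'href="http://bob.spaces.msn.com/" href="carol.spaces.msn.com/"',
    "bob": 'href="http://ayueer.spaces.msn.com/"',
    "carol": 'href="http://dave.spaces.msn.com/"',
}


def fake_fetch(buddy):
    """Stand-in for a network fetch; raises IOError for unknown buddies."""
    if buddy not in FAKE_PAGES:
        raise IOError("no page for %s" % buddy)
    return FAKE_PAGES[buddy]


graph = crawl("ayueer", fake_fetch)
```

<div>Here graph maps each visited nick to the set of nicks found on its page. The nick "dave" is discovered on carol's page but gets no entry of his own, because fetching his page fails.</div>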