[Twisted-Python] Scalability of an rss-aggregator
andrew-twisted at puzzling.org
Wed Mar 31 07:34:06 EST 2004
On Wed, Mar 31, 2004 at 01:27:49PM +0200, Valentino Volonghi aka Dialtone wrote:
> Andrew Bennetts wrote:
> >On Wed, Mar 31, 2004 at 09:33:58AM +0200, Valentino Volonghi aka Dialtone
> >>Hi all,
> >>attached you will find my rss-aggregator made with twisted.
> >>It's really fast although when I tried with 745 feeds I got some problems.
> >>When the download reached 300 parsed feeds (more or less) it locked till
> >>I pressed Ctrl+C and then it
> >>processed the remaining 340 feeds in less than 30 seconds... I think
> >>that my design has at least an issue
> >>but I cannot find it so easily and I hope someone on this list can help
> >>me to improve it.
> >By default, Twisted uses the platform name resolver, which is blocking.
> >Perhaps a non-existent domain is causing gethostbyname to block?
> Uhmm... dunno, but I tried to remove the 'locking' feed-source and it
> didn't change.
Hmm, it's unlikely to be DNS lookups causing it, then.
We need some way to narrow down where it's happening. There are a few
options I can think of, but they're all a bit heavyweight...
- Use strace to get some idea what it's doing
- Use the --spew option of twistd (or manually install the spewer with
"from twisted.python.util import spewer; sys.settrace(spewer)")
- Use gdb to attach to the process, and then look at the backtrace there.
(You can apparently get the python backtrace in gdb by putting this macro in
    while $pc < Py_Main || $pc > Py_GetArgcArgv
        if $pc > eval_frame && $pc < PyEval_EvalCodeEx
            set $__fn = PyString_AsString(co->co_filename)
            set $__n = PyString_AsString(co->co_name)
            printf "%s (%d): %s\n", $__fn, f->f_lineno, $__n
        end
        up-silently 1
    end
But I've never tried this...)
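
For the tracing option, here is a minimal stdlib-only sketch in the same
spirit as the spewer. It is not Twisted's implementation: make_line_tracer
and traced_call are hypothetical helper names, and it is written for a
current Python 3. It records every executed line instead of printing it, so
you can check the last few entries to see where the program stopped
making progress.

```python
import sys


def make_line_tracer(log):
    """Return a trace function that records (filename, lineno, funcname)."""
    def tracer(frame, event, arg):
        if event == "line":
            code = frame.f_code
            log.append((code.co_filename, frame.f_lineno, code.co_name))
        # Returning the tracer keeps line events enabled for this frame.
        return tracer
    return tracer


def traced_call(func, *args, **kwargs):
    """Call func with tracing switched on; return (result, log)."""
    log = []
    previous = sys.gettrace()
    sys.settrace(make_line_tracer(log))
    try:
        result = func(*args, **kwargs)
    finally:
        # Always restore whatever trace function was installed before.
        sys.settrace(previous)
    return result, log
```

Tracing every line is slow, so it is probably best wrapped around only the
part of the program that seems to hang.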
Is it possible that feedparser is hanging on trying to parse that feed?
Perhaps try putting print statements before and after the parsing step?
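
A small sketch of that kind of instrumentation, assuming Python 3.
call_with_markers is a hypothetical helper, not part of feedparser or
Twisted. It prints one line before the call and one after it, so a feed
that shows a "start" line with no matching "done" line is the one that is
hanging.

```python
import time


def call_with_markers(label, func, *args, **kwargs):
    """Run func, printing a marker before it starts and after it ends."""
    print("start %s" % label, flush=True)
    started = time.monotonic()
    try:
        return func(*args, **kwargs)
    finally:
        elapsed = time.monotonic() - started
        print("done %s after %.3fs" % (label, elapsed), flush=True)
```

Usage would look something like call_with_markers(url, parse, data), where
parse stands for whatever function the script uses to parse a downloaded
feed.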
> >You should be able to test this theory by installing Twisted's resolver:
> > from twisted.names import client
> > reactor.installResolver(client.createResolver())
> >client.createResolver makes a reasonable effort to use your system's DNS
> >configuration (by looking at /etc/resolv.conf on posix systems, for
> >example), so it should work without any special arguments.
> ok, it changes into a totally non-working script :)
> I get a lot of these:
> [Failure instance: Traceback: exceptions.TypeError, unsubscriptable object
Ouch. I wonder how that bug crept in? The twisted.names code is expecting a
sequence of timeouts (to re-issue the query with, until failing at last), but
twisted.internet is only giving it a single integer. I've filed a bug
report for this: http://twistedmatrix.com/bugs/issue570, if you care :)
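
To illustrate the kind of mismatch involved (this is not the actual Twisted
code, and both function names are hypothetical): code written for a
sequence of timeouts indexes into it, which raises TypeError when it is
handed a bare integer. One possible guard is to normalize the argument
first. The timeout values used here are only examples.

```python
def first_timeout(timeout):
    """Written for a sequence of timeouts, so it indexes into it."""
    return timeout[0]


def as_timeout_sequence(timeout):
    """Accept either a single number or a sequence of numbers."""
    if isinstance(timeout, (int, float)):
        return (timeout,)
    return tuple(timeout)


if __name__ == "__main__":
    print(first_timeout((1, 3, 11)))  # works: a sequence was passed
    try:
        first_timeout(10)  # fails: an int cannot be subscripted
    except TypeError as exc:
        print("TypeError:", exc)
    print(first_timeout(as_timeout_sequence(10)))  # works after normalizing
```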
> >>BTW When it finishes (with all 740 feeds) it reports an awesome 330
> >>seconds which is an impressive time, less than half a second
> >>for each feed, and it downloads more than 50Mb of feeds from the net
> >>(with 745 feeds to download).
> Yup, I was going to ask the Straw developers to use my script instead of
> asyncore.
> Straw has a lot of problems with 200 feeds, e.g. it resets the connection
> and such. This would be an awesome improvement.
Absolutely. I've heard similar complaints about straw, and I've been hoping
some keen person would apply Twisted to fix the problem :)