[Twisted-Python] Scrapy spiders waiting in reactor thread when callFromThread is called repeatedly

Adi Lavi adi.lavi at cortica.com
Sun Dec 21 04:47:11 MST 2014


Hi,
I am using Pika's asynchronous consumer implementation with Scrapy and
Twisted. The Twisted reactor runs on the main thread, and the RabbitMQ
consumer runs on a background thread. When I get a message and want to
start my spider, I use 'callFromThread' to wake the reactor thread,
initialize the spider, and start crawling.

However, under a high load of queue messages, 'callFromThread' is called
almost continuously, and Scrapy does not start downloading until there is
a pause in these calls.

What is the best approach to scaling Scrapy, Twisted, and RabbitMQ
together? Should I keep the current design and simply buffer or batch
messages to reduce the 'callFromThread' frequency? Or should I use a
synchronous design instead?
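
To make the buffering/batching option concrete, here is a minimal sketch,
not a tested fix for the stall. It assumes the consumer thread may queue
messages locally and that at most one drain call needs to be pending on the
reactor at a time. The class name BatchedHandoff, the handle callback, and
the max_batch parameter are hypothetical; the scheduling function is passed
in so that the sketch does not depend on Twisted directly.

```python
import threading
from collections import deque


class BatchedHandoff:
    """Coalesce many cross-thread wakeups into one pending drain call."""

    def __init__(self, call_from_thread, handle, max_batch=50):
        # call_from_thread: a thread-safe scheduler, for example Twisted's
        #   reactor.callFromThread (passed in as an assumption of this sketch).
        # handle: hypothetical per-message callback, run on the reactor thread.
        self._call_from_thread = call_from_thread
        self._handle = handle
        self._max_batch = max_batch
        self._pending = deque()
        self._lock = threading.Lock()
        self._scheduled = False

    def put(self, message):
        """Called on the consumer thread for each incoming message."""
        with self._lock:
            self._pending.append(message)
            if self._scheduled:
                return  # a drain is already pending; do not schedule another
            self._scheduled = True
        self._call_from_thread(self._drain)

    def _drain(self):
        """Runs on the reactor thread; handles at most max_batch messages."""
        with self._lock:
            count = min(self._max_batch, len(self._pending))
            batch = [self._pending.popleft() for _ in range(count)]
            more = bool(self._pending)
            self._scheduled = more
        for message in batch:
            self._handle(message)
        if more:
            # Reschedule rather than loop; the intent is to return control
            # to the reactor between batches so it can service other work.
            self._call_from_thread(self._drain)
```

With Twisted this would be constructed as
BatchedHandoff(reactor.callFromThread, start_spider), where start_spider
stands in for whatever function initializes and starts a crawl, and the
Pika message callback would call put(message) instead of calling
'callFromThread' once per message.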

Thanks