[Twisted-Python] How Twisted is This?

Bob Ippolito bob at redivi.com
Thu Jul 17 18:44:14 MDT 2003


On Thursday, Jul 17, 2003, at 19:11 America/New_York, Greg Fortune wrote:

> On Thursday 17 July 2003 02:55 am, Tommi Virtanen wrote:
>> On Wed, Jul 16, 2003 at 06:39:09PM -0400, Bob Ippolito wrote:
>>> watch out for when writing a web spider, for example.  It just doesn't
>>> make sense to intentionally design something that behaves like:
>>> ______/\___________/\_____ when you really have an extremely
>>> parallelizable task at hand.  Checking site A is in no way dependent on
>>> checking site B, so there's no reason to intentionally make them happen
>>> at the same time if you know better.  We're not talking about a whole
>>> lot of code here.  Leveraging the fact that 99% of what needs to be
>>> done is glue code between stuff that's already in Twisted, a prototype
>>> of his application could easily be done in fewer than fifty lines
>>> using the scalable approach.  In fact, I think that the "wake up and do
>>> stuff every N minutes" approach would actually end up being a longer
>>> implementation, and an easier one to screw up.
>>
>> 	You all should probably read up on what the Nagios project
>> 	thinks about randomizing monitoring intervals. They have a
>> 	stable open source product that can scale reasonably well,
>> 	and have good opinions on that subject.
>
> I'm pretty sure no one is talking about randomizing...  One solution is a
> fixed-interval check and the other is an "as needed" check.  The only ways
> I can imagine that an "as needed" check would be more costly are if the
> monitors were queued very frequently and the controller ended up waking
> up and sleeping at very, very short intervals, or if a huge number of
> monitors were queued with start times far into the future and the
> controller took a long time to search the list for the next pending
> monitor.  An intelligent insertion routine could solve the second issue
> easily, and if a five-minute check is sufficient, I can't imagine the
> first issue being a problem...
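
Greg's "intelligent insertion routine" can be as simple as keeping the
pending monitors in a binary min-heap keyed on start time: peeking at the
next monitor is O(1) and inserting or removing one is O(log n), so the
controller never has to search the whole list.  What follows is a minimal
sketch using Python's standard `heapq` module; `MonitorQueue`, `push`,
`next_deadline` and `pop_due` are hypothetical names for illustration, not
part of Twisted.

```python
import heapq
import itertools


class MonitorQueue:
    """Hypothetical queue of pending monitors, ordered by start time.

    Backed by a binary min-heap, so the earliest monitor is always at
    index 0 and no linear search is needed to find it.
    """

    def __init__(self):
        self._heap = []
        # Monotonic tie-breaker: monitors with equal start times are
        # returned in insertion order and are never compared directly.
        self._counter = itertools.count()

    def push(self, when, monitor):
        """Queue `monitor` to start at time `when` (O(log n))."""
        heapq.heappush(self._heap, (when, next(self._counter), monitor))

    def next_deadline(self):
        """Start time of the earliest pending monitor, or None if empty."""
        return self._heap[0][0] if self._heap else None

    def pop_due(self, now):
        """Remove and return every monitor due at or before `now`, earliest first."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due
```

For example, after pushing "hostA" at 300.0, "hostB" at 10.0 and "hostC" at
150.0, `next_deadline()` is 10.0 and `pop_due(200.0)` returns
`["hostB", "hostC"]`.  The controller would then sleep for
`next_deadline() - now` seconds; in Twisted that could be a single
`reactor.callLater` call rather than a polling loop.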

Randomizing events is a rather good way to prevent collisions.
Ethernet, for example, uses randomized delays when a collision happens.
This is how it can work without the complications of passing a token
around or what have you.  Random delays are also often used in
cryptography (I believe OpenSSL does this now) to prevent certain
timing attacks.  As you said, randomizing isn't strictly necessary for
this sort of application, unless a lot of events are queued to fire at
the same time (which is the issue I brought up originally).  Imagine,
for example, that the application starts up with a database of 40,000
hosts to check at a 5-minute interval.  If the application starts _all_
of them at the same time, you're going to see much worse performance
than if it scheduled each one with some random offset.  The other thing
is that it's rather easy to implement, especially with Twisted.
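
To put numbers on the 40,000-host example, here is a minimal stdlib-only
sketch (the function names `initial_offsets` and `peak_per_second` are
hypothetical) that gives each host a uniform random offset in
[0, interval) before its first check and then measures the worst
one-second burst.  It assumes every later check simply repeats every
`interval` seconds, so the initial spread persists.

```python
import random
from collections import Counter


def initial_offsets(hosts, interval, jitter=True, seed=None):
    """Return {host: seconds to wait before that host's first check}.

    With jitter, each offset is drawn uniformly from [0, interval), so
    first checks are spread over the whole interval instead of all
    firing at t=0.  Without jitter, every host starts immediately.
    """
    rng = random.Random(seed)
    return {h: (rng.random() * interval if jitter else 0.0) for h in hosts}


def peak_per_second(offsets):
    """Largest number of checks that start within the same one-second bucket."""
    return max(Counter(int(delay) for delay in offsets.values()).values())


if __name__ == "__main__":
    hosts = [f"host{i}" for i in range(40000)]
    print("no jitter:", peak_per_second(initial_offsets(hosts, 300, jitter=False)))
    print("jitter:   ", peak_per_second(initial_offsets(hosts, 300, seed=1)))
```

Without jitter all 40,000 checks land in the same second; with jitter the
peak drops to the order of 40,000 / 300 (about 133) per second plus some
random fluctuation.  In Twisted, each offset could then be handed to
`reactor.callLater(delay, check, host)`, where `check` is a hypothetical
function that probes one host and reschedules itself `interval` seconds
later.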

Also, switching between sleeping and waking up isn't the worst thing in
the world: it means network activity is happening (or you're polling,
which is silly).

-bob
