[Twisted-Python] How Twisted is This?

Wed Jul 16 18:39:09 EDT 2003

On Wednesday, Jul 16, 2003, at 16:53 America/New_York, Peter Hansen 
wrote:

> Bob Ippolito wrote:
>>
>> On Wednesday, Jul 16, 2003, at 12:12 America/New_York, Peter Hansen
>> wrote:
>>
>>> One good reason to do the "every five minutes check" thing instead of
>>> "wait exactly this long based on a precalculated value" thing is that
>>> if the things to be monitored can be added or removed dynamically,
>>> interrupting the sleep is more difficult than it's worth, while the
>>> blind periodic check is much simpler and more robust.
>>
>> I don't necessarily agree with that.  With the "every five minutes
>> check" you have to write your own callLater mechanism, when you let
>> tasks schedule themselves, you don't.  Also with the "every five
>> minutes check" you're using computer+bandwidth resources in a much
>> less sane manner, it sure as hell isn't going to scale awfully well.
>
> Mentioning callLater makes this an implementation question now, whereas
> I believe both Brad and I were talking from a more general point of
> view.

I said "your own callLater mechanism", meaning "your own code that 
doesn't do anything very different from callLater".  We're talking 
about using a particular framework here anyways, and the claim I'm 
making is that this framework already has a well-designed and tested 
implementation of this particular functionality, and it'd be silly to 
write your own.

> I was, anyway.  From a general point of view, I'm right ;-), because
> you simply don't have to worry about issues related to adding or
> removing items, or changing the time delays of them as you would if
> you calculated the delay until the next activity and went to sleep
> for that long.

How can you not have to worry about adding or removing items to a list 
of recurring tasks?  You have the same list of tasks for either 
approach.  Calculating the number of seconds in 5 minutes isn't any 
harder for a "waker" than it is for whatever function is scheduling 
your tasks: the answer is minutes*60.0, which is 300.0 seconds in 
this case.  With a "wakerless" approach you can have the 
tasks scheduled at different sorts of intervals, and even non-constant 
intervals (e.g. exponential backoff).  If you have a "waker" that 
supports this kind of functionality, that manages a list of stuff that 
should happen after variable intervals of time, finds the minimum, and 
sleeps for that long, you just rewrote part of the reactor.  The good 
thing about using pieces of the twisted framework that already exist is 
that you don't have to write them, they generally have tests, and 
generally work extremely well.
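
To make that concrete, here is a minimal sketch of what such a "waker" 
ends up looking like once it supports per-task intervals.  This is not 
Twisted's code; MiniScheduler, check_site and the site names are made up 
for illustration, and the clock is simulated so the example runs 
instantly.  The callLater method only imitates the shape of the 
reactor's call (delay, function, arguments).

```python
import heapq
import itertools


class MiniScheduler:
    """Hypothetical stand-in for the reactor's delayed-call machinery."""

    def __init__(self):
        self.now = 0.0                      # simulated clock, in seconds
        self._queue = []                    # heap of (due, seq, func, args)
        self._seq = itertools.count()       # tie-breaker for equal due times

    def callLater(self, delay, func, *args):
        due = self.now + delay
        heapq.heappush(self._queue, (due, next(self._seq), func, args))

    def run_until(self, deadline):
        # Find the earliest due call, "sleep" until then, run it, repeat.
        while self._queue and self._queue[0][0] <= deadline:
            due, _, func, args = heapq.heappop(self._queue)
            self.now = due
            func(*args)
        self.now = deadline


log = []


def check_site(sched, name, interval):
    # Pretend to check the site, then let the task reschedule itself.
    log.append((sched.now, name))
    sched.callLater(interval, check_site, sched, name, interval)


sched = MiniScheduler()
sched.callLater(0.0, check_site, sched, "site-a", 5 * 60.0)
sched.callLater(0.0, check_site, sched, "site-b", 2 * 60.0)
sched.run_until(10 * 60.0)
```

Adding or removing a site is just one more (or one fewer) scheduled 
call; nothing has to recompute a global sleep time.  With the real 
framework the whole MiniScheduler class goes away, because the reactor 
already does this job.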

> As for bandwidth: I'm of the "premature optimization is bad" school
> of thought, and it's far too early in this discussion to be worrying
> about a few microseconds of CPU usage per day...

It's not a premature optimization, it's a different way of designing 
the networking core of this "company branded salable application" that 
happens to be easier and more scalable.  For example, a process can 
only acquire some fixed number of file handles (sockets) before it 
starts to get really-hard-to-handle errors.  If you try and open up X 
connections at the same time, you're much more likely to run into this 
error given enough sites to check.  It's something you really have to 
watch out for when writing a web spider, for example.  It just doesn't 
make sense to intentionally design something that behaves like: 
______/\___________/\_____ when you really have an extremely 
parallelizable task at hand.  Checking site A is in no way dependent on 
checking site B, so there's no reason to intentionally make them happen 
at the same time if you know better.  We're not talking about a whole 
lot of code here.  Leveraging the fact that 99% of what needs to be 
done is glue code between stuff that's already in twisted, a prototype 
of his application could easily be done in less than fifty lines 
using the scalable approach.  In fact, I think that the "wake up and do 
stuff every N minutes" approach would actually end up being a longer 
and easier-to-screw-up implementation.
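
As a rough illustration of spreading the checks out instead of firing 
them all at once, the sketch below gives each site its own starting 
offset within the interval.  The function name staggered_offsets and 
the site names are hypothetical; an even spread is just one simple 
choice, not the only sensible one.

```python
def staggered_offsets(sites, interval):
    """Spread the first check of each site evenly across one interval."""
    if not sites:
        return {}
    step = interval / len(sites)
    return {site: i * step for i, site in enumerate(sites)}


sites = ["site-%d" % i for i in range(6)]
offsets = staggered_offsets(sites, 300.0)   # 5 minutes, in seconds
```

Each offset would then be used as the initial delay for that site's 
first check, after which the task reschedules itself every interval 
seconds.  That way at most one connection opens at any given moment 
rather than all six at the top of every five minutes.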

-bob