[Twisted-Python] How to Solve a Problem Like Santa with Stackless and Twisted

Andrew Francis andrewfr_ice at yahoo.com
Wed Dec 14 14:24:21 EST 2011


Hi Glyph:

Message: 3
Date: Tue, 13 Dec 2011 23:45:54 -0500
From: Glyph <glyph at twistedmatrix.com>
Subject: Re: [Twisted-Python] How to Solve a Problem Like Santa with
    Stackless and Twisted
To: Twisted general discussion <twisted-python at twistedmatrix.com>
Message-ID: <F1901420-DAD9-4879-8E65-CAFC123BACA4 at twistedmatrix.com>
Content-Type: text/plain; charset="iso-8859-1"

On Dec 13, 2011, at 2:04 PM, Andrew Francis wrote:

Hi Glyph:

G>The only reason that the 
Santa problem, expressed as a thread-synchronization problem, has to 
deal with G>"simultaneous" arrival of 9 reindeer and 3 elves is a quirk of
 thread scheduling: the Santa thread can be ready to run G>but not 
running.  When the final reindeer arrives, a final elf may arrive before
 the scheduler decides it's time to give G>Santa a chance to run, and then
 Santa has the opportunity to make the choice between the two. 

I agree: it is the scheduler that introduces quirks. Not that it matters but the important edge case is when three elves have arrived. Santa is awoken and placed on the queue. However additional reindeer have arrived bringing the count to nine. So when Santa runs, both the group of nine reindeer and group of three elves are ready. Hence the need for Santa to check when he is awake. In reality, I have to leave the programme running for a  long time to see this happen.

G>In an 
event-driven system, this state doesn't naturally exist: when you call 
Deferred.callback(), the callback runs the G>relevant application code 
immediately.

In regards to Stackless, if I wanted to greatly simplify the problem, I would immediately schedule Santa once reindeer or a group of elves were ready. This is what Twisted is doing. Perhaps in a real-life case, this would be the correct action since it is obvious that Santa is a high priority task. However I am trying to be true to the original problem. Also a good solution should not depend on scheduling tricks.

Otherwise Twisted with deferred lists would otherwise be pretty effective. Again the secret sauce is that deferred lists are a join pattern of sorts. The component is that Twisted is essentially a non-preemptive first come first served scheduler. This behaviour also simplifies the problem.

G>Despite
 the fundamentally sequential nature of reality, 

Maybe this is the reality of computers based on von Neuman architecture. However reality is fundamentally concurrent.

G>Twisted reactor's 
scheduling mechanism does leave somethings to be desired; the quantum 
time-slice of Twisted is G>effectively one reactor iteration, and there 
aren't really any tools for working with those.  There's some latency 
G>between becoming aware of some work that needs to be done (select(), or 
your chosen multiplexor, returning) and G>actually doing the work (calling
 doRead, doWrite, or your timed event), but you don't get to decide on 
priorities once G>Twisted's aware of the work; they're executed in a 
quasi-arbitrary order.

I feel a Twisted callback and a Stackless tasklet in non-preemptive mode suffer from a bigger problem: one does not know how long the code fragment will execute. The programmer is betting that the CPU slice will be relatively short. An additional assumption is that all tasklets/callbacks are created equally (actually in Stackless, one can use preferences to give some tasklets a limited priority). I feel the reason we have sophisticated schedulers is when throughput and fairness are a big deal. 

G>Ultimately however, the order of the work 
will be arbitrary; if it's not influenced by the vagaries of Twisted's 
scheduler, G>it will be influenced by something in the kernel scheduler 
that you don't understand or can't control, or in the IP stack, G>or in a 
switch on your network, or in your hosting provider's firewall 
configuration, or something on a client machine G>that you truly have no 
influence over at all.  Having something give priority to an event that 
occurs at "the same" G>time given all of these potential sources of 
interference is basically pointless.

I am not an expect in real-time systems. However I would argue that at a pragmatic level,  timestamps have to be of a resolution that allows the application to make reasonable decisions. So if the application is measuring time in seconds to say two decimal places, as far as the application is concerned, it can have a reindeer and elf arriving at the same time if they have the same timestamps.  

Perhaps my original example leaves something to be desired. However my point still remains. If we constructed a schedule so that the ninth reindeer always arrived at T+11 seconds and three elves always arrived at T+9, T+10, and T+11 seconds, it would be a coin toss in Twisted as to whether reindeer or elves were serviced. Rather than a reindeer being in a queue, in Twisted, its file descriptor would be in a reader set. All I can say is that a Twisted solution would be unbiased (or biased towards elves if there we are playing with 10 of them). When the problem requires it is biased towards reindeer. If one had a burning desire, we could construct cases to measure if there was a bias.

Cheers,
Andrew




More information about the Twisted-Python mailing list