[Twisted-web] overkill context switches

Andrea Arcangeli andrea at cpushare.com
Tue Sep 27 04:07:02 MDT 2005


On Tue, Sep 27, 2005 at 10:25:01AM +1000, Andrew Bennetts wrote:
> On Mon, Sep 26, 2005 at 05:50:24PM +0200, Andrea Arcangeli wrote:
> [...]
> > This should be a python or twisted-thread-pool bug. I'm not sure about the best
> > way to track it down, especially because so far I can reproduce it only on the
> > server, that is a bit slower and that I cannot use for debugging.
> 
> Twisted's threadpools are pretty straightforward wrappers around Python's
> threading module.  See twisted.python.threadpool.
> 
> I'd expect you could reproduce this with a simple python script that starts
> two threads, with one putting items on a Queue.Queue, and the other reading
> them off, as a simple approximation of the threadpool arrangement.
> 
> In fact, here's a script that does this:
> 
> ----
> import threading
> import Queue
> 
> q = Queue.Queue()
> 
> def producer():
>     # Count from 0 to 99, over and over.
>     i = 0
>     while True:
>         q.put(i)
>         i += 1
>         i %= 100
> 
> 
> def consumer():
>     while True:
>         i = q.get()
>         i * 2   # do a trivial bit of work with each token
> 
> p = threading.Thread(target=producer)
> c = threading.Thread(target=consumer)
> p.setDaemon(True)
> c.setDaemon(True)
> p.start()
> c.start()
> 
> import time; time.sleep(10)
> ----
> 
> vmstat reports lines like:
> 
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in     cs us sy id wa
>  1  1 350740  18828  38260 204880    0    0  2696     0 1099 100315 83 17  0  0
>  3  2 350740  16852  38336 206760    0    0  1956    20 1133 100174 86 14  0  0
>  1  1 350740  15116  38444 208352    0    0  1684   592 1106 99901 85 15  0  0
>  1  1 350740  12760  38520 210640    0    0  2364     0 1119 100183 87 13  0  0
>  1  1 350740  11024  38612 212336    0    0  1792     0 1123 100255 85 15  0  0
>  1  1 350740   9048  38696 214216    0    0  1964     0 1129 100057 85 15  0  0
> 
> (it also appears that the producer adds to the queue faster than the consumer
> gets from it.  Doesn't really matter, aside from the memory cost.)
> 
> I don't see it with a simple script that has two threads running in parallel
> with no inter-thread communication at all, however.  I'd say the place to
> look is Python's implementation of thread synchronisation primitives.

Thanks for the example, but the above looks quite normal and expected:
it passes a token from one thread to the next at a very fast rate. In my
workload there are, say, 5 or 6 queries, two through pb and the others
through psycopg2, so it should schedule a dozen times, not 20k times.

If I were passing 10k separate tokens (as in the testcase above), then
20k schedules would be normal.
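To illustrate (a sketch of mine, not something from the thread): the
testcase wakes the consumer once per token, so the context-switch rate
tracks the token rate. A consumer that drains the queue in a batch after
each blocking get() should cut the rate roughly in proportion:

----
import Queue

def consumer_batched(q):
    while True:
        # Block for the first token; every blocking get() on an empty
        # queue means a sleep and a later wakeup, i.e. a schedule.
        items = [q.get()]
        # Then drain whatever else is already queued without blocking,
        # so many tokens are handled per wakeup.
        try:
            while True:
                items.append(q.get_nowait())
        except Queue.Empty:
            pass
        for i in items:
            i * 2
----

(A bounded Queue.Queue(maxsize) would likewise cap the memory growth
Andrew noticed, since put() blocks once the queue is full.)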

I even tried pgasync, and it also schedules >20k times. I narrowed it
down to the case of executing multiple SQL queries simultaneously. So
now I wonder if it's postgres, and not twisted, that is scheduling so
fast... Unfortunately I have no kernel knob installed that would tell
who is scheduling so often, and strace alters the runtime as well.
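For what it's worth, with a kernel built with CONFIG_SCHEDSTATS
(apparently exactly the knob missing here), /proc/<pid>/schedstat
exposes per-task counters; its third field is the number of timeslices
the task has run, which approximates how often it gets scheduled. A
quick polling sketch along those lines:

----
import sys, time

def watch(pids, interval=1.0):
    # The third field of /proc/<pid>/schedstat is the number of
    # timeslices the task has run on a cpu; its growth rate shows who
    # is being scheduled the most.
    prev = {}
    while True:
        for pid in pids:
            count = int(open('/proc/%d/schedstat' % pid).read().split()[2])
            if pid in prev:
                print pid, ':', count - prev[pid], 'schedules'
            prev[pid] = count
        time.sleep(interval)

# Usage: python watch_schedstat.py <pid> [<pid> ...]
watch([int(arg) for arg in sys.argv[1:]])
----

Pointing it at both the twisted process and the postgres backends would
show which side is churning.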

I'll try to serialize the SQL queries so that no more than one query
runs at once, and I guess that'll fix it just fine. (I'm working on
making this change right now.)
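For reference, a minimal sketch of one way to do that (assuming a
Twisted recent enough to ship defer.DeferredSemaphore; dbpool stands for
a hypothetical twisted.enterprise.adbapi.ConnectionPool):

----
from twisted.internet import defer

# A one-token semaphore: at most one query in flight at any time.
query_sem = defer.DeferredSemaphore(1)

def run_serialized(dbpool, sql, *args):
    # run() acquires the token, calls the function, and releases the
    # token once the Deferred the function returns has fired.
    return query_sem.run(dbpool.runQuery, sql, *args)
----

Queries funneled through run_serialized() then execute strictly one
after another, trading away whatever parallelism the connection pool
provided.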

It still looks to me like something could be optimized, because such a
basic workload shouldn't need that many schedules (the futex syscalls
make no sense either, especially with pgasync), but perhaps it's not the
twisted side that is generating the overscheduling.


