[Twisted-web] what are the advantage of using a single-threaded server? and when should we use deferreds?

Sat Apr 26 05:42:46 EDT 2008

inhahe wrote:
> hi, excuse my noobness, I have a few basic questions about twisted, or probably
> about web servers in general.

There's nothing web-specific in these questions that I see... they apply to any
network service serving requests.

> what is the advantage of using a single-threaded server?
> 
> i figured it makes it more scalable because there's too much overhead to have a
> thread for each user when you have many simultaneous users.  but a friend i'm
> talking to now says that using i/o blocking threads is perfectly scalable for a
> large number of simultaneous users. 

That's basically right.  Threads can be a scalability issue, particularly if you
have many connections that are mostly idle — you end up with a lot of wasted
memory (for stack space).

Another problem with threads is non-determinism.  You can't easily construct a
test suite that will find every possible race condition, because a thread can be
pre-empted at any time.  In effect you have a state machine with a massive
number of states, many more than necessary.  With a single thread, you can
simply and reliably test what happens when events happen in a particular order.
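
To illustrate the point about testing event orderings, here is a minimal sketch
in plain Python (no Twisted involved; the `Login` class and its event names are
made up for this example).  Because nothing can pre-empt a handler, a test can
feed events in whichever order it cares about and get the same answer every run.

```python
class Login:
    """Hypothetical single-threaded state machine driven by events."""

    def __init__(self):
        self.state = "waiting"

    def handle(self, event):
        # Each handler runs to completion before the next event is seen,
        # so the only interleavings are the ones the test chooses.
        if event == "credentials" and self.state == "waiting":
            self.state = "checking"
        elif event == "db_ok" and self.state == "checking":
            self.state = "logged_in"
        elif event == "disconnect":
            self.state = "closed"
        return self.state


def run(events):
    """Feed events in the given order; return the state after each one."""
    machine = Login()
    return [machine.handle(event) for event in events]


# The same three events in two different orders, each checked deterministically.
normal = run(["credentials", "db_ok", "disconnect"])
early_hangup = run(["credentials", "disconnect", "db_ok"])
```

Here `normal` passes through "logged_in" before closing, while `early_hangup`
never does, because the late "db_ok" arrives after the connection has closed.
With threads, that second ordering would be a race you could only hope to hit.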

Personally, I find this latter advantage more compelling.  The performance
differences are in many respects minor (and not clearly one way), especially
compared to the overhead of using Python over C/C++.  I find it *much* easier to
write and test non-threaded code (and for that matter, I find it much easier to
write and test Python).  No point worrying about performance unless I can be
confident in the correctness :)

> if that's true i can only see a disadvantage in using a single-threaded server
> -- having to use deferreds and stuff to make things asynchronous

That is a disadvantage.  Creating lots of objects and calling lots of functions
can be a performance issue in Python.  Deferreds are *much* nicer than the
obvious alternative (passing callback functions to functions that produce
asynchronous results), though.

Fundamentally, concurrent programming is more complex than non-concurrent.  The
question is which tradeoffs suit your problem best.

> i also don't understand how you're supposed to use deferreds
> the twisted doc says deferreds won't *make* your code asynchronous.  so let's
> say you have to do an sql query that takes 10 seconds, deferreds would be
> useless for making that not block unless you have a way of making that sql
> query non-blocking already?  how is that done?  do you run a separate thread of
> your own for each sql query?  one thread for all sql queries?

You've got it.  If you have a blocking API, there's nothing you can do to make
it non-blocking apart from running it in a thread (if it's kind enough to
release the GIL) or running it in a subprocess (if you don't mind the overhead
of spawning another process and the complexity of marshalling messages to it
rather than simply sharing an address space).

Note that a common compromise between “a separate thread of your own for each
sql query” and “one thread for all sql queries” is a thread pool with a limited
number of threads.  This is what twisted.enterprise.adbapi basically does to run
SQL using the standard Python DB-API.
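
A minimal sketch of the bounded thread pool idea, using only the standard
library's concurrent.futures rather than adbapi itself (so this is not
adbapi's implementation; `slow_query` is a hypothetical stand-in that just
sleeps, and the pool size of 3 is an arbitrary choice):

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def slow_query(n):
    """Stand-in for a blocking DB-API call (hypothetical; it only sleeps)."""
    time.sleep(0.05)
    return (n, threading.current_thread().name)


# At most 3 queries run at once, however many are submitted.
with ThreadPoolExecutor(max_workers=3, thread_name_prefix="db") as pool:
    futures = [pool.submit(slow_query, n) for n in range(10)]
    rows = [f.result() for f in futures]

threads_used = {name for _, name in rows}
```

Ten queries are served by no more than three threads.  In Twisted itself,
adbapi.ConnectionPool's runQuery gives you a Deferred for the result instead of
a future that you block on.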

> also I wonder in an typical twisted app, just how slow should an operation be
> before you use a deferred?  what if a user enters a username and password and i
> have to look that up in the database. do i use a deferred?  just how bad should
> the query be before using a deferred?

The precise answer is: it depends.

The short answer is: if it does I/O or is obviously slower than instant, then
it's blocking and should be avoided (in your main thread).

To be precise, it depends on your requirements: basically, what performance do
you need?  If a lookup in the database takes only, say, 30ms, and you don't have
lots of concurrent requests, and they only need to do that one lookup, and you
only need an average latency for replying to requests of 100ms, then you'd be
pretty comfortable with just blocking for that lookup.

Typically, anything that doesn't return immediately, for some value of
“immediately”, is good to treat as blocking, and thus something to avoid in your
main thread.  Small writes to disk are often fast enough to count as
“immediate”.  Small reads that are probably cached in RAM by your OS might be
too.  Querying a database usually isn't.  It depends on your exact situation,
though.  It sounds like you already have a good idea of the sorts of things to
watch out for.

Basically, there's no magic substitute for measuring actual performance, and
asking yourself “is it good enough?”
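
One simple way to do that measuring, as a sketch only (`measure_ms` and
`fake_lookup` are hypothetical names, and the 100ms budget is just the example
figure from above): time the call a number of times and compare the worst case
against the latency you need.

```python
import time


def measure_ms(func, repeats=20):
    """Return (worst, mean) wall-clock time of func() in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append((time.perf_counter() - start) * 1000.0)
    return max(samples), sum(samples) / len(samples)


def fake_lookup():
    """Hypothetical stand-in for the database lookup being judged."""
    time.sleep(0.003)


worst, mean = measure_ms(fake_lookup)
budget_ms = 100.0  # assumed latency target, not a recommendation
good_enough = worst < budget_ms
```

In practice you would pass in the real lookup and measure under realistic
load, since a quiet test machine tends to flatter the numbers.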

> (reading the twisted docs is like reading a brick wall for me, it would be nice
> if someone could just explain things to me in simple terms.)

It sounds to me like you've actually understood things quite well. :)

-Andrew.