[Twisted-web] what are the advantage of using a single-threaded server? and when should we use deferreds?

Sat Apr 26 05:37:05 EDT 2008

On Saturday 26 April 2008, inhahe wrote:

> what is the advantage of using a single-threaded server?

If you use threads, your code can be interrupted at any place, except when 
you tell it not to (locking). If you use deferreds, your code can be 
interrupted only at exactly those places you have indicated. This makes it 
much easier to write correct code.
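
To make "interrupted only at exactly those places you have indicated" concrete, here is a minimal sketch that uses plain Python generators instead of Twisted deferreds. The names `increment` and `run_round_robin` are made up for this illustration, and the tiny scheduler only stands in for a reactor. Two tasks update a shared counter without any locking, which is safe because a task can only be suspended at its `yield`.

```python
from collections import deque

counter = {"value": 0}


def increment(times):
    """Task that bumps the shared counter; it can only be suspended at `yield`."""
    for _ in range(times):
        current = counter["value"]       # read
        counter["value"] = current + 1   # write: no other task can run in between
        yield                            # the only place another task may run


def run_round_robin(tasks):
    """Hypothetical stand-in for a reactor: resume each task in turn until all finish."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue          # task finished, drop it
        ready.append(task)    # task yielded, schedule it again


run_round_robin([increment(1000), increment(1000)])
print(counter["value"])
```

With preemptive threads, the same read-then-write pair would need a lock, because a thread switch could happen between the two statements.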

> i figured it makes it more scalable because there's too much overhead to
> have a thread for each user when you have many simultaneous users.  but a
> friend i'm talking to now says that using i/o blocking threads is
> perfectly scalable for a large number of simultaneous users.

It depends on a lot of things. For example, would you use thread pools or 
one thread per user? And how many users are you talking about? 100, 1000, 
10000, ...?

It also depends on how efficient your OS is in handling threads. I remember 
having to compile a differently configured Linux kernel because it would 
run out of processes when creating about 1000 threads, but this was over 5 
years ago, before NPTL, so this may no longer be an issue.

If you have a multi-core or multi-CPU machine, running multiple threads could 
spread the workload over different cores. For Python this doesn't really 
help though: the Python VM has the Global Interpreter Lock, which 
effectively means that unless you implement long-running operations in an 
extension written in C, there will only be one thread making progress at 
any time. So if you want to use multiple cores effectively in Python, you 
have to design your application to consist of separate communicating 
processes.

> if that's true i can only see a disadvantage in using a single-threaded
> server -- having to use deferreds and stuff to make things asynchronous

The main advantage in my opinion is that it is much easier to write correct 
asynchronous code than correct threaded code. If you write threaded code 
and overlook one place it can be interrupted, you have a bug. If you write 
asynchronous code and overlook one place it should be interruptible, you 
get worse latency, but it is still correct.

Because the points at which different tasks are interleaved are much more 
predictable in asynchronous code, there is a reasonable chance that if your 
code passes your unit tests, it is actually correct. For threaded code, 
it's not uncommon that code passes its unit tests, but starts giving wrong 
results as soon as the server is put under high load.

It may sound strange that I'm saying asynchronous code is easier to write, 
since that is probably not the experience you have when you start doing it. 
But if you're writing a complex threaded application, you typically end up 
assigning each thread its own area of responsibility and getting its inputs 
and outputs from other threads using event queues. If you don't do this, 
threads will run through your application in unpredictable ways as the 
application grows in complexity. Even assuming you have proper locking 
over all shared data, you can run into deadlocks if you don't always lock 
things in the same order (thread 1 locks A and then B, thread 2 locks B and 
then A -> possible deadlock).
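
The lock-ordering rule can be sketched as follows. This is only an illustration: the `Account` class, its `order` field and the `transfer` helper are hypothetical. Two threads move money in opposite directions, but both always take the lock with the lower `order` first, so the "A then B" versus "B then A" situation cannot occur.

```python
import threading


class Account:
    _next_order = 0

    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()
        # Fixed global ordering used to decide which lock is taken first.
        self.order = Account._next_order
        Account._next_order += 1


def transfer(src, dst, amount):
    """Move money while always locking the two accounts in the same order."""
    first, second = sorted((src, dst), key=lambda account: account.order)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount


def worker(src, dst):
    for _ in range(1000):
        transfer(src, dst, 1)


a = Account(1000)
b = Account(1000)
t1 = threading.Thread(target=worker, args=(a, b))  # moves a -> b
t2 = threading.Thread(target=worker, args=(b, a))  # moves b -> a
t1.start()
t2.start()
t1.join()
t2.join()
print(a.balance, b.balance)
```

If `transfer` instead locked `src` first and `dst` second, the two threads could each hold one lock and wait forever for the other.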

So you end up with a threaded application design where each thread runs in 
an isolated pocket, getting data from an event queue, processing it and 
then inserting it in another event queue. This is not all that different 
from the asynchronous situation in which you get an event from a reactor 
callback, do some processing and then register another callback.
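
A minimal sketch of that "isolated pocket" design, using only the standard library. The `stage` function, the `STOP` sentinel and the queue names are assumptions made for this example. Each thread touches nothing but its own inbox and outbox, so no other locking is needed.

```python
import queue
import threading

STOP = object()  # sentinel that tells a stage to shut down


def stage(inbox, outbox, func):
    """Take items from inbox, process them, and pass the results on to outbox."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)  # forward the shutdown signal to the next stage
            return
        outbox.put(func(item))


raw = queue.Queue()
parsed = queue.Queue()
done = queue.Queue()

threads = [
    threading.Thread(target=stage, args=(raw, parsed, int)),
    threading.Thread(target=stage, args=(parsed, done, lambda n: n * n)),
]
for thread in threads:
    thread.start()

for text in ["1", "2", "3"]:
    raw.put(text)
raw.put(STOP)

# Collect results until the sentinel arrives at the end of the pipeline.
results = list(iter(done.get, STOP))
for thread in threads:
    thread.join()
print(results)
```

Each stage reads much like a callback: receive an event, process it, hand the result to whatever comes next.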

As an aside, I think one of the problems with threads is that to write a 
piece of code correctly, you have to take into account which threads exist 
in your application. This means it is no longer possible to know whether, 
for example, a class is correct by looking at it in isolation. One of the 
advantages of object-oriented programming is that you only have to care 
about whether a class correctly implements its interface, not how that 
class is used in an application. But when threading, this is no longer the 
case: a class that is correct in single-threaded use can be incorrect in 
multi-threaded use, and a class that is correct in multi-threaded use in one 
application can cause a deadlock in another application.

> i also don't understand how you're supposed to use deferreds
> the twisted doc says deferreds won't *make* your code asynchronous.  so
> let's say you have to do an sql query that takes 10 seconds, deferreds
> would be useless for making that not block unless you have a way of
> making that sql query non-blocking already?  how is that done?  do you
> run a separate thread of your own for each sql query?  one thread for all
> sql queries?

If there is an asynchronous API for doing a particular type of I/O, use 
that. If there isn't, you have to use a thread like you describe and use 
one of the thread-safe reactor calls, such as reactor.callFromThread, to 
pass the result back to the reactor thread.

My gut feeling tells me to use a thread pool, possibly of size 1, to access 
for example a database. But I haven't written code like this, so I have no 
experience to back this up. Every kind of I/O I wanted to do so far was 
already handled by Twisted. In the case of databases, use "adbapi".
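
The "thread pool of size 1" idea can be sketched like this. To keep the example free of extra dependencies, it uses the standard library's asyncio and `ThreadPoolExecutor` rather than Twisted, so it shows the pattern and not Twisted's API. The `blocking_lookup` and `ticker` functions, the table layout and the 0.2 second sleep are all made up for the illustration. The slow query runs in the worker thread while the event loop keeps servicing another task.

```python
import asyncio
import sqlite3
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_lookup(username):
    """Blocking database call; runs in the worker thread, never in the event loop."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
        conn.execute("INSERT INTO users VALUES ('alice', 'secret')")
        time.sleep(0.2)  # pretend the query is slow
        row = conn.execute(
            "SELECT password FROM users WHERE name = ?", (username,)
        ).fetchone()
        return row[0] if row else None
    finally:
        conn.close()


async def ticker(ticks):
    """Stands in for other clients being served while the query is running."""
    while True:
        await asyncio.sleep(0.02)
        ticks.append(time.monotonic())


async def main():
    loop = asyncio.get_running_loop()
    ticks = []
    tick_task = asyncio.create_task(ticker(ticks))
    with ThreadPoolExecutor(max_workers=1) as pool:  # pool of size 1
        password = await loop.run_in_executor(pool, blocking_lookup, "alice")
    tick_task.cancel()
    return password, len(ticks)


password, tick_count = asyncio.run(main())
print(password, tick_count > 0)
```

A pool of size 1 also serializes all database access, which sidesteps the question of whether the database driver can be used from several threads at once.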

> also I wonder in an typical twisted app, just how slow should an
> operation be before you use a deferred?  what if a user enters a username
> and password and i have to look that up in the database. do i use a
> deferred?  just how bad should the query be before using a deferred?

It depends on the kind of database. If you have an in-memory database, you 
don't need a deferred. If you have a simple text file on a local disk, you 
probably don't need a deferred. If you contact a DB server on the same 
machine, you might get away with not using a deferred, but it would be 
better to use one. If you contact a DB server on a different machine, 
definitely use a deferred.

One simple check is to imagine what would happen if the DB is not available. 
If you use an in-memory DB, it will always be available. If you use a 
simple text file on a local disk, you will immediately get an error if 
opening it fails. If you contact a DB server, it is possible you get a 
timeout when connecting to it. Since server timeouts are typically on the 
order of seconds, this is not something you'd want to block your entire 
application on, so use a deferred.

In any case, Twisted offers "cred" as an authentication framework and cred 
always uses a deferred to give you the results of a credentials check. This 
is good because now you can easily switch from one type of credentials 
checker to another without changing the code that uses it.
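
The benefit of an always-asynchronous interface can be sketched without Twisted. The example below is not cred's actual interface: the `InMemoryChecker` and `RemoteChecker` classes, their `check` method and the `login` function are hypothetical, and asyncio coroutines stand in for deferreds. Because both checkers deliver their result asynchronously, `login` works unchanged with either one.

```python
import asyncio


class InMemoryChecker:
    """Answers immediately, but still through the asynchronous interface."""

    def __init__(self, users):
        self._users = users

    async def check(self, username, password):
        return self._users.get(username) == password


class RemoteChecker:
    """Pretends to ask a server; the sleep stands in for a network round trip."""

    def __init__(self, users, delay):
        self._users = users
        self._delay = delay

    async def check(self, username, password):
        await asyncio.sleep(self._delay)
        return self._users.get(username) == password


async def login(checker, username, password):
    """Caller code: identical no matter which checker is plugged in."""
    ok = await checker.check(username, password)
    return "welcome" if ok else "denied"


users = {"alice": "secret"}
outcomes = [
    asyncio.run(login(checker, "alice", "secret"))
    for checker in (InMemoryChecker(users), RemoteChecker(users, 0.05))
]
denied = asyncio.run(login(InMemoryChecker(users), "alice", "wrong"))
print(outcomes, denied)
```

Had the in-memory checker returned a plain value instead, switching to the remote one would have forced a change in every caller.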

> (reading the twisted docs is like reading a brick wall for me, it would
> be nice if someone could just explain things to me in simple terms.)

I think one of the problems is that many people who get started with Twisted 
are learning both asynchronous programming and Twisted at the same time, so 
there are a lot of new concepts to learn.

Bye,
		Maarten