[Twisted-web] what are the advantages of using a single-threaded server? and when should we use deferreds?
Maarten ter Huurne
maarten at treewalker.org
Sat Apr 26 05:37:05 EDT 2008
On Saturday 26 April 2008, inhahe wrote:
> what is the advantage of using a single-threaded server?
If you use threads, your code can be interrupted at any place, except when
you tell it not to (locking). If you use deferreds, your code can be
interrupted only at exactly those places you have indicated. This makes it
much easier to write correct code.
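To make this concrete, here is a minimal sketch in plain Python (standard
library only; the toy callback queue merely stands in for a reactor and is
not Twisted's API). In the threaded version, "counter += 1" is a
read-modify-write that another thread can interrupt, so it needs a lock. In
the callback version each callback runs to completion, and other work can
only be interleaved where a callback explicitly schedules its next step.

```python
import threading
from collections import deque

# --- Threaded version: every shared update must be locked explicitly. ---
def threaded_count(n_threads, n_increments):
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(n_increments):
            # Without the lock another thread may run between reading
            # `counter` and writing it back, losing an update.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

# --- Callback version: a toy single-threaded "reactor" (hypothetical). ---
def callback_count(n_tasks, n_increments):
    counter = 0
    ready = deque()  # callbacks waiting to run, in FIFO order

    def step(remaining):
        nonlocal counter
        counter += 1  # nothing else runs until this callback returns
        if remaining > 1:
            # The only point where other tasks may be interleaved:
            # we explicitly schedule our next step and return.
            ready.append(lambda: step(remaining - 1))

    if n_increments > 0:
        for _ in range(n_tasks):
            ready.append(lambda: step(n_increments))
    while ready:
        ready.popleft()()
    return counter
```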
> i figured it makes it more scalable because there's too much overhead to
> have a thread for each user when you have many simultaneous users. but a
> friend i'm talking to now says that using i/o blocking threads is
> perfectly scalable for a large number of simultaneous users.
It depends on a lot of things. For example, would you use thread pools or
one thread per user? And how many users are you talking about? 100, 1000,
10000, ...?
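As a rough illustration of the two options, assuming a hypothetical
handle_request() that blocks on I/O (simulated here with time.sleep), one
thread per user versus a bounded pool from concurrent.futures might look
like this:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request(user_id):
    # Hypothetical request handler doing blocking I/O (simulated by sleep).
    time.sleep(0.01)
    return user_id * 2, threading.get_ident()

def one_thread_per_user(user_ids):
    """Start one OS thread per user; the thread count grows with the users."""
    results = [None] * len(user_ids)

    def run(i, uid):
        results[i] = handle_request(uid)[0]  # each thread writes only its own slot

    threads = [threading.Thread(target=run, args=(i, uid))
               for i, uid in enumerate(user_ids)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

def thread_pool(user_ids, max_workers=8):
    """Serve all users with a fixed number of threads; extra requests queue up."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outcomes = list(pool.map(handle_request, user_ids))
    results = [value for value, _ in outcomes]
    threads_used = {ident for _, ident in outcomes}
    return results, threads_used
```

With the pool, the number of OS threads stays at max_workers however many
users arrive; the price is that requests wait in a queue when all workers
are busy.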
It also depends on how efficient your OS is at handling threads. I remember
having to compile a differently configured Linux kernel because it would
run out of processes when creating about 1000 threads, but this was over 5
years ago, before NPTL (the Native POSIX Thread Library), so this may no
longer be an issue.
If you have a multi-core or multi-CPU machine, running multiple threads could
spread the workload over different cores. For Python this doesn't really
help though: the Python VM has the Global Interpreter Lock (GIL), which
effectively means that only one thread can execute Python code at any time.
The exceptions are threads that are blocked on I/O and long-running
operations implemented in a C extension that releases the GIL. So if you
want to use multiple cores effectively for Python code, you have to design
your application to consist of separate communicating processes.
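A minimal sketch of the "separate communicating processes" design for
CPU-bound work, using only the standard library's subprocess module (the
worker script and the sum-of-squares workload are made up for illustration).
Each worker is its own Python process with its own GIL, so the chunks can run
on different cores; the parent talks to them through command-line arguments
and their stdout. The multiprocessing module and
concurrent.futures.ProcessPoolExecutor are higher-level options, and Twisted
has reactor.spawnProcess for talking to child processes asynchronously.

```python
import subprocess
import sys

# Code run by each worker process: sum the squares of range(lo, hi).
WORKER_SRC = (
    "import sys\n"
    "lo, hi = int(sys.argv[1]), int(sys.argv[2])\n"
    "print(sum(i * i for i in range(lo, hi)))\n"
)

def parallel_sum_of_squares(n, workers=4):
    """Split range(n) across separate Python processes, each with its own GIL."""
    if n <= 0:
        return 0
    step = max(1, -(-n // workers))  # ceiling division
    procs = []
    for lo in range(0, n, step):
        hi = min(lo + step, n)
        # Start every worker before collecting any result so they run concurrently.
        procs.append(subprocess.Popen(
            [sys.executable, "-c", WORKER_SRC, str(lo), str(hi)],
            stdout=subprocess.PIPE, text=True))
    total = 0
    for p in procs:
        out, _ = p.communicate()  # read the worker's answer and wait for it to exit
        if p.returncode != 0:
            raise RuntimeError(f"worker exited with status {p.returncode}")
        total += int(out)
    return total
```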
> if that's true i can only see a disadvantage in using a single-threaded
> server -- having to use deferreds and stuff to make things asynchronous
The main advantage in my opinion is that it is much easier to write correct
asynchronous code than correct threaded code. If you write threaded code
and overlook one place it can be interrupted, you have a bug. If you write
asynchronous code and overlook one place it should be interruptible, you
get worse latency, but it is still correct.
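A toy illustration of the latency-versus-correctness point, assuming a
hand-rolled round-robin scheduler over Python generators (this is not how
Twisted's reactor is implemented): each "yield" is a place where other tasks
may run. If a task forgets to yield, the other tasks have to wait until it
finishes, but every task still computes the right result.

```python
from collections import deque

def summer(name, values, log, cooperative=True):
    """A task that adds up values, logging each step it takes."""
    total = 0
    for v in values:
        total += v
        log.append(name)
        if cooperative:
            yield  # let other tasks run here
    return total

def run_round_robin(tasks):
    """Toy scheduler: resume each (name, generator) task in turn until all finish."""
    ready = deque(tasks)
    results = {}
    while ready:
        name, task = ready.popleft()
        try:
            next(task)
        except StopIteration as stop:
            results[name] = stop.value  # the generator's return value
        else:
            ready.append((name, task))  # not finished: back of the queue
    return results
```

Passing cooperative=False for the first task changes only the order of the
log, not the sums: its three steps run back to back before the second task
gets a turn.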
Because the points at which different tasks are interleaved are much more
predictable in asynchronous code, there is a reasonable chance that if your
code passes your unit tests, it is actually correct. For threaded code,
it's not uncommon that code passes its unit tests, but starts giving wrong
results as soon as the server is put under high load.
It may sound strange that I'm saying asynchronous code is easier to write,
since that is probably not the experience you have when you start doing it.
But if you're writing a complex threaded application, you typically end up
assigning each thread its own area of responsibility and getting its inputs
and outputs from other threads using event queues. If you don't do this,
threads will run through your application in unpredictable ways as the
application grows in complexity. And even if you have proper locking
over all shared data, you can run into deadlocks if you don't always lock
things in the same order (thread 1 locks A and then B, thread 2 locks B and
then A -> possible deadlock).
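One common cure for the lock-ordering problem is to make every thread
acquire locks in the same global order. A minimal sketch, assuming ordering
by id() is acceptable (any fixed total order works); the Account class and
transfer() are hypothetical:

```python
import threading
from contextlib import contextmanager

@contextmanager
def ordered_locks(*locks):
    """Acquire all given locks in one global order (by id()), release in reverse."""
    # Every thread sorts the same way, so no thread can hold B while
    # waiting for A when another thread holds A while waiting for B.
    ordered = sorted(set(locks), key=id)
    acquired = []
    try:
        for lock in ordered:
            lock.acquire()
            acquired.append(lock)
        yield
    finally:
        for lock in reversed(acquired):
            lock.release()

class Account:
    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()

def transfer(src, dst, amount):
    # Callers may name the locks in either order; ordered_locks fixes it.
    with ordered_locks(src.lock, dst.lock):
        src.balance -= amount
        dst.balance += amount

def opposing_transfers(n):
    """Thread 1 moves money a->b while thread 2 moves it b->a."""
    a, b = Account(1000), Account(1000)

    def repeat(src, dst):
        for _ in range(n):
            transfer(src, dst, 1)

    threads = [threading.Thread(target=repeat, args=(a, b), daemon=True),
               threading.Thread(target=repeat, args=(b, a), daemon=True)]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=10)  # a deadlock would show up as a thread still alive
    finished = not any(t.is_alive() for t in threads)
    return finished, a.balance, b.balance
```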
So you end up with a threaded application design where each thread runs in
an isolated pocket, getting data from an event queue, processing it and
then inserting it in another event queue. This is not all that different
from the asynchronous situation in which you get an event from a reactor
callback, do some processing and then register another callback.
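A small sketch of that "isolated pocket" design with the standard library's
queue.Queue, assuming a two-stage pipeline (strip, then upper-case) chosen
purely for illustration. Each stage thread reads from its inbox, processes
the item and writes to its outbox, much like a reactor callback that handles
one event and schedules the next step.

```python
import queue
import threading

_DONE = object()  # sentinel that tells a stage to shut down

def stage(func, inbox, outbox):
    """Run one pipeline stage in its own thread: read, process, forward."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate shutdown downstream
            return
        outbox.put(func(item))

def run_pipeline(items):
    q_in, q_mid, q_out = queue.Queue(), queue.Queue(), queue.Queue()
    # Each thread only touches its own two queues: no shared mutable state, no locks.
    workers = [
        threading.Thread(target=stage, args=(str.strip, q_in, q_mid)),
        threading.Thread(target=stage, args=(str.upper, q_mid, q_out)),
    ]
    for w in workers:
        w.start()
    for item in items:
        q_in.put(item)
    q_in.put(_DONE)
    results = []
    while (item := q_out.get()) is not _DONE:
        results.append(item)
    for w in workers:
        w.join()
    return results
```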
As an aside, I think one of the problems with threads is that to write a
piece of code correctly, you have to take into account which threads exist
in your application. This means it is no longer possible to know whether,
for example, a class is correct by looking at it in isolation. One of the
advantages of object-oriented programming is that you only have to care
about whether a class correctly implements its interface, not how that
class is used in an application. But with threads, this is no longer the
case: a class that is correct in single-threaded use can be incorrect in
multi-threaded use, and a class that is correct in multi-threaded use in one
application can cause a deadlock in another application.
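For example, this lazy-initialization class (a hypothetical illustration) is
correct when used from a single thread, but the check-then-act in get() lets
two threads both run the factory. The subclass keeps the same interface and
fixes that with a lock; nothing in LazyValue's own code reveals that it
needs one.

```python
import threading
import time

class LazyValue:
    """Builds its value on first use. Correct when used from one thread."""

    def __init__(self, factory):
        self._factory = factory
        self._value = None
        self._ready = False

    def get(self):
        # Check-then-act: two threads can both see _ready == False here
        # and both run the factory.
        if not self._ready:
            self._value = self._factory()
            self._ready = True
        return self._value

class ThreadSafeLazyValue(LazyValue):
    """Same interface; the check and the act happen under one lock."""

    def __init__(self, factory):
        super().__init__(factory)
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return super().get()

def count_factory_calls(lazy_cls, n_threads=20):
    """Call get() from many threads at once; report factory calls and results."""
    calls = 0
    calls_lock = threading.Lock()
    start = threading.Barrier(n_threads)

    def factory():
        nonlocal calls
        with calls_lock:
            calls += 1
        time.sleep(0.01)  # widen the race window
        return 42

    lazy = lazy_cls(factory)
    values = []
    values_lock = threading.Lock()

    def worker():
        start.wait()  # release all threads at (roughly) the same moment
        value = lazy.get()
        with values_lock:
            values.append(value)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return calls, values
```

count_factory_calls(ThreadSafeLazyValue) always reports one call; with
LazyValue it will typically report several, but how many depends on thread
scheduling.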
> i also don't understand how you're supposed to use deferreds
> the twisted doc says deferreds won't *make* your code asynchronous. so
> let's say you have to do an sql query that takes 10 seconds, deferreds
> would be useless for making that not block unless you have a way of
> making that sql query non-blocking already? how is that done? do you
> run a separate thread of your own for each sql query? one thread for all
> sql queries?
If there is an asynchronous API for doing a particular type of I/O, use
that. If there isn't, you have to use a thread like you describe and use
one of the thread-safe reactor calls, such as reactor.callFromThread, to
pass the result back.
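In asyncio terms (used here only as a standard-library analogue; in Twisted
the corresponding calls are reactor.callFromThread and
twisted.internet.threads.deferToThread), running a blocking call in a thread
and passing its result back thread-safely could be sketched like this.
blocking_query() is a hypothetical stand-in for a driver with no asynchronous
API.

```python
import asyncio
import threading
import time

def blocking_query(sql):
    """Hypothetical blocking DB call with no asynchronous API."""
    time.sleep(0.2)
    return f"rows for {sql}"

async def run_in_thread(func, *args):
    """Run func(*args) in a new thread and hand its result back to the loop."""
    loop = asyncio.get_running_loop()
    fut = loop.create_future()

    def worker():
        # Runs in the helper thread. asyncio futures are not thread-safe,
        # so the result is handed to the event-loop thread to be set there.
        try:
            result = func(*args)
        except Exception as exc:
            loop.call_soon_threadsafe(fut.set_exception, exc)
        else:
            loop.call_soon_threadsafe(fut.set_result, result)

    threading.Thread(target=worker, daemon=True).start()
    return await fut

async def demo():
    ticks = 0

    async def ticker():
        # Proves the event loop keeps running while the query blocks.
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    tick_task = asyncio.create_task(ticker())
    rows = await run_in_thread(blocking_query, "SELECT 1")
    tick_task.cancel()
    try:
        await tick_task
    except asyncio.CancelledError:
        pass
    return rows, ticks
```

asyncio.to_thread and loop.run_in_executor wrap this same pattern; spelling
it out shows why call_soon_threadsafe is needed.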
My gut feeling tells me to use a thread pool, possibly of size 1, to access,
for example, a database. But I haven't written code like this, so I have no
experience to back this up. Every kind of I/O I have wanted to do so far was
already handled by Twisted. In the case of databases, use "adbapi"
(twisted.enterprise.adbapi), which runs the blocking database calls in a
thread pool for you.
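A sketch of the "thread pool of size 1" idea, again using asyncio and
sqlite3 from the standard library as stand-ins (the users table, its pw_hash
column and the helper names are hypothetical). Twisted's own equivalent is
twisted.enterprise.adbapi.ConnectionPool, whose runQuery() returns a
Deferred.

```python
import asyncio
import sqlite3
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()

def _open_connection():
    # Runs once, inside the pool's only worker thread. By default a sqlite3
    # connection may only be used by the thread that created it, so a pool
    # of size 1 keeps every query on that thread.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT PRIMARY KEY, pw_hash TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [("alice", "hash-a"), ("bob", "hash-b")])
    _local.conn = conn

def _lookup_pw_hash(name):
    row = _local.conn.execute(
        "SELECT pw_hash FROM users WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None

db_pool = ThreadPoolExecutor(max_workers=1, initializer=_open_connection)

async def lookup_pw_hash(name):
    # The event loop keeps running while the query waits in the pool.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(db_pool, _lookup_pw_hash, name)

async def demo_lookups():
    names = ("alice", "bob", "carol")
    return await asyncio.gather(*(lookup_pw_hash(n) for n in names))
```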
> also I wonder in a typical twisted app, just how slow should an
> operation be before you use a deferred? what if a user enters a username
> and password and i have to look that up in the database. do i use a
> deferred? just how bad should the query be before using a deferred?
It depends on the kind of database. If you have an in-memory database, you
don't need a deferred. If you have a simple text file on a local disk, you
probably don't need a deferred. If you contact a DB server on the same
machine, you might get away with not using a deferred, but it would be
better to use one. If you contact a DB server on a different machine,
definitely use a deferred.
One simple check is to imagine what would happen if the DB is not available.
If you use an in-memory DB, it will always be available. If you use a
simple text file on a local disk, you will immediately get an error if
opening it fails. If you contact a DB server, it is possible you get a
timeout when connecting to it. Since server timeouts are typically on the
order of seconds, this is not something you'd want to block your entire
application on, so use a deferred.
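To see what a non-blocking timeout buys you, this sketch (asyncio again; the
unresponsive server and the LOOKUP request are invented for the demo)
connects to a local server that accepts the connection but never answers.
The client gives up after timeout seconds, and a ticker task shows that the
event loop kept serving other work the whole time instead of being blocked.

```python
import asyncio

async def unresponsive_db(reader, writer):
    # Hypothetical DB server that accepts connections but never replies:
    # it reads until the client hangs up, then closes its side.
    await reader.read()
    writer.close()

async def fetch_user(host, port, username, timeout):
    """Ask the server about `username`; give up after `timeout` seconds."""
    reader, writer = await asyncio.open_connection(host, port)
    try:
        writer.write(f"LOOKUP {username}\n".encode())
        await writer.drain()
        reply = await asyncio.wait_for(reader.readline(), timeout)
        return reply.decode().strip()
    except asyncio.TimeoutError:
        return None  # treat as "DB unavailable" instead of hanging
    finally:
        writer.close()
        await writer.wait_closed()

async def demo_timeout(timeout=0.3):
    server = await asyncio.start_server(unresponsive_db, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    ticks = 0

    async def ticker():
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    tick_task = asyncio.create_task(ticker())
    result = await fetch_user("127.0.0.1", port, "alice", timeout)
    tick_task.cancel()
    try:
        await tick_task
    except asyncio.CancelledError:
        pass
    server.close()
    await server.wait_closed()
    return result, ticks
```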
In any case, Twisted offers "cred" as an authentication framework and cred
always uses a deferred to give you the results of a credentials check. This
is good because now you can easily switch from one type of credentials
checker to another without changing the code that uses it.
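The same idea in a stdlib-only sketch: two hypothetical checkers share one
asynchronous check() interface, so login() does not change when an
in-memory checker is swapped for a slow remote one. In Twisted cred the
corresponding method is requestAvatarId(), which returns a Deferred; the
plain-text password dictionary here is for illustration only.

```python
import asyncio
import hmac
from typing import Protocol

class CredentialsChecker(Protocol):
    async def check(self, username: str, password: str) -> bool: ...

class InMemoryChecker:
    """Answers immediately, but still through the async interface."""

    def __init__(self, users):
        self._users = dict(users)

    async def check(self, username, password):
        expected = self._users.get(username)
        # compare_digest avoids leaking where the strings first differ.
        return expected is not None and hmac.compare_digest(
            expected.encode(), password.encode())

class SlowRemoteChecker:
    """Hypothetical stand-in for a checker that talks to a remote DB."""

    def __init__(self, users, latency=0.05):
        self._inner = InMemoryChecker(users)
        self._latency = latency

    async def check(self, username, password):
        await asyncio.sleep(self._latency)  # simulated network round trip
        return await self._inner.check(username, password)

async def login(checker: CredentialsChecker, username: str, password: str) -> str:
    # The caller is identical whichever checker is plugged in.
    ok = await checker.check(username, password)
    return f"welcome {username}" if ok else "access denied"
```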
> (reading the twisted docs is like reading a brick wall for me, it would
> be nice if someone could just explain things to me in simple terms.)
I think one of the problems is that many people who get started with Twisted
are learning both asynchronous programming and Twisted at the same time, so
there are a lot of new concepts to learn.
Bye,
Maarten
More information about the Twisted-web mailing list