This document is an introduction to the asynchronous programming model and to Twisted's Deferred abstraction, which represents a 'promised' result and can pass that eventual result to handler functions.
This document is for readers new to Twisted who are familiar with the Python programming language and, at least conceptually, with core networking concepts such as servers, clients and sockets. It gives you a high-level overview of concurrent programming (interleaving several tasks) and of Twisted's concurrency model: non-blocking code, also called asynchronous code.
After discussing the concurrency model of which Deferreds are a part, it will introduce the methods of handling results when a function returns a Deferred object.
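All computing tasks take some time to complete, but for the most part that time is not noticeable to either a user or another process that is waiting for some result. There are two reasons why a task might take enough time for the delay to be an issue: either it is computationally intensive and needs a certain amount of CPU time to calculate its answer, or it is not computationally intensive but has to wait for data to become available before it can produce a result.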
A fundamental requirement of network programming is that you must have a way of waiting for data. (That's largely true of database programming, too.) Imagine you have a function that sends an email summarising some information. This function needs to connect to a remote server, wait for the remote server to reply, check that the remote server can process the email, wait for the reply, send the email, wait for the confirmation, and then disconnect.
Any one of these steps may take a long period of time. Your program might use the simplest of all possible models, in which it actually sits and waits for data to be sent and received, but in this case it has some very obvious and basic limitations: it can't send many emails at once; and in fact it can't do anything else while it is sending an email.
Hence, all but the simplest network programs avoid this model. You can use one of several different models to allow your program to keep doing whatever tasks it has on hand while it is waiting for something to happen before a particular task can continue.
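There are many ways to write network programs. The main ones are: first, handling each connection in a separate operating system process; second, handling each connection in a separate thread; and third, using non-blocking system calls to handle all connections in one thread.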
The normal model when using the Twisted framework is the third model: non-blocking calls.
When dealing with many connections in one thread, the scheduling is the responsibility of the application, not the operating system. It is usually implemented by calling a registered function when each connection is ready for reading or writing, an approach commonly known as asynchronous, event-driven or callback-based programming.
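As a rough illustration of this callback-based style, independent of Twisted, the following sketch uses Python's standard selectors module to call a registered function when a socket becomes readable. The names on_readable and received are illustrative only, and a local socket pair stands in for a real network connection. Twisted's reactor plays a broadly similar role on the application's behalf.

```python
import selectors
import socket

sel = selectors.DefaultSelector()
received = []

def on_readable(conn):
    # Called by the loop below only when conn has data waiting
    received.append(conn.recv(1024))
    sel.unregister(conn)
    conn.close()

# A connected pair of local sockets stands in for a network connection
sender, receiver = socket.socketpair()

# Register interest in "ready for reading", with the function to call
sel.register(receiver, selectors.EVENT_READ, on_readable)

sender.sendall(b"hello")
sender.close()

# The application, not the operating system, decides which handler runs
while sel.get_map():
    for key, _events in sel.select(timeout=1):
        key.data(key.fileobj)

sel.close()
print(received)
```

One thread could register many sockets in the same way, each with its own handler, and the single loop would service whichever is ready.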
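In this model, the earlier email sending function would work something like this: it calls a connection function to connect to the remote server; the connection function returns immediately, on the understanding that the email sending code will be notified when the connection has been made; and once the connection is made, the email sending function is notified that the connection is ready.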
What advantage does the above sequence have over our original blocking sequence? The advantage is that while the email sending function can't do the next part of its job until the connection is open, the rest of the program can do other tasks, like begin the opening sequence for other email connections. Hence, the entire program is not waiting for the connection.
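The interleaving described above can be sketched without any networking at all. In this minimal simulation, the pending queue is a hypothetical stand-in for an event loop, and connect, send_email and the host names are invented for illustration. The point is only the ordering: both connection requests are issued before either connection is "ready".

```python
from collections import deque

pending = deque()  # hypothetical stand-in for an event loop's work queue
log = []

def connect(host, on_connected):
    # Start "connecting" and return immediately; the notification
    # happens later, when the loop below gets around to it
    pending.append(lambda: on_connected("connection to %s" % host))

def send_email(host):
    def connection_ready(conn):
        log.append("sending over " + conn)
    connect(host, connection_ready)
    log.append("connect requested for " + host)

# Two emails are started before either connection exists
send_email("mail.example.com")
send_email("mail.example.org")

# Run the simulated event loop until no work remains
while pending:
    pending.popleft()()

for line in log:
    print(line)
```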
In synchronous programming, a function requests data, sits around and waits for the data, and finally gets moving again when the data has been produced. With asynchronous programming, your code merely initiates a request for some data and then delegates the responsibility for dealing with that data (when it's finally ready) to a separate callback function.
When an asynchronous application calls a data-producing function, it supplies that function with a reference to a separate callback function for the data-producing function to call when the data is finally ready to return. The data-producing function does not return the data to the original caller. Instead, it supplies that data as an argument to the callback function when it makes the promised call to it. The original caller has delegated to the callback function the responsibility of dealing with the data and continuing whatever processing the caller had in mind for the data once it has been produced.
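The contrast between the two styles can be shown in a few lines of plain Python. AsyncSource is a hypothetical data producer invented for this sketch; its data_arrived method stands in for whatever event eventually makes the data available.

```python
def fetch_sync():
    """Synchronous style: the caller waits and gets the data back directly."""
    return "some data"

class AsyncSource(object):
    """Hypothetical producer that delivers its data through a callback."""

    def __init__(self):
        self._callbacks = []

    def request(self, callback):
        # Remember whom to call; nothing is returned to the caller
        self._callbacks.append(callback)

    def data_arrived(self, data):
        # Invoked later, once the data exists: make the promised calls
        callbacks, self._callbacks = self._callbacks, []
        for callback in callbacks:
            callback(data)

handled = []

def handle_data(data):
    # The processing the caller had in mind, delegated to a callback
    handled.append(data.upper())

source = AsyncSource()
returned = source.request(handle_data)   # returns at once, with no data
source.data_arrived("some data")         # the callback runs only now
```

Here returned is None because the request carries no result back to the caller; the data reaches handle_data only when data_arrived is called.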
Twisted uses the Deferred object as a manager for your asynchronous callback sequence. Your client application attaches to the Deferred object a reference to some function to which it has delegated the responsibility for dealing with the results of the asynchronous request once those results are available. Your application should also entrust some function (possibly the same one) with the responsibility for dealing with an error that results from its request instead of data. Such an error-handling callback function is known as an errback.
Your application can also attach a series of functions that process the results and pass their own results on to the next function. Such a series is known as a callback chain, and should be used with another series of functions that are called if there is an error in the asynchronous request (known as a series of errbacks or an errback chain). The asynchronous library code calls the first callback when the result is available, or the first errback when an error occurs, and the Deferred object then hands the results of each callback or errback function to the next function in the chain.
It is the second class of concurrency problem — non-computationally intensive tasks that involve an appreciable delay — that Deferreds are designed to help solve. Functions that wait on hard drive access, database access, and network access all fall into this class, although the time delay varies.
Deferreds are designed to give Twisted programs a way to wait for data without hanging until that data arrives. They do this by providing a simple management interface for libraries and applications to delegate the responsibility for dealing with possibly delayed data and errors to callbacks and errbacks, respectively. By returning a Deferred object, your Twisted-based library code knows that it can always make its results available by calling Deferred.callback. If an error crops up, your code can have that situation dealt with by calling Deferred.errback instead. Twisted-compatible applications have to do their part, however. They must set up the result handlers for the Deferred object by attaching to it the callbacks and errbacks they want called with results, in the order they want them called.
The basic idea behind Deferreds, and other asynchronous solutions as well, is to keep the CPU as active as possible. If one task is waiting on data, rather than have the CPU (and the program!) idle waiting for that data (a process normally called "blocking"), the program performs other operations in the meantime, confident that its callback and errback will deal with the data once it is ready to be processed. In Twisted, a function signals to the calling function that there is no immediate result by returning a Deferred as its result. When the data is available, the program activates the callbacks on that Deferred to process the data in sequence.
In our email sending example above, a parent function calls a function to connect to the remote server. Asynchrony requires that this connection function return without waiting for the result so that the parent function can do other things. So how does the parent function or its controlling program know that the connection doesn't exist yet, and how does it use the connection once it does exist?
What Twisted uses to signal this situation is, of course, our versatile twisted.internet.defer.Deferred object. When the connection function returns, it signals that the operation is incomplete by returning a Deferred rather than the actual handle to the connection.
The Deferred has two purposes. The first is that it says "I am a signal that the result of whatever you wanted me to do is still pending." The second is that you can ask the Deferred to run things when the data does arrive.
You can picture a function that returns a Deferred as acting like a librarian who responds to a patron's question ("Are these mushrooms poisonous?") with a handwritten note saying, "I don't have your answer off the top of my head, but let me know where I can call you with the answer when I have it." The caller of the function does the equivalent of the patron scribbling a phone number on the note by attaching a callback to the Deferred. The equivalent of a callback chain is where the patron writes several numbers on the note, and the person answering the phone at the first number responds to the answer ("They're highly poisonous") with another answer ("It's too late to use another recipe, our dinner party is canceled") that the librarian relays to whoever answers the phone (some disappointed dinner guest, perhaps) at the second number. Note that the library patron has been able to wander off and forget all about this matter in the meantime; such is the beauty of asynchronous programming!
The way you tell a Deferred what to do with the data once it arrives is by adding a callback — asking the Deferred to call a function once the data arrives.
One Twisted library function that returns a Deferred is twisted.web.client.getPage. In this example, we call getPage, which returns a Deferred, and we attach a callback to handle the contents of the page once the data is available:
    # NOTE: getPage is deprecated in newer Twisted releases; this example
    # assumes a release that still provides it.
    from twisted.web.client import getPage
    from twisted.internet import reactor

    def printContents(contents):
        '''
        This is the 'callback' function, added to the Deferred and called by
        it when the promised data is available
        '''
        print("The Deferred has called printContents with the following contents:")
        print(contents)

        # Stop the Twisted event handling system -- this is usually handled
        # in higher level ways
        reactor.stop()

    # call getPage, which returns immediately with a Deferred, promising to
    # pass the page contents onto our callbacks when the contents are available
    # (the URL is given as a byte string, which getPage expects on Python 3)
    deferred = getPage(b'http://twistedmatrix.com/')

    # add a callback to the deferred -- request that it run printContents when
    # the page content has been downloaded
    deferred.addCallback(printContents)

    # Begin the Twisted event handling system to manage the process -- again this
    # isn't the usual way to do this
    reactor.run()
A very common use of Deferreds is to attach two callbacks. The result of the first callback is passed to the second callback:
    from twisted.web.client import getPage
    from twisted.internet import reactor

    def lowerCaseContents(contents):
        '''
        This is a 'callback' function, added to the Deferred and called by
        it when the promised data is available. It converts all the data to
        lower case
        '''
        return contents.lower()

    def printContents(contents):
        '''
        This is a 'callback' function, added to the Deferred after
        lowerCaseContents and called by it with the results of
        lowerCaseContents
        '''
        print(contents)
        reactor.stop()

    # the URL is given as a byte string, which getPage expects on Python 3
    deferred = getPage(b'http://twistedmatrix.com/')

    # add two callbacks to the deferred -- request that it run lowerCaseContents
    # when the page content has been downloaded, and then run printContents with
    # the result of lowerCaseContents
    deferred.addCallback(lowerCaseContents)
    deferred.addCallback(printContents)

    reactor.run()
Just as an asynchronous function returns before its result is available, it may also return before it is possible to detect errors: failed connections, erroneous data, protocol errors, and so on. Just as you can add callbacks to a Deferred which it calls when the data you are expecting is available, you can add error handlers ('errbacks') to a Deferred for it to call when an error occurs and it cannot obtain the data:
    from twisted.web.client import getPage
    from twisted.internet import reactor

    def errorHandler(error):
        '''
        This is an 'errback' function, added to the Deferred which will call
        it in the event of an error
        '''
        # this isn't a very effective handling of the error, we just print it out:
        print("An error has occurred: <%s>" % str(error))
        # and then we stop the entire process:
        reactor.stop()

    def printContents(contents):
        '''
        This is a 'callback' function, added to the Deferred and called by it
        with the page content
        '''
        print(contents)
        reactor.stop()

    # We request a page which doesn't exist in order to demonstrate the
    # error chain (the URL is a byte string, as getPage expects on Python 3)
    deferred = getPage(b'http://twistedmatrix.com/does-not-exist')

    # add the callback to the Deferred to handle the page content
    deferred.addCallback(printContents)

    # add the errback to the Deferred to handle any errors
    deferred.addErrback(errorHandler)

    reactor.run()
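In this document, you have seen why non-trivial network programs need some form of concurrency; learnt that Twisted supports concurrency in the form of asynchronous calls and provides Deferred objects to manage callback chains; seen how the getPage function returns a Deferred object; and attached callbacks and errbacks to that Deferred.
Since the Deferred abstraction is such a core part of programming with Twisted, there are several other detailed guides to it: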