Ticket #1022: async.EAS.xhtml

File async.EAS.xhtml, 16.0 KB (added by edsuom, 16 years ago)

Updated version of "Asynchronous Programming with Twisted"

<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">

<head><title>Asynchronous Programming with Twisted</title></head>
<body>

<h1>Asynchronous Programming with Twisted</h1>

<p>This document is an introduction to the asynchronous programming model, and
to Twisted's Deferred abstraction, which symbolises a 'promised' result and
which can pass an eventual result to handler functions.</p>

<p>This document is for readers new to Twisted who are familiar with the
Python programming language and, at least conceptually, with core networking
concepts such as servers, clients and sockets. This document will give you a
high-level overview of concurrent programming (interleaving several tasks) and
of Twisted's concurrency model: <strong>non-blocking code</strong> or
<strong>asynchronous code</strong>.</p>

<p>After discussing the concurrency model of which Deferreds are a part, this
document will introduce the methods of handling results when a function
returns a Deferred object.</p>

<h2>Introduction to concurrent programming</h2>

<p>All computing tasks take some time to complete, but for the most part that
time is not noticeable to either a user or another process that is waiting for
some result. There are two reasons why a task might take enough time for the
delay to be an issue:</p>

<ol>
<li>it is computationally intensive (for example factorising large numbers)
and requires a certain amount of CPU time to calculate the answer; or</li>
<li>it is not computationally intensive but has to wait for data that
is not immediately available before it can produce a result.</li>
</ol>

<h3>Waiting for answers</h3>

<p>A fundamental requirement of network programming is that you must have a way
of waiting for data.  (That's largely true of database programming, too.)
Imagine you have a function that sends an email summarising some information.
This function needs to connect to a remote server, wait for the remote server
to reply, check that the remote server can process the email, wait for the
reply, send the email, wait for the confirmation, and then disconnect.</p>

<p>Any one of these steps may take a long period of time. Your program might
use the simplest of all possible models, in which it actually sits and waits
for data to be sent and received, but in this case it has some very obvious
and basic limitations: it can't send many emails at once; and in fact it can't
do anything else while it is sending an email.</p>

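<p>As a rough illustration of this blocking model, the sketch below uses
<code>time.sleep</code> as a stand-in for waiting on a remote server. The
function name <code>send_email_blocking</code>, the addresses and the delay
are hypothetical, chosen only for illustration. Each call has to finish
before the next one can start, so the emails are handled strictly one after
another:</p>

```python
import time

# Hypothetical stand-in for the time spent waiting on a remote server.
NETWORK_DELAY = 0.01

def send_email_blocking(recipient, log):
    """Pretend to send one email, blocking the whole program at each step."""
    for step in ("connect", "check", "send", "disconnect"):
        time.sleep(NETWORK_DELAY)  # nothing else can run while we wait here
        log.append((recipient, step))

log = []
for recipient in ("alice@example.com", "bob@example.com"):
    send_email_blocking(recipient, log)

# Every step for the first recipient finishes before the second one starts.
print(log)
```
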
<p>Hence, all but the simplest network programs avoid this model. You can use
one of several different models to allow your program to keep doing whatever
tasks it has on hand while it is waiting for something to happen before a
particular task can continue.</p>

<h3>Not waiting on data</h3>

<p>There are many ways to write network programs.  The main ones are:</p>

<ol>
    <li>handle each connection in a separate operating system process, in
    which case the operating system will take care of letting other processes
    run while one is waiting;</li>
    <li>handle each connection in a separate thread<span class="footnote">There
    are variations on this method, such
    as a limited-size pool of threads servicing all connections, which are
    essentially just optimizations of the same idea.</span> in which the
    threading framework takes care of letting other threads run while one is
    waiting; or</li>
    <li>use non-blocking system calls to handle all connections
        in one thread.</li>
</ol>

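<p>For comparison, the second model (one thread per connection) might look
roughly like the sketch below. It uses only the standard library's
<code>threading</code> module; the handler name
<code>handle_connection</code> and the connection names are hypothetical, and
<code>time.sleep</code> again stands in for waiting on the network. Each
handler blocks, but only inside its own thread:</p>

```python
import threading
import time

def handle_connection(name, results, lock):
    """Hypothetical handler: one thread blocks here per connection."""
    time.sleep(0.01)  # stand-in for waiting on the network
    with lock:
        results.append(name)

results = []
lock = threading.Lock()
threads = [
    threading.Thread(target=handle_connection, args=(name, results, lock))
    for name in ("conn-1", "conn-2", "conn-3")
]
for thread in threads:
    thread.start()  # each connection now waits in its own thread
for thread in threads:
    thread.join()   # wait until every handler has finished

print(sorted(results))
```

<p>The order in which the handlers finish is up to the threading framework,
which is why the sketch sorts the results before printing them.</p>
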
<h3>Non-blocking calls</h3>

<p>The normal model when using the Twisted framework is the third model:
non-blocking calls.</p>

<p>When dealing with many connections in one thread, the scheduling is the
responsibility of the application, not the operating system, and is usually
implemented by calling a registered function when each connection is ready
for reading or writing -- commonly known as <strong>asynchronous</strong>,
<strong>event-driven</strong> or <strong>callback-based</strong>
programming.</p>

<p>In this model, the earlier email sending function would work something
like this:</p>

<ol>
  <li>it calls a connection function to connect to the remote server;</li>
  <li>the connection function returns immediately, with the implication that
  the email sending library will be notified when the connection has been
  made; and</li>
  <li>once the connection is made, the connection mechanism notifies the email
  sending function that the connection is ready.</li>
</ol>

<p>What advantage does the above sequence have over our original blocking
sequence? The advantage is that while the email sending function can't do the
next part of its job until the connection is open, the rest of the program can
do other tasks, like begin the opening sequence for other email connections.
Hence, the entire program is not waiting for the connection.</p>

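<p>The idea of calling a registered function when a connection is ready can
be sketched without Twisted, using the standard library's
<code>selectors</code> module. This is only an illustration of the general
technique, not a description of how Twisted implements it; in a Twisted
program the reactor plays the role of the loop at the bottom. The handler
name <code>on_readable</code> is hypothetical, and a connected socket pair
stands in for a real network connection:</p>

```python
import selectors
import socket

selector = selectors.DefaultSelector()
received = []

def on_readable(sock):
    """Registered function: called only when sock has data waiting."""
    received.append(sock.recv(1024))
    selector.unregister(sock)
    sock.close()

# A connected socket pair stands in for a real network connection.
local, remote = socket.socketpair()
local.setblocking(False)
selector.register(local, selectors.EVENT_READ, on_readable)

remote.sendall(b"250 OK")  # the "remote server" replies
remote.close()

# The event loop: wait until some socket is ready, then call its handler.
while selector.get_map():
    for key, _events in selector.select(timeout=1):
        key.data(key.fileobj)

selector.close()
print(received)
```

<p>With many sockets registered, the same loop would service whichever one
became ready first, so no single slow connection holds up the others.</p>
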
<h3>Callbacks</h3>

<p>In <i>synchronous programming</i>, a function requests data, sits around and
waits for the data, and finally gets moving again when the data has been
produced. With <i>asynchronous programming</i>, your code merely
<strong>initiates</strong> a request for some data and then gets to
<strong>delegate</strong> the responsibility for dealing with that data (when
it's finally ready) to some separate <em>callback</em> function.</p>

<p>When an asynchronous application calls a data-producing function, it
supplies that function with a reference to a separate callback function for the
data-producing function to call when the data is finally ready to return. The
data-producing function does <i>not</i> return the data to the original
caller. Instead, it supplies that data as an argument to the callback function
when it makes the promised call to it. The original caller has delegated to the
callback function the responsibility of dealing with the data and
continuing whatever processing the caller had in mind for the data once it has
been produced.</p>

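<p>A minimal sketch of this delegation in plain Python follows. All of the
names (<code>request_data</code>, <code>deliver_all</code>,
<code>handle_data</code>) are hypothetical, and a simple list of pending
requests stands in for whatever mechanism eventually produces the data:</p>

```python
# Requests that have been initiated but whose data is not yet available.
pending = []

def request_data(key, callback):
    """Hypothetical data producer: note the callback and return at once."""
    pending.append((key, callback))
    # Nothing is returned to the caller; the data goes to the callback later.

def deliver_all(source):
    """Stand-in for the moment the data arrives: make the promised calls."""
    while pending:
        key, callback = pending.pop(0)
        callback(source[key])

handled = []

def handle_data(data):
    """The callback: continues the processing the caller had in mind."""
    handled.append(data.upper())

request_data("greeting", handle_data)  # initiate the request
assert handled == []                   # the caller did not wait for data
deliver_all({"greeting": "hello"})     # later, the data becomes ready
print(handled)
```
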
<a name="deferreds" />
<h2>Deferreds</h2>

<p>Twisted uses the <code class="API"
base="twisted.internet.defer">Deferred</code> object as a manager for your
asynchronous callback sequence. Your client application attaches to the
Deferred object a reference to some function to which it has delegated the
responsibility for dealing with the results of the asynchronous request once
those results are available. Your application should also entrust some function
(possibly the same one) with the responsibility for dealing with an error that
results from its request instead of data. Such an
<strong>err</strong>or-handling call<strong>back</strong> function is known as
an <strong>errback</strong>.</p>

<p>Your application can also attach a series of functions that process the results
and pass their own results on to the next function. Such a series is known as a
<strong>callback chain</strong>, and should be used with another series of
functions that are called if there is an error in the asynchronous request
(known as a series of <strong>errbacks</strong> or an <strong>errback
chain</strong>). The asynchronous library code calls the first callback when the
result is available, or the first errback when an error occurs, and the
<code>Deferred</code> object then hands the results of each callback or errback
function to the next function in the chain.</p>

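<p>To make the hand-off between functions concrete, here is a deliberately
simplified toy model of such a chain. It is <em>not</em> Twisted's
implementation: the class <code>ToyDeferred</code> and its methods
<code>add_callback</code>, <code>add_errback</code> and <code>fire</code> are
hypothetical names, and a plain exception object stands in for an error
result. It only shows the principle that each function's return value becomes
the input of the next one, and that an error moves processing over to the
errbacks:</p>

```python
class ToyDeferred:
    """Toy stand-in for a Deferred; NOT Twisted's implementation."""

    def __init__(self):
        self.handlers = []  # ordered (callback, errback) pairs

    def add_callback(self, callback):
        self.handlers.append((callback, None))
        return self

    def add_errback(self, errback):
        self.handlers.append((None, errback))
        return self

    def fire(self, result):
        """Run the chain, handing each return value to the next function."""
        for callback, errback in self.handlers:
            is_error = isinstance(result, Exception)
            handler = errback if is_error else callback
            if handler is None:
                continue  # this link does not handle this kind of result
            try:
                result = handler(result)
            except Exception as exc:
                result = exc  # a raised exception moves to the errbacks
        return result

def parse(text):
    return int(text)  # raises ValueError for non-numeric text

def double(number):
    return number * 2

def report(error):
    return "could not process: %s" % type(error).__name__

def build():
    chain = ToyDeferred()
    chain.add_callback(parse).add_callback(double).add_errback(report)
    return chain

print(build().fire("21"))    # parse and double run in order
print(build().fire("oops"))  # parse raises, double is skipped, report runs
```
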
<h2>The Problem that Deferreds Solve</h2>

<p>It is the second class of concurrency problem &mdash; non-computationally
intensive tasks that involve an appreciable delay &mdash; that Deferreds are
designed to help solve.  Functions that wait on hard drive access, database
access, and network access all fall into this class, although the time delay
varies.</p>

<p>Deferreds are designed to give Twisted programs a way to wait for data
without hanging until that data arrives. They do this by providing a simple
management interface that lets libraries and applications delegate the
responsibility for dealing with possibly delayed data and errors to callbacks
and errbacks, respectively. By returning a Deferred object, your Twisted-based
library code knows that it can always make its results available by calling
<code class="API" base="twisted.internet.defer">Deferred.callback</code>. If an
error crops up, your code can have that unfortunate situation dealt with
by calling <code class="API"
base="twisted.internet.defer">Deferred.errback</code>
instead. Twisted-compatible applications have to do their part, however. They
must set up the result handlers for the Deferred object by attaching to it the
callbacks and errbacks they want called with results, in the order they want
them called.</p>

<p>The basic idea behind Deferreds, and other asynchronous solutions as well,
is to keep the CPU as active as possible.  If one task is waiting on data,
rather than have the CPU (and the program!) idle waiting for that data (a
process normally called &quot;blocking&quot;), the program performs other
operations in the meantime, confident that its callback and errback will deal
with the data once it is ready to be processed. In Twisted, a function signals
to the calling function that there is no immediate result by returning a
Deferred as its result. When the data is available, the program activates the
callbacks on that Deferred to process the data in sequence.</p>

<h2>Deferreds - a signal that data is yet to come</h2>

<p>In our email sending example above, a parent function calls a function to
connect to the remote server. Asynchrony requires that this connection
function return <em>without waiting for the result</em> so that the parent
function can do other things. So how does the parent function or its
controlling program know that the connection doesn't exist yet, and how does
it use the connection once it does exist?</p>

<p>What Twisted uses to signal this situation is, of course, our versatile
<code class="API">twisted.internet.defer.Deferred</code> object. When the
connection function returns, it signals that the operation is incomplete by
returning a Deferred rather than the actual handle to the connection.</p>

<p>The Deferred has two purposes. The first is that it says &quot;I am a
signal that the result of whatever you wanted me to do is still pending.&quot;
The second is that you can ask the Deferred to run things when the data
does arrive.</p>

<p>You can picture a function that returns a Deferred as acting like a
librarian who responds to a patron's question ("Are these mushrooms
poisonous?") with a handwritten note saying, "I don't have your answer off the
top of my head, but let me know where I can call you with the answer when I
have it."  The caller of the function does the equivalent of the patron
scribbling a phone number on the note by attaching a callback to the
Deferred. The equivalent of a callback chain is where the patron writes several
numbers on the note, and the person answering the phone at the first number
responds to the answer ("They're highly poisonous") with another answer ("It's
too late to use another recipe, our dinner party is canceled") that the
librarian relays to whoever answers the phone (some disappointed dinner guest,
perhaps) at the second number. Note that the library patron has been able to
wander off and forget all about this matter in the meantime; such is the beauty
of asynchronous programming!</p>

<h3>Callbacks</h3>

<p>The way you tell a Deferred what to do with the data once it arrives is by
adding a callback &mdash; asking the Deferred to call a function once the data
arrives.</p>

<p>One Twisted library function that returns a Deferred is <code
class="API">twisted.web.client.getPage</code>. In this example, we call
<code>getPage</code>, which returns a Deferred, and we attach a callback to
handle the contents of the page once the data is available:</p>

<pre class="python">
from twisted.web.client import getPage

from twisted.internet import reactor

def printContents(contents):
    '''
    This is the 'callback' function, added to the Deferred and called by
    it when the promised data is available
    '''

    print("The Deferred has called printContents with the following contents:")
    print(contents)

    # Stop the Twisted event handling system -- this is usually handled
    # in higher level ways
    reactor.stop()

# call getPage, which returns immediately with a Deferred, promising to
# pass the page contents onto our callbacks when the contents are available
# (the URL is a byte string, which getPage expects on Python 3)
deferred = getPage(b'http://twistedmatrix.com/')

# add a callback to the deferred -- request that it run printContents when
# the page content has been downloaded
deferred.addCallback(printContents)

# Begin the Twisted event handling system to manage the process -- again this
# isn't the usual way to do this
reactor.run()
</pre>

<p>A very common use of Deferreds is to attach two callbacks. The result of the
first callback is passed to the second callback:</p>

<pre class="python">
from twisted.web.client import getPage

from twisted.internet import reactor

def lowerCaseContents(contents):
    '''
    This is a 'callback' function, added to the Deferred and called by
    it when the promised data is available. It converts all the data to
    lower case
    '''

    return contents.lower()

def printContents(contents):
    '''
    This is a 'callback' function, added to the Deferred after
    lowerCaseContents and called by it with the results of lowerCaseContents
    '''

    print(contents)
    reactor.stop()

deferred = getPage(b'http://twistedmatrix.com/')

# add two callbacks to the deferred -- request that it run lowerCaseContents
# when the page content has been downloaded, and then run printContents with
# the result of lowerCaseContents
deferred.addCallback(lowerCaseContents)
deferred.addCallback(printContents)

reactor.run()
</pre>

<h3>Error handling: errbacks</h3>

<p>Just as an asynchronous function returns before its result is available, it
may also return before it is possible to detect errors: failed connections,
erroneous data, protocol errors, and so on. Just as you can add callbacks to a
Deferred which it calls when the data you are expecting is available, you can
add error handlers ('errbacks') to a Deferred for it to call when an error
occurs and it cannot obtain the data:</p>

<pre class="python">
from twisted.web.client import getPage

from twisted.internet import reactor

def errorHandler(error):
    '''
    This is an 'errback' function, added to the Deferred which will call
    it in the event of an error
    '''

    # this isn't a very effective handling of the error, we just print it out:
    print("An error has occurred: &lt;%s&gt;" % str(error))
    # and then we stop the entire process:
    reactor.stop()

def printContents(contents):
    '''
    This is a 'callback' function, added to the Deferred and called by it with
    the page content
    '''

    print(contents)
    reactor.stop()

# We request a page which doesn't exist in order to demonstrate the
# error chain
deferred = getPage(b'http://twistedmatrix.com/does-not-exist')

# add the callback to the Deferred to handle the page content
deferred.addCallback(printContents)

# add the errback to the Deferred to handle any errors
deferred.addErrback(errorHandler)

reactor.run()
</pre>

<h2>Conclusion</h2>

<p>In this document, you have:</p>

<ol>
<li>seen why non-trivial network programs need to have some form of concurrency;</li>
<li>learnt that the Twisted framework supports concurrency in the form of
asynchronous calls;</li>
<li>learnt that the Twisted framework has Deferred objects that manage callback
chains;</li>
<li>seen how the <code class="API" base="twisted.web.client">getPage</code>
function returns a Deferred object;</li>
<li>attached callbacks and errbacks to that Deferred; and</li>
<li>seen the Deferred's callback chain and errback chain fire.</li>
</ol>

<h3>See also</h3>

<p>Since the Deferred abstraction is such a core part of programming with
Twisted, there are several other detailed guides to it:</p>

<ol>
<li><a href="defer.xhtml">Using Deferreds</a>, a more complete guide to
using Deferreds, including Deferred chaining.</li>
<li><a href="gendefer.xhtml">Generating Deferreds</a>, a guide to creating
Deferreds and firing their callback chains.</li>
</ol>

</body></html>