Ticket #3943: defer2.2.xhtml

File defer2.2.xhtml, 10.8 KB (added by ezyang, 5 years ago)

Revision two, part way through part 2

<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Guide to twisted.internet.defer</title>
</head>
<body>

<h1>Guide to twisted.internet.defer</h1>

<p>This document discusses asynchronous programming and Twisted's
implementation of Deferred objects in
<code class="API">twisted.internet.defer.Deferred</code>.  The
first section is a tutorial-style primer on asynchronous programming;
the later sections are a reference to Deferred's features, from basic
to advanced.</p>

<p>Knowing how to <em>make</em> code asynchronous and knowing how
to <em>use</em> asynchronous code are very different things: the former
requires you to know about polling, threading, or other concurrent
execution mechanisms, whereas the latter does not.  This document
covers the latter.</p>

<h2>Synchronous to Asynchronous, the Method to the Madness</h2>

<p>Most of the programs you write in Python are <strong>synchronous</strong>
programs.  You write down a list of instructions in your source code,
and Python executes them one after another, in order.</p>

<pre class="python">
contents = get_web_page()
dom = parse_web_page(contents)
save_element(dom.getElementById('interesting-field'))
</pre>

<p>Straightforward.  Python gets the web page, then parses it, and
then saves an interesting element from it somewhere else.  If your network is
reasonably fast, this will be a pretty quick set of three steps
to go through.</p>

<p>But what if <code>get_web_page()</code> were slow, on the order of
several seconds?  What if you were fetching lots of resources from
the web?  Your browser certainly doesn't wait for each image on
a page to finish loading before loading the next one.  You don't
want to <em>wait</em>.</p>

<p>Through the magic of <strong>asynchronous</strong> programming,
you don't have to wait.  Instead, you say to Python: "I would like
this web page downloaded, but I'm not going to
wait for you to do it."  In code, this corresponds to calling an
asynchronous function (for example,
<code>promise_to_get_web_page()</code>).</p>

<pre class="python">
promise_to_get_web_page()
# dom = parse_web_page(???)
# save_element(dom.getElementById('interesting-field'))
</pre>

<p>Notice, however, that the code that followed <code>get_web_page()</code>,
the parsing and saving code, is now in a tough spot.  We didn't wait
for the result, so we don't <em>have</em> a result.  At some point,
Python will have the result, but you have no way of knowing when;
for all you know, you could be many files away in another module
calculating digits of pi.</p>

<p>Let's rephrase our request to Python: "I would like this web page
downloaded, but I'm not going to wait for you to do
it. <em>When you finish, please run this set of code with the result.</em>"
To make "this set of code" something you can pass around,
you'll need to put it in a function.</p>

<pre class="python">
promise_to_get_web_page()
def what_to_do_after_you_got_it(result):
    dom = parse_web_page(result)
    save_element(dom.getElementById('interesting-field'))
</pre>

<p>There is one last question: how do we tell Python that
<code>what_to_do_after_you_got_it()</code> is the function we want
called once the web page has been fetched?  There <em>could</em> have
been some magic keyword argument that you passed to the asynchronous
function to serve as your callback.  Twisted, however, uses a much
more flexible and standardized system: <strong>Deferred</strong> objects.</p>

<pre class="python">
defer = promise_to_get_web_page()
def what_to_do_after_you_got_it(result):
    dom = parse_web_page(result)
    save_element(dom.getElementById('interesting-field'))
defer.addCallback(what_to_do_after_you_got_it)
</pre>

<p>The variable <code>defer</code> holds a Deferred object, a representation of the promise;
it doesn't contain the web page itself.  We then attach extra behavior
to the promise with <code>addCallback</code>, saying "When you
get the web page, <em>call back</em> this function with the result."</p>
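
<p>The following small, self-contained simulation shows that the callback really does run later.  It is only a sketch built on stated assumptions.  <code>ToyDeferred</code> is a hypothetical minimal class, not Twisted's <code>Deferred</code>.  Here <code>promise_to_get_web_page()</code> is a hypothetical stand-in that returns a <code>ToyDeferred</code> immediately.  The hypothetical <code>finish_download()</code> plays the role of the network layer, delivering the page some time afterwards.</p>

<pre class="python">
# ToyDeferred is a hypothetical, minimal stand-in (NOT Twisted's Deferred).
class ToyDeferred:
    def __init__(self):
        self.callback_fn = None

    def addCallback(self, fn):
        self.callback_fn = fn          # remember what to call later

    def callback(self, result):
        self.callback_fn(result)       # "call back" with the result


events = []     # records the order in which things happen
pending = []    # hypothetical downloads requested but not yet finished


def promise_to_get_web_page():
    """Hypothetical async call: returns at once, before any page exists."""
    d = ToyDeferred()
    pending.append(d)
    events.append('requested page')
    return d


def finish_download(html):
    """Hypothetical network layer: fires the oldest pending Deferred."""
    pending.pop(0).callback(html)


defer = promise_to_get_web_page()

def what_to_do_after_you_got_it(result):
    events.append('got result: ' + result)

defer.addCallback(what_to_do_after_you_got_it)
events.append('doing other work')      # runs before the page arrives

finish_download('hello')               # later, the result shows up
print(events)
# ['requested page', 'doing other work', 'got result: hello']
</pre>

<p>The printed list has <code>'doing other work'</code> before the result.  The callback runs only when something fires the Deferred.</p>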

<p>Asynchronous programming is centered around these notions:</p>

<ul>
    <li>Some function calls are expensive, so <em>don't wait for them</em>.</li>
    <li><em>Don't call; Twisted will call you.</em> When they are done, give me the results by <em>calling back</em> a function of my choice.  Asynchronous functions return Deferred objects, which I use to register these callbacks.</li>
    <li>Sometimes I want code to run when an event happens, but the event fires independently of my program flow (it is time-based, or triggered by an external stimulus).  When it does happen, <em>call back</em> a function of my choice.</li>
</ul>
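
<p>In Twisted, the object that does the calling back is the reactor.  A hypothetical <code>ToyLoop</code> with a fake clock gives a rough illustration of the last point.  It is not Twisted's reactor, though its <code>call_later()</code> is loosely modeled on <code>reactor.callLater()</code>.  Program flow registers functions and moves on, and the loop calls them back later, in time order.</p>

<pre class="python">
import heapq
import itertools


class ToyLoop:
    """Hypothetical minimal event loop with a fake clock (not Twisted's reactor)."""

    def __init__(self):
        self.now = 0.0
        self._queue = []                  # heap of (due_time, seq, fn, args)
        self._seq = itertools.count()     # tie-breaker: equal times keep insertion order

    def call_later(self, delay, fn, *args):
        heapq.heappush(self._queue, (self.now + delay, next(self._seq), fn, args))

    def run(self):
        # Call back whatever is due next until nothing is left.
        while self._queue:
            due, _, fn, args = heapq.heappop(self._queue)
            self.now = due                # jump the fake clock forward
            fn(*args)


log = []
loop = ToyLoop()
loop.call_later(2, log.append, 'timer fired after 2s')
loop.call_later(1, log.append, 'timer fired after 1s')
log.append('program flow continues')
loop.run()
print(log)
# ['program flow continues', 'timer fired after 1s', 'timer fired after 2s']
</pre>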

<p>This stands in contrast to synchronous programming, where:</p>

<ul>
    <li>Function calls are cheap enough that we can wait for them to finish.</li>
    <li><em>You make the calls.</em> If I want a sub-result, I call a function and use its return value.</li>
    <li>Events? What are events?</li>
</ul>

<p>Expensive functions that deal with input/output will
commonly have a synchronous version (found in the Python standard
library) and an asynchronous version (found in Twisted).  You can tell
that a function is asynchronous if it returns a <code>Deferred</code>
object.  Operations that are commonly asynchronous include:</p>

<ul>
    <li>Communication over the network</li>
    <li>Interprocess communication</li>
    <li>User interfaces</li>
    <li>To a lesser extent, hard drive and database access</li>
</ul>

<p>Any code that uses the synchronous version of a function can be
converted to use the asynchronous version.  The goal of this document
is to show you how.</p>

<h2>Deferred</h2>

<h3>Basic operation</h3>

<p>At its very simplest, a Deferred has a single callback attached to it, which
is invoked with the result as its argument once the result becomes available:</p>

<table class="compare">
    <tr>
        <th>Synchronous</th>
        <th>Asynchronous</th>
    </tr>
    <tr>
        <td><pre class="python">
value = synchronous_operation()
process(value)
        </pre></td>
        <td><pre class="python">
defer = asynchronous_operation()
defer.addCallback(process)
        </pre></td>
    </tr>
</table>

<p>This corresponds to a very simple deferred model:</p>

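<pre class="python">
class Deferred:
    """A bare bones deferred implementation (take with a grain of salt)."""
    def __init__(self):
        self.f = None          # the single callback this model can hold
    def addCallback(self, f):
        self.f = f             # remember what to call later
    def callback(self, result):
        # Called by the asynchronous code once the result is ready;
        # this model assumes addCallback() has already been called.
        self.f(result)
</pre>
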
<p>The asynchronous code calls <code>callback()</code> when it has a result.
Notice that there is no asynchronous magic involving threads, forks, or
polling in this model: Deferred is <em>not</em> magical.  The real Deferred isn't
this simple, but even as we add more complexity, none of the magic will
creep in.</p>
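
<p>The real Deferred is less simple in two ways that need no magic either.  First, the result may arrive <em>before</em> a callback is added.  Second, several callbacks can be chained, each receiving the previous one's return value.  Twisted's Deferred handles both, and it runs a late-added callback immediately if the result is already there.  The class below is a hypothetical toy, <code>ChainDeferred</code>, not Twisted's implementation, and it omits errbacks entirely.</p>

<pre class="python">
class ChainDeferred:
    """Hypothetical toy Deferred that remembers its result and chains callbacks."""

    _NO_RESULT = object()    # sentinel meaning "not fired yet"

    def __init__(self):
        self.callbacks = []
        self.result = self._NO_RESULT

    def addCallback(self, f):
        self.callbacks.append(f)
        if self.result is not self._NO_RESULT:
            self._run()          # already fired: run the new callback now
        return self              # allows d.addCallback(f).addCallback(g)

    def callback(self, result):
        self.result = result
        self._run()

    def _run(self):
        # Each callback's return value becomes the next callback's input.
        while self.callbacks:
            f = self.callbacks.pop(0)
            self.result = f(self.result)


d = ChainDeferred()
d.addCallback(lambda page: page.upper())
d.callback('hello')                       # fires: result is now 'HELLO'
d.addCallback(lambda text: text + '!')    # added late, so it runs immediately
print(d.result)                           # HELLO!
</pre>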

<h3>Errbacks</h3>

<p>Error handling is an ever-present concern in synchronous code, and
asynchronous code is no different.  Deferred
implements a system of <strong>errbacks</strong> to simulate Python
try/except blocks.  Just as in synchronous code, you should <em>always</em>
register an errback so that errors are handled gracefully.</p>

<table class="compare">
    <tr>
        <th>Synchronous</th>
        <th>Asynchronous</th>
    </tr>
    <tr>
        <td><pre class="python">
try:
    synchronous_operation()
except UserError as e:
    handle_error(e)
        </pre></td>
        <td><pre class="python">
def handle_twisted_error(failure):
    failure.trap(UserError)  # re-raises anything that isn't a UserError
    handle_error(failure.value)
defer = asynchronous_operation()
defer.addErrback(handle_twisted_error)
        </pre></td>
    </tr>
</table>

<p>There are several things going on here:</p>

<ul>
    <li>Instead of being passed an exception object (the analogue of
    the result in the no-error case), your errback is passed a
    <code>twisted.python.failure.Failure</code> object.  This is essentially
    a wrapper around the exception, with a few crucial enhancements that
    make it useful in an asynchronous context.</li>

    <li>Consequently, we test the exception's type with
    <code>failure.trap(UserError)</code> and get the exception itself from
    <code>failure.value</code> (<code>trap</code> returns the matching exception
    class, not the exception instance).  <code>trap</code> is the userland implementation
    of <code>except</code>: if the exception does not match, it is
    re-raised, the rest of our errback is skipped, and the failure continues
    on to the next errback.  <!-- You wouldn't actually write Python
    code that looked like this, but this is a more faithful rendition of
    what is happening:
    <pre class="python">
try:
    synchronous_operation()
except:
    e = sys.exc_info()[1] # get the exception
    # trap the exception
    if not isinstance(e, UserError):
        raise e
    handle_error(e)
    </pre> --></li>

    <li>You can trap multiple types of exceptions by passing them all
    to <code>trap</code>, e.g. <code>failure.trap(UserError, OtherUserError)</code>.</li>
</ul>
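
<p>A hypothetical toy stand-in for <code>Failure</code> makes the behaviour of <code>trap</code> concrete.  It assumes only the two behaviours described above.  <code>check()</code> returns the first matching exception class, or <code>None</code>.  <code>trap()</code> returns that class, or re-raises the exception if nothing matches.  The real class in <code>twisted.python.failure</code> does considerably more, such as capturing tracebacks.</p>

<pre class="python">
class UserError(Exception):
    pass


class OtherError(Exception):
    pass


class ToyFailure:
    """Hypothetical stand-in for twisted.python.failure.Failure."""

    def __init__(self, exc):
        self.value = exc          # the wrapped exception instance
        self.type = type(exc)     # its class

    def check(self, *error_types):
        # Return the first matching class, or None, like testing an except clause.
        for t in error_types:
            if issubclass(self.type, t):
                return t
        return None

    def trap(self, *error_types):
        matched = self.check(*error_types)
        if matched is None:
            raise self.value      # not ours: propagate to the next errback
        return matched            # note: the class, not the instance


handled = []

def handle_twisted_error(failure):
    failure.trap(UserError)              # re-raises anything else
    handled.append(failure.value)        # the exception instance itself

handle_twisted_error(ToyFailure(UserError('bad input')))

try:
    handle_twisted_error(ToyFailure(OtherError('disk full')))
except OtherError as e:
    print('propagated:', e)              # propagated: disk full
</pre>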

<p>Omitting the <code>trap</code> call is equivalent to a catch-all
<code>except</code> block:</p>

<table class="compare">
    <tr>
        <th>Synchronous</th>
        <th>Asynchronous</th>
    </tr>
    <tr>
        <td><pre class="python">
try:
    synchronous_operation()
except:
    handle_error()
    raise
        </pre></td>
        <td><pre class="python">
def handle_twisted_error(failure):
    handle_error()
    return failure
defer = asynchronous_operation()
defer.addErrback(handle_twisted_error)
        </pre></td>
    </tr>
</table>

<p>Notice that in order to re-raise the exception, we simply
return the failure from our errback.  Deferred notices that the return
value is a <code>Failure</code> and keeps running errbacks rather than
callbacks.  In fact, you can also manually re-raise the exception stored
in <code>failure.value</code>, and Deferred will do the right thing:</p>

<pre class="python">
def handle_twisted_error(failure):
    handle_error(failure.value)
    raise failure.value
defer = asynchronous_operation()
defer.addErrback(handle_twisted_error)
</pre>
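
<p>A toy chain can sketch the rule Deferred follows after each errback.  <code>ToyChain</code> and <code>Failed</code> are hypothetical names, and the logic is a simplification of Twisted's.  If a handler returns a failure or raises, the next errback runs.  If it returns anything else, the error counts as handled and the next callback runs.</p>

<pre class="python">
class Failed:
    """Hypothetical marker wrapping an exception (stands in for Failure)."""
    def __init__(self, value):
        self.value = value


class ToyChain:
    """Hypothetical toy callback/errback chain mimicking Deferred's switching rule."""

    def __init__(self):
        self.steps = []                      # list of (callback, errback)

    def addCallback(self, f):
        self.steps.append((f, None))
        return self

    def addErrback(self, f):
        self.steps.append((None, f))
        return self

    def run(self, result):
        for cb, eb in self.steps:
            handler = eb if isinstance(result, Failed) else cb
            if handler is None:
                continue                     # step doesn't apply to this state: skip it
            try:
                result = handler(result)
            except Exception as exc:         # raising also means "still failed"
                result = Failed(exc)
        return result


def log_and_return(failure):
    return failure                           # re-raise by returning the failure

def log_and_raise(failure):
    raise failure.value                      # re-raise explicitly

def recover(failure):
    return 'recovered from ' + str(failure.value)

chain = ToyChain()
chain.addErrback(log_and_return).addErrback(log_and_raise).addErrback(recover)
chain.addCallback(str.upper)
print(chain.run(Failed(ValueError('boom'))))  # RECOVERED FROM BOOM
</pre>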

<p>Word to the wise: if you want asynchronous code that simulates
multiple trailing <code>except</code> blocks, you'll have to implement it manually.
Twisted has no built-in facility for this.</p>
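
<p>One way to do it, sketched here as an assumption rather than a prescribed Twisted idiom, is to dispatch on <code>failure.check()</code>.  That method returns the matching exception class or <code>None</code>.  To stay self-contained, the example uses a hypothetical <code>FakeFailure</code> with a compatible <code>check()</code> instead of importing Twisted.  The final <code>return failure</code> plays the role of an unmatched exception propagating.</p>

<pre class="python">
class NotFound(Exception):
    pass


class PermissionDenied(Exception):
    pass


class FakeFailure:
    """Hypothetical stand-in exposing Failure-like .value and .check()."""
    def __init__(self, exc):
        self.value = exc

    def check(self, *error_types):
        for t in error_types:
            if isinstance(self.value, t):
                return t
        return None


def handle_twisted_error(failure):
    # Plays the role of:
    #   except NotFound: ...
    #   except PermissionDenied: ...
    if failure.check(NotFound):
        return 'not found: ' + str(failure.value)
    if failure.check(PermissionDenied):
        return 'denied: ' + str(failure.value)
    return failure        # no clause matched: keep the error propagating


print(handle_twisted_error(FakeFailure(NotFound('/a'))))   # not found: /a
</pre>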

<h3>Putting it together</h3>

<p>In most cases, you'll want to perform some processing on the deferred
result <em>as well as</em> handle errors.</p>
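
<p>In Twisted this usually means adding both a callback and an errback to the same Deferred.  Order matters.  An errback added <em>after</em> a callback also sees exceptions raised inside that callback, much like a <code>try</code> block that wraps both the operation and the processing.  The sketch below shows that behaviour with a hypothetical toy class, <code>MiniDeferred</code> (not Twisted's), and made-up <code>process()</code> and <code>handle_error()</code> functions.  For simplicity, its errbacks receive the bare exception rather than a <code>Failure</code>.</p>

<pre class="python">
class MiniDeferred:
    """Hypothetical toy: runs (callback, errback) pairs in order (not Twisted's class)."""

    def __init__(self):
        self.pairs = []

    def addCallback(self, f):
        self.pairs.append((f, None))
        return self

    def addErrback(self, f):
        self.pairs.append((None, f))
        return self

    def fire(self, result, failed=False):
        for cb, eb in self.pairs:
            handler = eb if failed else cb
            if handler is None:
                continue                      # step doesn't apply to this state
            try:
                result, failed = handler(result), False
            except Exception as exc:          # any raise switches to the error path
                result, failed = exc, True
        return result, failed


def process(value):
    # Hypothetical processing step that can itself fail.
    if not value:
        raise ValueError('empty page')
    return value.upper()

def handle_error(exc):
    return 'handled ' + type(exc).__name__


d = MiniDeferred().addCallback(process).addErrback(handle_error)
print(d.fire('ok'))                                        # ('OK', False)
print(d.fire(''))                                          # ('handled ValueError', False)
print(d.fire(ConnectionError('net down'), failed=True))    # ('handled ConnectionError', False)
</pre>

<p>If you want an errback that covers only the original operation, Twisted's <code>addCallbacks(callback, errback)</code> registers both at the same step of the chain.  There, the errback does not see exceptions raised by that callback.</p>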
<!--
<table class="compare">
    <tr>
        <th>Synchronous</th>
        <th>Asynchronous</th>
    </tr>
    <tr>
        <td><pre class="python">
try:
    value = synchronous_operation()
    process(value)
except UserError as e:
    handle_error(e)
        </pre></td>
        <td><pre class="python">
        </pre></td>
    </tr>
</table>

<table class="compare">
    <tr>
        <th>Synchronous</th>
        <th>Asynchronous</th>
    </tr>
    <tr>
        <td><pre class="python">
value = None
try:
    value = synchronous_operation()
except UserError as e:
    handle_error(e)
if value is not None:
    process(value)
        </pre></td>
        <td><pre class="python">
        </pre></td>
    </tr>
</table>
-->
</body>
</html>