[Twisted-Python] Deferred documentation rewrite

Wed Aug 5 22:07:25 MDT 2009

On Mon, Aug 3, 2009 at 6:00 PM, Edward Z. Yang <ezyang at mit.edu> wrote:

> I have updated my draft here:
>
>    http://ezyang.com/twisted/defer2.html
>

Thanks.  Looks like it's improving.  I've got more points to critique now,
but that's only because there's more meat to the tutorial now :).

   1. The coding standard in this document is PEP8, not the Twisted coding
   standard.  Have a look here:
   http://twistedmatrix.com/trac/browser/trunk/doc/core/development/policy/coding-standard.xhtml?format=raw
   2. "Callbacks are the lingua franca of asynchronous programming" strikes
   me as an odd turn of phrase, especially one to open the document with.
   3. "This document addresses ... Deferred.  It..." - "It" has an ambiguous
   antecedent.  Are you talking about the document or the Deferred class?  Of
   course it becomes obvious, but it should be phrased so you don't need to
   work it out.
   4. It's far from obvious what "nonblocking_call" is supposed to be, given
   that its definition is "pass".  On my first skim through I thought it was a
   callback, then had to stop, go back and read again when I realized that
   didn't make sense.  Brevity is good in examples, but this is too brief.
      1. "input" is a builtin function.  You might want to avoid using it
      for a parameter name.
   5. "You might be tempted to define it like this": you're switching
   back and forth from second to third person; at first referring to the
   reader, then an anonymous different programmer.  It might be useful to give
   these roles different names; "Alice and Bob" are popular.
   6. If you must use a third-person pronoun (as you do the one time you
   refer to the API's anonymous user), you should stick to a gender-neutral one
   wherever possible, unless of course you're referring to a specific
   character.
   7. "The Deferred doesn't do anything that you couldn't have done with the
   two callback parameters."  This isn't strictly true; chaining callbacks, and
   dealing with errors that arise in different layers of an asynchronous
   callback chain, aren't strictly possible without some additional mechanism.
   8. Deferred is mentioned as an API link, Failure isn't.
   9. Your explanations of the examples seem backwards.  "At its very
   simplest, Deferred has a single callback attached to it".  I think you
   should be explaining the problem being solved by a single callback, since
   the synchronous example isn't addressed.  The synchronous example obviously
   doesn't have a single callback attached to it :).  In other words, document
   "here's what you might want to do, here's how you can do it" rather than
   "here's a thing you can do!  by the way, you might want to do it
   because...".  You've addressed the general why-you-want-to-do-it in the
   section above, but it would be helpful to do it in the small for each
   specific example.
   10. The DeferredList docs seem wonky in several ways.
      1. The opening is hard to follow.

>       We are now ready to consider our original problem

      what original problem?

>       a Deferred that would only fire

      "fire"?  what does "fire" mean?  The term hasn't yet been introduced.

>       after some other number of Deferreds fired

      Yeah, I'm still not sure what you're referring to.  Why would I want
      to do this, again?
      2. Users really shouldn't be subclassing Deferred themselves, so it's
      bad to have an example that does that.  Especially one which .  The fact
      that this is what DeferredList is is an implementation detail, and an
      ugly one at that.  Try talking about gatherResults instead, and
      implementing a function which does the same thing without a subclass.
      Or, perhaps, a class of your own which just delegates to Deferred for
      Deferred behavior, rather than inheriting it.
      3. Users *definitely* shouldn't be subclassing Deferred without
      upcalling to its __init__.  I haven't tested them, but I'm pretty sure
      these examples will just blow up with tracebacks.
      4. The examples are never invoked.  It's semi-obvious how to use them,
      but semi-obvious things are often invoked semi-correctly.  Better to have
      examples which can be run, or at least ones with a 0-argument entry point
      named something like 'start'.
      5. "Consider the following interaction of two Deferreds:".  You're
      setting this up as if it's going to be very formal, but then your
      language is sloppy; you don't name the different deferreds.  One of
      them is "one deferred", the other is "a Deferred".  You don't describe
      them independently, the relationship is implicit in the description.
      Given that you're describing a fairly complex constellation of objects
      with which the user isn't necessarily familiar yet, you should be
      clearly labeling the Deferreds in question in the code sample with
      variable names (something as simple as "a" and "b" would probably do
      fine) and then consistently using those names to refer to them in the
      prose as well, so it's easy for the reader to follow exactly which
      thing you're talking about.  A big problem with technical
      documentation, *especially* documentation of Deferreds, is that it's
      very easy for a reader to start confusing which thing is which.  Once
      again, it would be good to set up some kind of concrete problem first:
      *why* are we waiting on multiple Deferreds?
   11. "Fluent Interface"?  This is more new terminology (terminology
   that I am not familiar with, I might add) that isn't defined anywhere in
   the document.  I think it's more of an appendix than something important to
   the main narrative; composing Deferreds, returning a Deferred from another
   Deferred, firing a Deferred from another callback, etc, should be covered
   first.
      1. "Batons" looks like it's going to be more fancy ad-hoc terminology - I
      would recommend keeping the language simpler and consistent with other
      Twisted documentation :).
   12. Still a lot of enumerated lists.  Obviously a bad habit to which I am
   prone ;-), but when one uses an enumerated list, there should be an
   expectation that the numbers will be useful: either, as in this document
   review or in code reviews, the numbers can be used to refer to points in
   subsequent discussion, or there's a clear separation of steps.  It's not
   really clear what the "two possible scenarios" lists are enumerations
   *of*.  Are they different things that can happen?
   13. You should try eliminating the word "consider" from the document.
   You seem to have the rhetorical habit, which I've seen from other people
   (myself included), of having a sentence which is missing a clear
   subject/verb/object relationship, and working around it by saying "consider"
   or "let's say".  For example, you want to communicate that there's a
   Deferred somewhere with some callbacks.  You can't just say "A Deferred with
   some callbacks.", so you say "Consider: a Deferred with some callbacks", and
   now the sentence *seems* complete, but it doesn't really communicate a
   full thought.

Okay, I think that's enough feedback for now.  I'll have to do more with
your next round of edits, or my feedback is going to be longer than the
document itself :).

> * Why asynchronous?
>    - Define synchronous and asynchronous
>    - Multiplexing IO
>    - Introduce a simple reactor based on select()
> * Why callbacks?

You might want to start with this one, since callbacks are even more
generally useful than asynchronous programming.  Your suggestion of a parser
example makes this clear: even if you're parsing synchronously, you'll still
probably have callbacks for different parse rule matches.
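
To make that concrete, here is a rough sketch of a fully synchronous "parser" that still reports its matches through callbacks.  The names parse and handlers, and the two toy rules, are made up for illustration; they aren't from your draft or from Twisted.

```python
def parse(text, handlers):
    """Split text into tokens and call the handler registered for each rule."""
    for token in text.split():
        # Two toy rules: a run of digits is a "number", anything else a "word".
        rule = "number" if token.isdigit() else "word"
        handlers[rule](token)


numbers = []
words = []

# Nothing here is asynchronous: every callback runs before parse() returns.
parse("buy 3 apples and 12 pears", {
    "number": lambda token: numbers.append(int(token)),
    "word": words.append,
})
```

After the call, numbers holds the two integers and words holds the remaining tokens, in input order.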

>    - Asynchronous interaction to synchronous interaction
>    - Delocalized execution (the parser example)
>    - High level functions in Python review
>
> Quite frankly, I'm stumped on "defining synchronous and asynchronous."

I'd start with the words themselves.  "Synchronous" means "at the same time".
This refers to the timing of the function call and its effects.  In a
synchronous program, if I say "read()", then at that same time that "read()"
is called, the reading happens and the data is returned.  But, in an
*a*synchronous ("not at the same time") program, "read()" is called, but its
effects happen later.

This can obviously be fleshed out quite a bit, but I think that core concept
is what's important to communicate.
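
Something like the following toy might help show the difference in timing.  It is only a sketch: readSync, readAsync, runScheduled and the scheduled list are invented for this example, and the list is a crude stand-in for a real event loop.

```python
scheduled = []  # calls whose effects have not happened yet


def readSync(source):
    # Synchronous: the read happens during the call and the data is returned.
    return source.pop(0)


def readAsync(source, whenRead):
    # Asynchronous: the call returns immediately; the read happens later.
    scheduled.append(lambda: whenRead(source.pop(0)))


def runScheduled():
    # Stand-in for an event loop: perform everything that was put off.
    while scheduled:
        scheduled.pop(0)()


source = ["first", "second"]
received = []

syncResult = readSync(source)                      # data is available here
asyncResult = readAsync(source, received.append)   # nothing has been read yet
beforeLoop = list(received)                        # snapshot: still empty
runScheduled()                                     # now the callback fires
```

The synchronous call hands back its data directly; the asynchronous call returns None, and the data only shows up in received once runScheduled() has run.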

> Asynchronous had always made sense to me, coming from JavaScript, since
> it was "you click this button and something should happen!"  But that
> is a very different use-case of asynchronous programming than Twisted is.

Your experience with JavaScript — or at least, with GUI programming, since
JavaScript itself is terrible — might actually be a good way to explain the
problem here.  One example I like to use to explain why sometimes you just
can't block is this:

    button1 = Button()
    button2 = Button()
    # I need to wait for the user to click on this button
    button1.waitForClick()
    # okay now they've clicked it.
    message("Hooray you clicked button 1")
    button2.waitForClick()
    # oh dang, but what if they want to click button 2 first!?!

although you can probably devise a more lucid variant of that :).
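
For contrast, a callback-based version might look roughly like this.  The Button class below is a hypothetical stand-in with an invented onClick / click API, not any real toolkit's; click() simulates what a GUI main loop would do when the user clicks.

```python
class Button:
    """Hypothetical stand-in for a GUI button; not a real toolkit class."""

    def __init__(self, name):
        self.name = name
        self.handlers = []

    def onClick(self, handler):
        # Register a callback instead of blocking until the click happens.
        self.handlers.append(handler)

    def click(self):
        # Simulates a user click; a real main loop would trigger this.
        for handler in self.handlers:
            handler(self)


messages = []


def message(text):
    messages.append(text)


button1 = Button("button 1")
button2 = Button("button 2")
for button in (button1, button2):
    button.onClick(lambda b: message("Hooray you clicked " + b.name))

# The user is free to click in either order; nothing is waiting on button 1.
button2.click()
button1.click()
```

Since neither button blocks waiting for the other, clicking button 2 first is handled just as well as clicking button 1 first.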

One of these days I really want to write a combined Twisted / GTK tutorial
that shows how to ask questions in dialog boxes without blocking and
sub-main-loops and other nasty tricks that GTK programs often get up to in
order to have a question-and-answer UI.  Unfortunately, although these
examples are easy for programmers learning Twisted to identify with, it's
not always immediately clear how this corresponds to networking data, and
the extra complexity of GUI libraries makes it more difficult to run the
examples.

> And Glyph raised some very salient concerns about what we were trying to
> teach people.  I just don't know what direction people are coming from.
>

I think the best assumption of background for such an introductory tutorial
is to assume that the user doesn't really understand what problem Deferreds
solve, and has thus never done any substantial work in an asynchronous
environment.  More experienced users will skim some parts, but that's fine:
more experienced users are easily able to figure out what Deferreds are even
with just the current documentation :).

We shouldn't treat this as a Python tutorial, but it should at least touch
briefly on callable objects and nested variables.

> As such, the document now is targeted to "people who know the basics
> of asynchronous programming and grok callbacks", and I've incorporated
> Itamar's excellent suggestion of comparing explicit callback parameters
> and the Deferred object, which I hope dispels the notion of Deferred
> being magical fairly well (my assertion is Deferred is merely an
> abstraction over said callback parameters.)  I've also fully fleshed
> out the Deferreds reference; any omissions are my fault.
>

Again, I think this might be assuming a bit too much.  At the very least,
you should find a very, very good tutorial on callbacks and higher-order
functions in Python to point people to as a dependency, so that users who
*don't* have that experience can go read about it somewhere else.  (Actually,
every dependency of every document *really* ought to have hyperlinks to
other resources that teach that dependency, so that a user who doesn't know
Python but needs to dive into a Twisted codebase will be put on their way
quickly.)

Even people who have some Python experience, but use callbacks rarely, will
often discover there are things they don't know when they start programming
with Twisted and nesting 5 or 6 callbacks in a function.  For example, many
people don't know all the fiddly rules of scope nesting.  Take a poll of
some potential targets for this intro documentation and ask them if they can
explain why this produces an error:

    def f(x=1):
        def t():
            if x > 3:
                x = 2
            else:
                return x
        return t

... but adding 'x=x' to the parameter list of 't' makes it work (although
not like they would expect if they manipulated 'x' in f).
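
Spelled out as something runnable, it looks like this.  The second function, g, and the outcome / fixedResult variables are my own additions for illustration; g rebinds its own 'x' after defining 't' to show that the default argument has already captured the earlier value.

```python
def f(x=1):
    def t():
        # Assigning to x anywhere in t makes x local to t for the whole body,
        # so the comparison below reads a local that has no value yet.
        if x > 3:
            x = 2
        else:
            return x
    return t


try:
    f()()
    outcome = "no error"
except UnboundLocalError:
    outcome = "UnboundLocalError"


def g(x=1):
    def t(x=x):
        # The default captures the value of g's x at the time t is defined.
        if x > 3:
            x = 2
        else:
            return x
    x = 100  # rebinding g's x afterwards does not change t's default
    return t


fixedResult = g()()
```

Calling f()() raises UnboundLocalError, while g()() returns the value 'x' had when 't' was defined, not the value it was rebound to later.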

> The plan next is to discuss composing deferreds (which will also
> touch on when you should and how to create your own deferreds, as
> well as deferredlist) and the convenience primitives.
>

I think you need to start talking about creating your own Deferreds, at
least implicitly, very early on in the document.  For example, rather than
having "nonblocking_call" be a dummy function, have it maintain a list of
yet-to-complete calls, like this:

    pending = []
    def process(data):
        return "Processed: <" + data + ">"
    def nonblockingCall(data, whenSucceeded, whenFailed):
        pending.append((data, whenSucceeded, whenFailed))
    def completeOneCall(succeeded=True):
        data, whenSucceeded, whenFailed = pending.pop(0)
        if succeeded:
            whenSucceeded(process(data))
        else:
            whenFailed(RuntimeError("It failed, for some reason."))

then (A) you can demonstrate how the callbacks actually get called in a tiny
little system that the reader can play around with and get comfortable in
before understanding Deferred, and (B) you can illustrate the same example
again with some Deferred logic involved.
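
As a sketch of what (A) might look like when a reader plays with it, here are the same functions again followed by a short session.  The results and errors lists are invented for the demonstration, and this only covers the plain-callback half, not the Deferred version.

```python
pending = []


def process(data):
    return "Processed: <" + data + ">"


def nonblockingCall(data, whenSucceeded, whenFailed):
    # Only records the request; neither callback runs yet.
    pending.append((data, whenSucceeded, whenFailed))


def completeOneCall(succeeded=True):
    # Finish the oldest outstanding call and invoke the matching callback.
    data, whenSucceeded, whenFailed = pending.pop(0)
    if succeeded:
        whenSucceeded(process(data))
    else:
        whenFailed(RuntimeError("It failed, for some reason."))


results = []
errors = []

nonblockingCall("one", results.append, errors.append)
nonblockingCall("two", results.append, errors.append)
outstanding = len(pending)         # two calls queued, no callbacks run so far

completeOneCall()                  # "one" succeeds
completeOneCall(succeeded=False)   # "two" fails
```

The first completion delivers a processed string to the success callback; the second delivers a RuntimeError to the failure callback, which is the split a Deferred's callback and errback later formalize.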