Opened 7 years ago

Last modified 5 years ago

#6365 enhancement new

Add something like DeferredPooler (but simpler) to twisted.internet.defer

Reported by: Jean-Paul Calderone Owned by: lvh
Priority: normal Milestone:
Component: core Keywords:
Cc: Branch: branches/multideferred-6365
branch-diff, diff-cov, branch-cov, buildbot
Author: lvh

Description

Terry Jones shared the DeferredPooler class with the twisted-python mailing list some while back:

http://twistedmatrix.com/pipermail/twisted-python/2009-April/019522.html

This is a pretty useful pattern, we should add better library support for the idea to Twisted.

I propose only supporting zero-argument versions of this idea at first.

I also propose adding a non-decorator version (again, at least initially - and consider revisiting the decorator idea later). An API might be:

class DateClient(LineOnlyReceiver):
    def __init__(self):
        self.date = DeferredPooler()

    def lineReceived(self, line):
        if line.lower().startswith("date:"):
            self.date.result(line)

client = DateClient() # get it connected somehow too, obviously
d1 = client.date()
d2 = client.date()
d3 = client.date()

In this example, when self.date.result(line) runs, d1, d2, and d3 would get line as a result. These three Deferreds are entirely independent otherwise.

Specifics of the API could certainly be tweaked. I'm not sure I totally love the fact that this API puts logic into a __call__, but maybe it's fine.

Change History (17)

comment:1 Changed 6 years ago by lvh

Owner: set to lvh

comment:2 Changed 6 years ago by lvh

Author: lvh
Branch: branches/multideferred-6365

(In [40516]) Branching to 'multideferred-6365'

comment:3 Changed 6 years ago by lvh

Keywords: review added
Owner: lvh deleted

comment:4 Changed 6 years ago by lvh

Despite the commit name, I am the author of this. (Technically, I did apply a patch: I just got it from my local git repo, because I figured using combinator was still easier than fighting git.)

comment:5 Changed 6 years ago by lvh

As a review point for myself: this doesn't expose itself through all. I'll address this when it gets reviewed :)

comment:6 Changed 6 years ago by lvh

Oh, and the TestCase should probably be SynchronousTestCase.

comment:7 Changed 6 years ago by Jean-Paul Calderone

Keywords: review removed
Owner: set to lvh

Thanks for working on this.

This isn't quite what I had in mind but it looks like it may do the job. :)

A few comments:

  1. twisted/internet/defer.py
    1. We should avoid naming any new class to include the word Deferred. People will just be confused by this.
    2. MultiDeferred is missing some attribute docs in the class docstring.
    3. Another naming quibble - tee is unnecessarily jargon-y. In the API I implemented elsewhere, the class was named EventChannel and the method was named subscribe. I'm not suggesting these must be the names here but I do think the names should reinforce each other as they do in that example. "Channel" and "subscribe" are vocabulary from the same system metaphor.
    4. The firing logic is unnecessarily complicated. Deferred.callback(aFailure) is the same as Deferred.errback(aFailure).
    5. Some consideration for garbage collection is probably merited. There seems to be no reason this class needs to keep a reference to all of the Deferred instances it has handed out after it has fired them.
    6. Cancellation for these Deferreds is probably pretty easy. It probably doesn't offer much of an improvement over the default, either, though. It's worth considering implementing, at least. And maybe worth a quick comment about why it's not worth implementing if that's the conclusion.
    7. Update __all__
  2. twisted/test/test_defer.py
    1. Too many assertions in many of these test methods. Test separate cases in separate test methods, please.
    2. xrange? Pssh. :) range, please.
    3. Does any logic actually vary for the 10 Deferred case as compared to, say, the 2 Deferred case? I'm already suspicious after reading test_callback and its use of tee three times. 10 tee calls in test_synchronousCallbacks stretches credulity too far.
    4. assertIs is now preferred over assertIdentical, apparently.
  3. This deserves a feature news fragment.
  4. Some docs about this should be added to a howto.

Thanks again!

comment:8 Changed 6 years ago by Jean-Paul Calderone

#7378 was a duplicate of this.

comment:9 Changed 6 years ago by zooko

A similar utility that we've used for many years in Tahoe-LAFS is this:

https://github.com/zooko/pyutil/blob/29c1dc7b42f993bed1229cf63613bd076b726e62/pyutil/observer.py

And here are its tests:

https://github.com/zooko/pyutil/blob/c3a284a74faa00c2f6b366a60c847057404781da/pyutil/test/current/test_observer.py

That code is actually the separately-packaged version of the utility. The version of it that is actually bundled in Tahoe-LAFS is here: https://tahoe-lafs.org/trac/tahoe-lafs/browser/trunk/src/allmydata/util/observer.py?rev=88d7ec2d5451a00c78e256e732640e0ef3ca035f

comment:10 Changed 6 years ago by dawuud

commenting on your twisted/internet/defer.py

I noticed your firing loop differs in implementation from Brian Warner's :

yours:

for d in self._deferreds:

d.callback(result)

Brian Warner's uses an eventual-send operation that fires each deferred in a later reactor turn:

for w in self._watchers:

eventually(w.callback, result)

The implementation of eventually is in Foolscap: https://github.com/warner/foolscap/blob/master/foolscap/eventual.py#L6-L59

which uses a: reactor.callLater(0, ...)

So I guess my question to knowledgeable twisted developers here is... which is better? Isn't Brian's use of this eventually more "cooperative"? Isn't this a good thing?

comment:11 Changed 6 years ago by lvh

Slightly :) I think it may be *slightly* better at least... If anyone else +1s and there's not a good reason not to, I'm inclined to do that :)

comment:12 Changed 6 years ago by zooko

It is better if the observers get callbacked in a later turn, not in the current turn, since that can cause bugs if the programmer assumed the callback would happen in a later turn.

comment:13 Changed 6 years ago by Jean-Paul Calderone

Twisted doesn't have "turns". Inserting use of random callLater(0, ...) is generally not helpful. Consistently making all calls non-reentrant is a great thing but getting there from here requires a little more effort. Meanwhile, the extra reactor dependency makes unit tests more difficult to write.

comment:14 in reply to:  13 Changed 6 years ago by zooko

Replying to exarkun:

Twisted doesn't have "turns". Inserting use of random callLater(0, ...) is generally not helpful. Consistently making all calls non-reentrant is a great thing but getting there from here requires a little more effort.

Wait, you mean if I say reactor.callLater(0, frob) that frob might get executed before my call to reactor.callLater() returns?

comment:15 Changed 6 years ago by Jean-Paul Calderone

Sorry, no, that's not what I'm saying. callLater(0, frob) does not call frob. The last sentence you quoted refers to the fact that there are other sources of re-entrancy in Twisted (which is likely related to the fact that Twisted doesn't have turns, if it did then maybe we'd have less re-entrancy).

comment:16 Changed 6 years ago by zooko

Okay, then I'm confused about what you meant in comment:13, but I think a good way to make progress would be to start from the API doc, as follows. I propose that the doc string of the MultiDeferred.callback method should either say:

Callbacks the deferreds previously produced by this object. The callbacks will be executed synchronously, i.e. before this method returns.

Or:

Callbacks the deferreds previously produced by this object. The callbacks will be executed asynchronously, i.e. after this method returns.

What do you think?

comment:17 Changed 5 years ago by Jean-Paul Calderone

I don't like the verb Callbacks. ;)

More seriously, I think that the former:

Callbacks the deferreds previously produced by this object. The callbacks will be executed synchronously, i.e. before this method returns.

is probably the way to go. In the format of a trac comment, I find it difficult to completely explain this preference. A kind of summary of my motivation, though, is that the former (my preference) results in an API that is easier to use, more idiomatic, and potentially results in re-entrancy bugs whereas the latter (not my preference) results in an API that is harder to use, not idiomatic, and avoids one class of re-entrancy bugs.

The former has two nice things and one not nice thing. The latter has one nice thing and two not nice things. The ideal solution would have only nice things. Failing that, the option with two nice things seems preferable (counting points is likely a hopelessly naive approach; suggestions for an alternate, superior means of evaluating the alternatives welcome).

Okay, having written this much, I feel like I should ramble on a bit more. I do think that these re-entrancy concerns are based in a real problem. I think this can be a serious problem. It often breaks code and it frequently does so in a way that's both difficult to debug and very challenging to explain to folks who've not encountered this problem before. Twisted is rife with APIs that may lead to unexpected re-entrancy. If I were to start from scratch, I would certainly go out of my way to avoid such APIs. To give any readers who aren't familiar with this problem some idea of how pervasive this is, here are some examples of APIs in Twisted that can result in a re-entrant call: ITransport.write, IProtocol.dataReceived, Deferred.callback, Deferred.addCallback, and IReactorTCP.listenTCP. This list is by no means complete.

Given that, it seems like a losing battle (to me) to try to address the problem of re-entrancy in an ad hoc, piecemeal way - that is, to rely on a few clever people to jump in with a review comment on every ticket where this problem is going to be made worse. In practice, this means that 1 case in 100 might be avoided, producing an API without this problem, while 99 other cases will slip through. The result will be some extra work in that 1 case and, because of those 99 other cases and all of the existing APIs in Twisted, it will still be impossible to use Twisted for any non-trivial application without understanding re-entrancy and guarding against its consequences.

Some more systematic approach might solve the problem entirely, though probably only with the investment of significant effort over a long period of time. I think that would actually be a good idea but I don't have the resources to do anything more than suggest it.

I hope I've clarified my ideas on this topic somewhat. Sorry it took me 11 months and some prodding via other channels before I did so. I'll probably step out of this discussion at this point (unless there are more specific questions about what I meant by things I've written here) and let other folks decide what course of action makes sense for this issue.

Thanks to everyone else who's contributed here so far.

Note: See TracTickets for help on using tickets.