Opened 2 years ago

Last modified 5 months ago

#5786 enhancement new

Add timeout implementation to Deferred, based on cancellation support

Reported by: itamar Owned by: itamar
Priority: normal Milestone:
Component: core Keywords:
Cc: Branch: branches/deferred-timeouts-5786-4
(diff, github, buildbot, log)
Author: itamarst, itamar, kaizhang Launchpad Bug:

Description (last modified by itamar)

If you can cancel a Deferred, you should be able to time it out as well.

Some non-obvious points:

  1. It'd be useful to have an ability to distinguish between different timeouts (and cancellation) via the exception in the Failure.
  2. You may want to add multiple timeouts, and have them affect different levels of the callback chain. Consider a high-level request than opens a connection, then sends out a HTTP request. The code that creates the network connection might want to add a 60 second timeout on the connection attempt. Code calling the combo high-level API might want a 10 minute timeout, which applies to combination of network connection and HTTP request.

Change History (34)

comment:1 in reply to: ↑ description Changed 2 years ago by glyph

Replying to itamar:

If you can cancel a Deferred, you should be able to time it out as well.

Some non-obvious points:

These are non-obvious :).

  1. It'd be useful to have a TimeoutError, separate from CancelledError.

Counterpoint: no, it wouldn't. What code would ever want to know this, and why? Cancellation can occur for any number of causes; timeouts are one of them, but there are plenty of others, and I think it would actually be bad to have code distinguish, for the most part, because you don't want code that works when a user explicitly cancels but fails when a timeout happens.

  1. You may want to add multiple timeouts, and have them affect different levels of the callback chain. Consider a high-level request than opens a connection, then sends out a HTTP request. The code that creates the network connection might want to add a 60 second timeout on the connection attempt. Code calling the combo high-level API might want a 10 minute timeout, which applies to combination of network connection and HTTP request.

We should probably have a discussion about how Tubes handle this case (the "progress" callback), since there is some stuff that you can do that's generic in support of this use-case, but I don't think it fits at this level.

Having a Deferred publish any API to set timeouts on events, e.g. other Deferreds and function calls, that aren't in your code and you can't observe, is probably bad. These would just be arguments to the functions that initiate certain activities, and those timeouts would be handled internally and would have specific error conditions associated with them. For example, you will also want a timeout on "time between bytes received on this connection" when a download is in progress, as distinct from "time to download the whole file" and "time to initially connect", but that's none of the Deferred's business (how would you even expose all of those distinct stages?).

comment:2 Changed 2 years ago by glyph

Basically I disagree with this feature and I would like to reject it and close the ticket as invalid. I was super happy when setTimeout went away, and I want to get defer out of twisted.internet. If we do need such a feature I really think it should be external to Deferred itself, but I would rather just have reactor.callLater(n, d.cancel) be the idiom here.

But without a lot more justification in terms of concrete use-cases, I would say we just shouldn't do this.

comment:3 Changed 2 years ago by itamar

Point 2 as a use case was based on actual code I've just written.

The implementation I will provide is indeed just a slight improvement over reactor.callLater(n, d.cancel) (which is insufficient only because it leaves dangling DelayedCalls). The minimal implementation you suggest, or the slightly improved one I intend to implement, both provide point 2 as a side effect of the implementation:

def getConnection(reactor, endpoint):
    d = endpoint.connect(Factory())
    addTimeout(reactor, d, 60) # 1 minute timeout limited to connection
    return d

def sendCommand(reactor, command):
    d = getConnection(reactor, makeEndpoint())
    d.addCallback(lambda conn: conn.request(command))
    addTimeout(reactor, d, 600) # 10 minute timeout on whole process
    return d

Even if your implementation was sufficient (and it almost is), it should be documented, and the nuances in point 2 explained.

comment:4 Changed 2 years ago by itamar

A correction: your naive implementation (in addition the resource leak) does *not* enable to functionality of multiple timeouts as in the example, because it doesn't cancel the first timeout when the Deferred fires.

So another reason to actually implement this and provide it in Twisted: the naive way is broken.

comment:5 follow-up: Changed 2 years ago by itamar

To expand further on the use case - sending a query involves:

  1. Getting a connection from a connection pool, or if none are available making a new one.
  2. Sending a query, returning response.

You really want to support different timeouts for those two, even though it's the same Deferred object. But because the Deferred is waiting for the 2nd action, which returned its own Deferred, if you don't cancel the timeout for the connection when it succeeds, it will cancel the query even though the query still has a whole bunch more time left.

comment:6 follow-up: Changed 2 years ago by itamar

  • Description modified (diff)

As for point 1, whose description I just updated:

Given multiple timeouts on a single Deferred, you want a way for a GUI, or a log message, to distinguish between them. Did the operation fail because the connection attempt timed out or because the command timed out? Now, you could do this yourself; every time you call addTimeout you also do:

addTimeout(reactor, d, 60)
def explainReason(reason):
    reason.trap(CancelledError)
    return Failure(MyTimeoutReason())
d.addErrback(explainReason)

But of course that will give the wrong result if someone *manually* cancels, and it's also needlessly repetitive boiler plate.

So point 1 is also necessary.

comment:7 in reply to: ↑ 5 ; follow-up: Changed 2 years ago by glyph

Replying to itamar:

To expand further on the use case - sending a query involves:

  1. Getting a connection from a connection pool, or if none are available making a new one.
  2. Sending a query, returning response.

You really want to support different timeouts for those two, even though it's the same Deferred object. But because the Deferred is waiting for the 2nd action, which returned its own Deferred, if you don't cancel the timeout for the connection when it succeeds, it will cancel the query even though the query still has a whole bunch more time left.

What API do you propose for adjusting these timeouts independently? Or even propagating their existence up through multiple layers of the Deferred stack? A Deferred can have any number of callbacks on it, and in some cases you don't even know in advance (think HTTP redirects).

comment:8 in reply to: ↑ 6 Changed 2 years ago by glyph

Replying to itamar:

As for point 1, whose description I just updated:

Given multiple timeouts on a single Deferred, you want a way for a GUI, or a log message, to distinguish between them.

OK, that might need to be a feature built in to deferred cancellation; I'd rather that it not be reflected by the type of the exception, but with an attribute of some kind. But, it's not deterministic. You cancel the Deferred from the consumer's perspective, and the originator gets to callback or errback it with whatever value or error it likes. You can't depend on the ability to catch different errors.

Did the operation fail because the connection attempt timed out or because the command timed out? Now, you could do this yourself; every time you call addTimeout you also do:

So wait - are you saying that the timeout will be creating a new Deferred for each timeout, and ... chaining the previous Deferred in? That's interesting. That could reliably give you different exceptions, by having a custom canceler. I hadn't thought of that.

But of course that will give the wrong result if someone *manually* cancels, and it's also needlessly repetitive boiler plate.

So point 1 is also necessary.

I still don't quite see how this follows, but maybe I'm starting to get your point.

comment:9 in reply to: ↑ 7 Changed 2 years ago by itamar

What API do you propose for adjusting these timeouts independently? Or even propagating their existence up through multiple layers of the Deferred stack? A Deferred can have any number of callbacks on it, and in some cases you don't even know in advance (think HTTP redirects).

Hmm. I hadn't thought much about adjustment; I guess my thought is that Deferred timeouts are for the absolute it-must-take-this-long-and-no-longer sort of timeout, where you set it and forget it. As opposed to "did I get any bytes in past 60 seconds" where you're constantly resetting, but that happens on the protocol level. In particular, because Deferreds are a "give me one result" API.

Does this make sense? Or can you think of a counter-example where you would want to adjust it? (Perhaps each layer adding a timeout should be able to adjust its timeout, though, even if there's no way to access other layers' timeouts).

Propagation happens automatically via the way cancellation knows how to pass cancellation to Deferreds returned by callbacks.

At this point I think I'll switch to coding and tests, so we have something more concrete to discuss.

comment:10 Changed 2 years ago by itamarst

  • Author set to itamarst
  • Branch set to branches/deferred-timeouts-5786

(In [34883]) Branching to 'deferred-timeouts-5786'

comment:11 Changed 2 years ago by itamarst

(In [34886]) Custom exceptions. Refs #5786.

comment:12 Changed 2 years ago by tom.prince

If adjusting the timeout is a usecase you want to support, it seems like the easiest way to handle that is to have setTimeout return an object that can adjust that timeout. Just returning the {{{IDelayedCall}} would perhaps be sufficient.

comment:13 Changed 2 years ago by itamar

  • Author changed from itamarst to itamar

The basic implementation is done (other than what Tom suggested), so I'd appreciate comments before moving on to documentation.

comment:14 Changed 2 years ago by tom.prince

  • Keywords review added

comment:15 Changed 2 years ago by tom.prince

  • Owner itamar deleted

comment:16 Changed 2 years ago by itamar

The implementation is completely done. I'd appreciate a review on the design before moving on to documentation.

comment:17 Changed 2 years ago by glyph

  • Owner set to glyph
  • Status changed from new to assigned

Reviewing.

comment:18 Changed 2 years ago by glyph

  • Keywords review removed
  • Owner changed from glyph to itamar
  • Status changed from assigned to new

Hey Itamar. Thanks for pursuing this.

Aesthetic judgments:

  1. The naming is just awful.
    1. The 'add' in the name addDeferredTimeout implies that this is something you might want to use multiple times on the same Deferred. But that would result in multiple cancellations. Which, of course, you might want in some very rarified circumestances, but generally that would not be the case. I suggest a verb-ier name, like timeOutDeferred.
    2. The callID variable implies that callLater returns some opaque identifier. It doesn't, and it hasn't fora really long time. The thing it does return, an IDelayedCall, would be better encapsulated by a name like, e.g. delayedCall.
    3. In naming the exception parameter, you should describe its role and not its type. What is this exception used for?
  2. The documentation is basically incoherent.
    1. The word 'timeout' is used 11 times in the documentation. First as a noun, in the name, where it's slightly ambiguous what it's describing. Then it's used as a noun describing something else - the amount of time before the timeout. Then, as the first word, it's used as a verb. Please clean this up to use "timeout" as little as possible and instead use active words that describe what's happening, like "canceled" or "called back" or "failed". I recommend the parameter be called something like "seconds" or "interval" or "delay".
    2. "Registering a custom canceller is recommended." Recommended by whom? For what purpose? In what way? (In other words: avoid the passive voice, please.) What would such a custom canceller do, especially with respect to this specific API which recommends it? Also: you don't register a custom canceller, you specify one. Registration implies a post-creation API that modifies some state. Deferreds must be created with their canceller. Furthermore, the user of the addDeferredTimeout API will, often as not, be consuming rather than originating a Deferred, and therefore won't be able to "register" a custom canceller. I don't think that this docstring should recommend anything specific about cancellers, custom or otherwise, but it should provide a reference to the cancellation documentation (although something as terse as @see: L{Deferred.cancel} would be sufficient.)
    3. I do not care about the API's inner thoughts and feelings about its role or its level of knowledge. I want to know what it does. The whole second paragraph that describes whether it is "aware" of certain callbacks or not is very confusing. Please rephrase it to describe its behavior with respect to those callbacks - will they be called?
    4. NO AMBIGUOUS ANTECEDENTS. "it will not be able to time it out". WHAT.
    5. Omit the needless word. "In particular" seems to convey no information. "typically the reactor... for testing" seems similarly useless. If this is important information, it should be on IReactorTime itself, not on some random API.
  3. The test docstrings use unclear language to describe what they're testing.
    1. test_noTimeoutIfCallback - says that it "will not cause a timeout". This is a negative assertion and therefore impossible to prove :). It should say something more specific about the state of the Deferred. Several other tests refer to 'causing timeouts' as well.
    2. test_timeout says something about the Deferred being "fired in time". In what time?
    3. I am a little confused by what it means to cancel the timeout "manually" as opposed to just preventing the Deferred from being cancelled.
  4. Attempting to throw an exception over the heads of a bunch of intervening callbacks, and hoping that it will reach the target point in the callback stack, strikes me as poor form. This should be introduced at some point in the call chain where you can make a definitive statement about all the enumerated failure modes. You need to be able to specify not only the exception that you want to raise, but also the exception(s) that you want to trap. But, ultimately, I don't think that this feature of cancellation is really helpful at all and I think it should be removed.
  5. A more useful feature, which I don't see any way to easily add to this externally, would be to force the timeout to happen immediately
  6. Failure.check is more efficient than Failure.trap and would likely read better in this code anyway.

Mandatory Stuff (violated policies, et. al.):

  1. The docstring is missing a couple of required elements.
    1. It's missing a @return and it doesn't return None. Documentation of the return type is always super important.
    2. You should also include separate @type annotations for each parameter (and an @rtype).
  2. This needs a place in the narrative documentation, as your pre-review comment suggests. A narrative explanation might help to better explain the relationship between the call chain and the timeout, which, except for the exception-type-translation stuff (which I don't think works entirely right and you don't really need anyway) is actually quite straightforward.
  3. The exception type translation suffers from a fatal flaw and can never really do what it's advertising 100% reliably. If you have a deferred A whose callback returns a deferred B, and B's custom canceller does a partial recovery and returns a result, but then the next callback on A raises CancelledError for some unrelated reason, it will appear as though the timeout exception was raised, when in fact, the timeout just made the operation complete faster and the cancellation was from somewhere else. But, again, I don't think this feature serves any real purpose: the real utility in something like this would be in conjunction with a custom canceller that would make the first exception raised by the Deferred being cancelled into a TimeoutError. More importantly though, nothing about the documentation or the tests explains why I would care about the distinction between these exceptions at the consumer's point in the call chain. This needs to be explained, and explained in fairly painstaking detail.

In short, this appears to be almost-adequately documented and adequately tested, and I can see no procedural reason not to merge it. However, I'm fairly unhappy with the design and the documentation seems as unhelpful as some of our oldest, crummiest documentation. I think this would be a lot better if it didn't attempt the exception-translation stuff and just did the one thing that users would generally forget, which is the reference-cleanup of the timeout callback itself when the Deferred fires; exception translation could be added as a separate feature later.

comment:19 Changed 2 years ago by itamar

Most of this is things I can just go and fix, but I don't want to punt on the custom exception use case without having a design. And it is a real use case: if my HTTP request timed out, "failed to look up host" is a very different thing than "response timed out", if only for debugging purposes (but possibly for business logic as well).

A reasonable design, which is implicit in your critique, is having the custom canceller be able to raise some exception other than CancelledError... but that is not currently part of the cancellation API. Does that seem like a reasonable feature to add? If so, it could then be used to implement custom exceptions for timeouts. Obviously these would be two additional, separate tickets.

comment:20 follow-up: Changed 2 years ago by itamar

I should note, by the way, that in my opinion there is no reason for a "passthrough" errback to trap a CancelledError under any circumstances - CancelledError should only be handled by the final errback. But perhaps there is a use case for that which escapes me.

comment:21 Changed 2 years ago by exarkun

And it is a real use case: if my HTTP request timed out, "failed to look up host" is a very different thing than "response timed out", if only for debugging purposes (but possibly for business logic as well).

Have you implemented this using this cancellation API? It'd be helpful to see the code.

comment:22 in reply to: ↑ 20 Changed 2 years ago by glyph

Replying to itamar:

I should note, by the way, that in my opinion there is no reason for a "passthrough" errback to trap a CancelledError under any circumstances - CancelledError should only be handled by the final errback. But perhaps there is a use case for that which escapes me.

There's no such thing as a "final" errback, as you well know. The use-case is that you're using Twisted's APIs, and they provide a certain working model (i.e. Deferred control flow) and you expect it to be honored consistently ;).

More seriously, cancellation would be intercepted for the same reason that any application would implement a SIGINT handler. There are some applications which you just want to do as much work as possible and when a user calls cancel() that's their way of saying "OK, I've waited long enough, just give me what you've got". Consider a 3D rendering program with a preview function, where the output of the Deferred is going to be an image either way, but may be lower resolution if you cancel it before it's all the way done.

comment:23 Changed 15 months ago by itamar

  • Owner changed from itamar to kaizhang

The work to be done here:

  1. Remove the exception argument to the new function. We can always add that functionality later.
  2. Then, address all review comments that are still applicable.

comment:24 Changed 15 months ago by kaizhang

  • Author changed from itamar to itamar, kaizhang
  • Branch changed from branches/deferred-timeouts-5786 to branches/deferred-timeouts-5786-2

(In [39637]) Branching to 'deferred-timeouts-5786-2'

comment:25 Changed 14 months ago by kaizhang

  • Keywords review added
  • Owner kaizhang deleted

comment:26 Changed 13 months ago by tom.prince

  • Keywords review removed
  • Owner set to kaizhang

Thanks for working on this.

  1. Most of the tests want to be changed to use successResultOf and/or failureResultOf.
  2. Consider using IDelayedCall.active instead of IReactorTime.getDelayedCalls to check if the timeout has been cancelled.
  3. In test_timeout, the call will trigger after advancing excatly 10.
  4. test_callbackStack should probably be split into (at least) two tests, one checking that the cancellation doesn't happen for the first timeout. And the scond, that the cancellation doest happen for the second timeout.

Please resubmit for review after addressing the above points.

  1. It is unclear to me why the final example in the narrative documentation is in the form of interaction with the python interpreter. Condsider changing it to match the other examples.
  2. The pargraph about waiting on a deferred in the documentation for timeOutDeferred is awkward, but I don't have any concrete suggetions for improving it. Particularly "The waited deferred" feels ungrammatical.
  3. Consider renaming the function timeoutDeferred. I'm inclined to treat "timeout" as a single compound word, rather than two words. Existing usage in twisted seems to agree: 56 files where it is treated as a single word compared to 18).

comment:27 follow-up: Changed 10 months ago by itamar

  • Owner changed from kaizhang to itamar

comment:28 in reply to: ↑ 27 Changed 10 months ago by kaizhang

Replying to itamar:

Hi itamar, are you going to take over and finish this ticket? I'm sorry that I'm busying with job finding and finishing my paper so I haven't done my tickets yet. It's great if you finish this ticket. Also my vacation will start at next week and last a whole month so I will be available then.

Again, sorry about the delayed tickets.

comment:29 Changed 10 months ago by itamar

I was hoping to finish it, but you can feel free to take it back over if you have the time. It'd be great to have you working on Twisted again if you do!

comment:30 Changed 9 months ago by itamarst

  • Author changed from itamar, kaizhang to itamarst, itamar, kaizhang
  • Branch changed from branches/deferred-timeouts-5786-2 to branches/deferred-timeouts-5786-3

(In [41623]) Branching to 'deferred-timeouts-5786-3'

comment:31 Changed 7 months ago by itamar

Still remaining is fixing up howto, and then it's ready for review again.

comment:32 Changed 7 months ago by itamarst

  • Branch changed from branches/deferred-timeouts-5786-3 to branches/deferred-timeouts-5786-4

(In [42312]) Branching to 'deferred-timeouts-5786-4'

comment:33 Changed 7 months ago by itamar

  • Keywords review added
  • Owner itamar deleted

OK, ready for review again.

comment:34 Changed 5 months ago by rwall

  • Keywords review removed
  • Owner set to itamar

Thanks Itamar, (and kaizhang)

Notes:

  • Builds pass (apart from some spurious errors)
  • The new documentation renders correctly in sphinx.
  • Tests pass locally
  • Merges cleanly

Points:

  1. source:branches/deferred-timeouts-5786-4/twisted/internet/defer.py
    1. "Don't use this. Use L{timeoutDefered}." -- Maybe add a link to a ticket about deprecating then removing this API.
  1. source:branches/deferred-timeouts-5786-4/twisted/test/test_defer.py
    1. "class TimeoutTests(unittest.TestCase):" -- Maybe inherit from SynchronousTestCase instead.
    2. test_noTimeoutIfCallback
      1. The first two assertions seem unnecessary
      2. And the important last assertion might be better tested by checking IDelayedCall.active on the return value of timeoutDeferred.
    3. test_noTimeoutIfErrback
      1. Same applies here.
    4. test_noTimeoutIfCancel
      1. Again I might be nicer to check whether the DelayedCall is still active.
    5. test_timeout
      1. Seems like it would be better to advance the reactor to exactly the timeout value rather than *slightly* before then after.
      2. And again assert that the DelayedCall is not active.
    6. test_callbackStack
      1. I found the docstring difficult to understand...or at least the docstring didn't seem to accurately describe the test implementation.
      2. And I'm not sure this test is necessary since it seems to be testing the standard cancellation behaviour rather than anything specific to timeoutDeferred.
    7. test_multipleTimeouts
      1. Again, isn't this just testing the standard cancellation behaviour of chained deferreds, or am I misunderstanding?
    8. test_cancelReturnedDelayedCall
      1. I'd quite like to see this near the top of the TestCase because it's such an important part of the new API and so that the test story flows neatly into subsequent tests that then make use of the returned DelayedCall.
  1. I don't see a response to glyph's 3D rendering usecase in #5786:comment:22. His comment seems to be more about the cancellation API in general, but perhaps his point here is that there are circumstances where you need to differentiate between cancellation due to a timeout and some other cancellation reason. How could a user do that if they used this API?
  1. Some of my comments match those made by tom in #5786:comment:26

As far as I'm concerned this can be merged after you answer or address the numbered points above. It's been through 3 rounds of review from glpyh, exarkun and tom.prince. But if you'd like another opinion then resubmit for another review.

Note: See TracTickets for help on using tickets.