Ticket #5786 enhancement new

Opened 10 months ago

Last modified 9 months ago

Add timeout implementation to Deferred, based on cancellation support

Reported by: itamar Owned by: itamar
Priority: normal Milestone:
Component: core Keywords:
Cc: Branch: branches/deferred-timeouts-5786
Author: itamar Launchpad Bug:

Description (last modified by itamar) (diff)

If you can cancel a Deferred, you should be able to time it out as well.

Some non-obvious points:

  1. It'd be useful to have an ability to distinguish between different timeouts (and cancellation) via the exception in the Failure.
  2. You may want to add multiple timeouts, and have them affect different levels of the callback chain. Consider a high-level request than opens a connection, then sends out a HTTP request. The code that creates the network connection might want to add a 60 second timeout on the connection attempt. Code calling the combo high-level API might want a 10 minute timeout, which applies to combination of network connection and HTTP request.

Change History

1

in reply to: ↑ description   Changed 10 months ago by glyph

Replying to itamar:

If you can cancel a Deferred, you should be able to time it out as well. Some non-obvious points:

These are non-obvious :).

1. It'd be useful to have a TimeoutError, separate from CancelledError.

Counterpoint: no, it wouldn't. What code would ever want to know this, and why? Cancellation can occur for any number of causes; timeouts are one of them, but there are plenty of others, and I think it would actually be bad to have code distinguish, for the most part, because you don't want code that works when a user explicitly cancels but fails when a timeout happens.

2. You may want to add multiple timeouts, and have them affect different levels of the callback chain. Consider a high-level request than opens a connection, then sends out a HTTP request. The code that creates the network connection might want to add a 60 second timeout on the connection attempt. Code calling the combo high-level API might want a 10 minute timeout, which applies to combination of network connection and HTTP request.

We should probably have a discussion about how Tubes handle this case (the "progress" callback), since there is some stuff that you can do that's generic in support of this use-case, but I don't think it fits at this level.

Having a Deferred publish any API to set timeouts on events, e.g. other Deferreds and function calls, that aren't in your code and you can't observe, is probably bad. These would just be arguments to the functions that initiate certain activities, and those timeouts would be handled internally and would have specific error conditions associated with them. For example, you will also want a timeout on "time between bytes received on this connection" when a download is in progress, as distinct from "time to download the whole file" and "time to initially connect", but that's none of the Deferred's business (how would you even expose all of those distinct stages?).

2

  Changed 10 months ago by glyph

Basically I disagree with this feature and I would like to reject it and close the ticket as invalid. I was super happy when setTimeout went away, and I want to get defer out of twisted.internet. If we do need such a feature I really think it should be external to Deferred itself, but I would rather just have reactor.callLater(n, d.cancel) be the idiom here.

But without a lot more justification in terms of concrete use-cases, I would say we just shouldn't do this.

3

  Changed 10 months ago by itamar

Point 2 as a use case was based on actual code I've just written.

The implementation I will provide is indeed just a slight improvement over reactor.callLater(n, d.cancel) (which is insufficient only because it leaves dangling DelayedCalls). The minimal implementation you suggest, or the slightly improved one I intend to implement, both provide point 2 as a side effect of the implementation:

def getConnection(reactor, endpoint):
    d = endpoint.connect(Factory())
    addTimeout(reactor, d, 60) # 1 minute timeout limited to connection
    return d

def sendCommand(reactor, command):
    d = getConnection(reactor, makeEndpoint())
    d.addCallback(lambda conn: conn.request(command))
    addTimeout(reactor, d, 600) # 10 minute timeout on whole process
    return d

Even if your implementation was sufficient (and it almost is), it should be documented, and the nuances in point 2 explained.

4

  Changed 10 months ago by itamar

A correction: your naive implementation (in addition the resource leak) does *not* enable to functionality of multiple timeouts as in the example, because it doesn't cancel the first timeout when the Deferred fires.

So another reason to actually implement this and provide it in Twisted: the naive way is broken.

5

follow-up: ↓ 7   Changed 10 months ago by itamar

To expand further on the use case - sending a query involves:

1. Getting a connection from a connection pool, or if none are available making a new one. 2. Sending a query, returning response.

You really want to support different timeouts for those two, even though it's the same Deferred object. But because the Deferred is waiting for the 2nd action, which returned its own Deferred, if you don't cancel the timeout for the connection when it succeeds, it will cancel the query even though the query still has a whole bunch more time left.

6

follow-up: ↓ 8   Changed 10 months ago by itamar

  • description modified (diff)

As for point 1, whose description I just updated:

Given multiple timeouts on a single Deferred, you want a way for a GUI, or a log message, to distinguish between them. Did the operation fail because the connection attempt timed out or because the command timed out? Now, you could do this yourself; every time you call addTimeout you also do:

addTimeout(reactor, d, 60)
def explainReason(reason):
    reason.trap(CancelledError)
    return Failure(MyTimeoutReason())
d.addErrback(explainReason)

But of course that will give the wrong result if someone *manually* cancels, and it's also needlessly repetitive boiler plate.

So point 1 is also necessary.

7

in reply to: ↑ 5 ; follow-up: ↓ 9   Changed 10 months ago by glyph

Replying to itamar:

To expand further on the use case - sending a query involves: 1. Getting a connection from a connection pool, or if none are available making a new one. 2. Sending a query, returning response. You really want to support different timeouts for those two, even though it's the same Deferred object. But because the Deferred is waiting for the 2nd action, which returned its own Deferred, if you don't cancel the timeout for the connection when it succeeds, it will cancel the query even though the query still has a whole bunch more time left.

What API do you propose for adjusting these timeouts independently? Or even propagating their existence up through multiple layers of the Deferred stack? A Deferred can have any number of callbacks on it, and in some cases you don't even know in advance (think HTTP redirects).

8

in reply to: ↑ 6   Changed 10 months ago by glyph

Replying to itamar:

As for point 1, whose description I just updated: Given multiple timeouts on a single Deferred, you want a way for a GUI, or a log message, to distinguish between them.

OK, that might need to be a feature built in to deferred cancellation; I'd rather that it not be reflected by the type of the exception, but with an attribute of some kind. But, it's not deterministic. You cancel the Deferred from the consumer's perspective, and the originator gets to callback or errback it with whatever value or error it likes. You can't depend on the ability to catch different errors.

Did the operation fail because the connection attempt timed out or because the command timed out? Now, you could do this yourself; every time you call addTimeout you also do:

So wait - are you saying that the timeout will be creating a new Deferred for each timeout, and ... chaining the previous Deferred in? That's interesting. That could reliably give you different exceptions, by having a custom canceler. I hadn't thought of that.

But of course that will give the wrong result if someone *manually* cancels, and it's also needlessly repetitive boiler plate. So point 1 is also necessary.

I still don't quite see how this follows, but maybe I'm starting to get your point.

9

in reply to: ↑ 7   Changed 10 months ago by itamar

What API do you propose for adjusting these timeouts independently? Or even propagating their existence up through multiple layers of the Deferred stack? A Deferred can have any number of callbacks on it, and in some cases you don't even know in advance (think HTTP redirects).

Hmm. I hadn't thought much about adjustment; I guess my thought is that Deferred timeouts are for the absolute it-must-take-this-long-and-no-longer sort of timeout, where you set it and forget it. As opposed to "did I get any bytes in past 60 seconds" where you're constantly resetting, but that happens on the protocol level. In particular, because Deferreds are a "give me one result" API.

Does this make sense? Or can you think of a counter-example where you would want to adjust it? (Perhaps each layer adding a timeout should be able to adjust its timeout, though, even if there's no way to access other layers' timeouts).

Propagation happens automatically via the way cancellation knows how to pass cancellation to Deferreds returned by callbacks.

At this point I think I'll switch to coding and tests, so we have something more concrete to discuss.

10

  Changed 10 months ago by itamarst

  • branch set to branches/deferred-timeouts-5786
  • branch_author set to itamarst

(In [34883]) Branching to 'deferred-timeouts-5786'

11

  Changed 10 months ago by itamarst

(In [34886]) Custom exceptions. Refs #5786.

12

  Changed 10 months ago by tom.prince

If adjusting the timeout is a usecase you want to support, it seems like the easiest way to handle that is to have setTimeout return an object that can adjust that timeout. Just returning the {{{IDelayedCall}} would perhaps be sufficient.

13

  Changed 10 months ago by itamar

  • branch_author changed from itamarst to itamar

The basic implementation is done (other than what Tom suggested), so I'd appreciate comments before moving on to documentation.

14

  Changed 10 months ago by tom.prince

  • keywords review added

15

  Changed 10 months ago by tom.prince

  • owner itamar deleted

16

  Changed 9 months ago by itamar

The implementation is completely done. I'd appreciate a review on the design before moving on to documentation.

17

  Changed 9 months ago by glyph

  • status changed from new to assigned
  • owner set to glyph

Reviewing.

18

  Changed 9 months ago by glyph

  • owner changed from glyph to itamar
  • keywords review removed
  • status changed from assigned to new

Hey Itamar. Thanks for pursuing this.

Aesthetic judgments:

  1. The naming is just awful.
    1. The 'add' in the name addDeferredTimeout implies that this is something you might want to use multiple times on the same Deferred. But that would result in multiple cancellations. Which, of course, you might want in some very rarified circumestances, but generally that would not be the case. I suggest a verb-ier name, like timeOutDeferred.
    2. The callID variable implies that callLater returns some opaque identifier. It doesn't, and it hasn't fora really long time. The thing it does return, an IDelayedCall, would be better encapsulated by a name like, e.g. delayedCall.
    3. In naming the exception parameter, you should describe its role and not its type. What is this exception used for?
  2. The documentation is basically incoherent.
    1. The word 'timeout' is used 11 times in the documentation. First as a noun, in the name, where it's slightly ambiguous what it's describing. Then it's used as a noun describing something else - the amount of time before the timeout. Then, as the first word, it's used as a verb. Please clean this up to use "timeout" as little as possible and instead use active words that describe what's happening, like "canceled" or "called back" or "failed". I recommend the parameter be called something like "seconds" or "interval" or "delay".
    2. "Registering a custom canceller is recommended." Recommended by whom? For what purpose? In what way? (In other words: avoid the passive voice, please.) What would such a custom canceller do, especially with respect to this specific API which recommends it? Also: you don't register a custom canceller, you specify one. Registration implies a post-creation API that modifies some state. Deferreds must be created with their canceller. Furthermore, the user of the addDeferredTimeout API will, often as not, be consuming rather than originating a Deferred, and therefore won't be able to "register" a custom canceller. I don't think that this docstring should recommend anything specific about cancellers, custom or otherwise, but it should provide a reference to the cancellation documentation (although something as terse as @see: L{Deferred.cancel} would be sufficient.)
    3. I do not care about the API's inner thoughts and feelings about its role or its level of knowledge. I want to know what it does. The whole second paragraph that describes whether it is "aware" of certain callbacks or not is very confusing. Please rephrase it to describe its behavior with respect to those callbacks - will they be called?
    4. NO AMBIGUOUS ANTECEDENTS. "it will not be able to time it out". WHAT.
    5. Omit the needless word. "In particular" seems to convey no information. "typically the reactor... for testing" seems similarly useless. If this is important information, it should be on IReactorTime itself, not on some random API.
  3. The test docstrings use unclear language to describe what they're testing.
    1. test_noTimeoutIfCallback - says that it "will not cause a timeout". This is a negative assertion and therefore impossible to prove :). It should say something more specific about the state of the Deferred. Several other tests refer to 'causing timeouts' as well.
    2. test_timeout says something about the Deferred being "fired in time". In what time?
    3. I am a little confused by what it means to cancel the timeout "manually" as opposed to just preventing the Deferred from being cancelled.
  4. Attempting to throw an exception over the heads of a bunch of intervening callbacks, and hoping that it will reach the target point in the callback stack, strikes me as poor form. This should be introduced at some point in the call chain where you can make a definitive statement about all the enumerated failure modes. You need to be able to specify not only the exception that you want to raise, but also the exception(s) that you want to trap. But, ultimately, I don't think that this feature of cancellation is really helpful at all and I think it should be removed.
  5. A more useful feature, which I don't see any way to easily add to this externally, would be to force the timeout to happen immediately
  6. Failure.check is more efficient than Failure.trap and would likely read better in this code anyway.

Mandatory Stuff (violated policies, et. al.):

  1. The docstring is missing a couple of required elements.
    1. It's missing a @return and it doesn't return None. Documentation of the return type is always super important.
    2. You should also include separate @type annotations for each parameter (and an @rtype).
  2. This needs a place in the narrative documentation, as your pre-review comment suggests. A narrative explanation might help to better explain the relationship between the call chain and the timeout, which, except for the exception-type-translation stuff (which I don't think works entirely right and you don't really need anyway) is actually quite straightforward.
  3. The exception type translation suffers from a fatal flaw and can never really do what it's advertising 100% reliably. If you have a deferred A whose callback returns a deferred B, and B's custom canceller does a partial recovery and returns a result, but then the next callback on A raises CancelledError for some unrelated reason, it will appear as though the timeout exception was raised, when in fact, the timeout just made the operation complete faster and the cancellation was from somewhere else. But, again, I don't think this feature serves any real purpose: the real utility in something like this would be in conjunction with a custom canceller that would make the first exception raised by the Deferred being cancelled into a TimeoutError. More importantly though, nothing about the documentation or the tests explains why I would care about the distinction between these exceptions at the consumer's point in the call chain. This needs to be explained, and explained in fairly painstaking detail.

In short, this appears to be almost-adequately documented and adequately tested, and I can see no procedural reason not to merge it. However, I'm fairly unhappy with the design and the documentation seems as unhelpful as some of our oldest, crummiest documentation. I think this would be a lot better if it didn't attempt the exception-translation stuff and just did the one thing that users would generally forget, which is the reference-cleanup of the timeout callback itself when the Deferred fires; exception translation could be added as a separate feature later.

19

  Changed 9 months ago by itamar

Most of this is things I can just go and fix, but I don't want to punt on the custom exception use case without having a design. And it is a real use case: if my HTTP request timed out, "failed to look up host" is a very different thing than "response timed out", if only for debugging purposes (but possibly for business logic as well).

A reasonable design, which is implicit in your critique, is having the custom canceller be able to raise some exception other than CancelledError... but that is not currently part of the cancellation API. Does that seem like a reasonable feature to add? If so, it could then be used to implement custom exceptions for timeouts. Obviously these would be two additional, separate tickets.

20

follow-up: ↓ 22   Changed 9 months ago by itamar

I should note, by the way, that in my opinion there is no reason for a "passthrough" errback to trap a CancelledError under any circumstances - CancelledError should only be handled by the final errback. But perhaps there is a use case for that which escapes me.

21

  Changed 9 months ago by exarkun

And it is a real use case: if my HTTP request timed out, "failed to look up host" is a very different thing than "response timed out", if only for debugging purposes (but possibly for business logic as well).

Have you implemented this using this cancellation API? It'd be helpful to see the code.

22

in reply to: ↑ 20   Changed 9 months ago by glyph

Replying to itamar:

I should note, by the way, that in my opinion there is no reason for a "passthrough" errback to trap a CancelledError under any circumstances - CancelledError should only be handled by the final errback. But perhaps there is a use case for that which escapes me.

There's no such thing as a "final" errback, as you well know. The use-case is that you're using Twisted's APIs, and they provide a certain working model (i.e. Deferred control flow) and you expect it to be honored consistently ;).

More seriously, cancellation would be intercepted for the same reason that any application would implement a SIGINT handler. There are some applications which you just want to do as much work as possible and when a user calls cancel() that's their way of saying "OK, I've waited long enough, just give me what you've got". Consider a 3D rendering program with a preview function, where the output of the Deferred is going to be an image either way, but may be lower resolution if you cancel it before it's all the way done.

Note: See TracTickets for help on using tickets.