Opened 14 years ago

Last modified 7 years ago

#763 enhancement new

Reactor should cache loop start time for use instead of time.time()

Reported by: jknight Owned by:
Priority: normal Milestone:
Component: core Keywords:
Cc: Jean-Paul Calderone, itamarst, jknight, zooko, mrjbq7, therve Branch:


Change History (18)

comment:1 Changed 14 years ago by jknight

time.time() is slow, and in normal operation, the reactor loop start time will not be significantly 
different from the current time, so, code could use something like reactor.lastTime instead of 
time.time(). That value would be updated with the current time once per reactor iteration.

comment:2 Changed 13 years ago by Jean-Paul Calderone

How do you plan to deal with non-normal operation?

comment:3 Changed 13 years ago by jknight

By decreeing that it doesn't matter, because your timings will all be way off 
anyways if your loops last a long time.

comment:4 Changed 12 years ago by mrjbq7

This would be a very helpful feature.  I could use a cached time that is
available during the reactor loop execution.  For me, using time.time() is not
an option due to the overhead of multiple calls per reactor loop.

comment:5 Changed 12 years ago by zooko

I'd like to have this -- same reasons as per other comments.

comment:6 Changed 11 years ago by therve

Cc: therve added
Component: core

Adding a lastTime/currentTime attribute to the reactor seems to be useful. I don't know if using it LoopingCall is accurate, though. But that would help resolve things like what is used in twisted.web.http for logging (see #75).

comment:7 Changed 11 years ago by Jean-Paul Calderone

I'm fairly worried that this would cause surprising behavior without supplying a sufficient performance enhancement to justify it.

If we want to do this, we should get some concrete performance numbers to see just what we're buying.

comment:8 Changed 11 years ago by therve

What do you think about the currentTime attribute though ? That seems to be a useful addition to me.

comment:9 Changed 11 years ago by Jean-Paul Calderone

Isn't it redundant now that IReactorTime has a seconds method?

comment:10 Changed 11 years ago by therve

The seconds method is just a wrapper around time.time(), so it makes the call, there is no cache.

comment:11 Changed 11 years ago by zooko

Exarkun: how would this surprise programmers? I thought it was the opposite -- that this change would make the behavior of the reactor more predictable to programmers by solving issue #1396 (in a different way than the patch in that ticket does).

Based on other comments that you've made e.g. in ticket #1396, I think that you and I have observed different coding patterns with regard to callLater -- you appear to have rarely seen code that does callLater(N,...);callLater(N,...) and expects order to be preserved, but more often to have seen code that does callLater(N,...) and expects time.time() when the delayed call happens to be approximately N greater than time.time() when the callLater() was executed. My experience is the opposite -- the code that I have seen almost never relies on it being close to N (and it often isn't anywhere close to N), but I have several times seen errors in which code expected order to be preserved and failed badly when it wasn't.

comment:12 Changed 11 years ago by Jean-Paul Calderone

Regarding code which calls callLater repeatedly with the same delay and depends on order being preserved, while this is not an unreasonable expectation, and I don't object to Twisted offering this behavior, I am certain that it is symptomatic of a misunderstanding of how one should schedule code to run. Guaranteeing this behavior in Twisted will fix superficial bugs in code written this way, but not the underlying problem (which is why I am not eager to implement the changes myself). The underlying problem in every case I have seen is that while it may be correct for the first callable in such a sequence to be triggered by a time event, all the subsequent calls should be triggered by some other event source; specifically, they should generally be triggered by an event fired by the first (either directly or indirectly).

Regarding code which expects time.time() to increase by approximately N when examined in a call delayed with callLater(N, ...), I think this is likewise superficially broken (since Twisted uses the system clock, and the system clock may change in essentially arbitrary ways) but does not have the same underlying problem. For circumstances where LoopingCall is inappropriate (a variable interval, for example), the only problem I can think of with wanting to schedule calls using this pattern is that uncooperative application code or a busy reactor may cause a call to execute at a somewhat delayed time from the actual scheduled time. In this case, it may sometimes be wrong to assume the above, but in others it is perfectly reasonable. If the reactor can guarantee a stable time source (eg, not time.time, vis #1396).

Given that, the surprise for developers comes in the form of a violation of their basic assumptions about the environment. This is most significant in "unusual" circumstances. For a "normal" Twisted program, I expect there will be little or no noticeable difference in behavior. However, there are plenty of cases which won't be encountered every day in which it is useful to be able to reason about the behavior of the system, and for this, a simpler system is vastly preferable to a system with numerous special cases or other deviations from the norm. Ask a programmer when time advances: they'll say "all the time," not "when the last event handler in a pseudo-arbitrarily constructed collection cooperatively yields control of execution." Perhaps Twisted programmers are all much more sophisticated than the average programmer, but I expect that keeping things simple as a default policy still makes sense. Nevertheless, if a speedup can be demonstrated, with proper documentation this still may be worth pursuing. I'm simply asking for that demonstration.

comment:13 Changed 11 years ago by mrjbq7

Many components in the application use current time for other reasons than scheduling behavior.

An example to think about is a network application that receives events from places on the network. Each of these events should be timestamped with the current time. It is not important to have extremely high precision (e.g. no need for cpu clock or microseconds, nanoseconds, etc.). You might be receiving 1000+ messages per second or more.

The relative overhead of time.time() and the subsequent get_time_of_day() call can add un-necessary overhead to the application.

A patch (in my mind) could look as simple as:

from twisted.internet import reactor
doIteration = reactor.doIteration
def _doIteration(*args, **kwargs): = time()
    doIteration(*args, **kwargs)
reactor.doIteration = _doIteration

This would allow access to the cached time. If a single reactor loop takes longer than the clock resolution then (as was pointed out) it would be out-of-date, but this is exactly the feature request desired. And in most cases where this level of performance is desired the reactor loop will not often take more than 1 millisecond.

comment:14 Changed 11 years ago by Jean-Paul Calderone

Thanks for making this use-case explicit. Doing this many calls to time.time() at the application-level is clearly something to be avoided. Does the reactor need to help here, though? The application can still do one call to cache the value by itself.

comment:15 Changed 11 years ago by therve

There are already at least an example in twisted.web.http that would make use of this functionnality (for log timestamp).

comment:16 Changed 11 years ago by Jean-Paul Calderone

But it already does caching. Reimplementing it to use some reactor functionality would certainly be a change. Would it be an improvement?

comment:17 Changed 11 years ago by Glyph

Reimplementing time caching as reactor functionality would make it possible for multiple cooperating systems which require frequent access to an approximation of the current time, but also run frequently and wish to avoid the overhead of a system call to retrieve it, to call it even less. There are at least a few systems in Twisted which make use of the time and could be optimimzed not to use it. HTTP logging and regular logging are among them. It would be a further optimization to applications wishing to avoid making more system calls, because there would be one cache for the entire process rather than one per system interested in optimization.

However, although we currently know that a constantly-advancing time works for most systems, but we don't know whether a sometimes-equal time will. Each system will have to be approached with caution, and this will increase the maintenance burden somewhat, since cached times have an impact that can be difficult to express with tests. As exarkun says, someone needs to demonstrate whether this optimization would actually be worthwhile; having profiled and straced twisted applications I know that gettimeofday definitely takes up an appreciable amount of time, but I have no idea what the impact of any particular call or set of calls is relative to the performance of a whole application, or whether it is even possible to avoid enough calls to make this worthwhile.

If this is implemented, I have a few suggestions as to how it might be done. A more accurate time is pretty much always desirable except for the expense of calculating it, so rather than caching per "reactor tick" (a meaningless unit, and one I'd like to get rid of) we should cache once per call to seconds(). Calls to cachedSeconds() could always return the last value returned by seconds() and a second method's docstring would give us a convenient place to document the purpose of the new behavior. I don't like the idea of calling the attribute "now" or some other euphemism for "the current time", because the use-case here is quite explicit: this is to be used when you know that you can avoid recomputing the current time because it "probably hasn't been too long" since the last time it was retrieved. It shouldn't be the suggested idiom for retrieving the current time - if you need this optimization, you know you need it.

comment:18 Changed 7 years ago by <automation>

Owner: jknight deleted
Note: See TracTickets for help on using tickets.