[Twisted-Python] doPoll timeout problems

Wed Oct 17 08:56:46 MDT 2018

On Wednesday, 17 October 2018 06:59:07 BST Glyph wrote:
> > On Oct 16, 2018, at 7:28 PM, Wim Lewis <wiml at hhhh.org> wrote:
> > 
> > Scott, Barry wrote:
> >> We are seeing a problem with poll being called very often and think that
> >> the problem is in the timeout calculation.> 
> > I think you're right and that changing that line to use ceil() is a good
> > fix. It does mean that some timeouts might happen up to a millisecond
> > later than they would otherwise, but I think that's better than having
> > the reactor spin-wait to consume time.

Do you want a pull request for this fix or will you make this small change?

> > 
> > This probably only happens with the poll() reactor, because all the other
> > APIs seem to have higher resolution timeouts --- select, epoll, kqueue,
> > ppoll all have microsecond or even nanosecond timeout parameters.
> > 
> > If there is code that actually needs sub-millisecond timeout resolution it
> > might be possible for Twisted to implement that using setitimer() or
> > similar, but that seems like a lot of work to support an exotic use case
> > that could be handled more efficiently by switching to epoll etc.
> > 
> > I checked the history of that line of code and it dates back all the way
> > to the first poll()-based event loop implementation in 2001. Perhaps
> > computers were slow enough back then that a call to poll() would take
> > most of a millisecond :) but in any case the rounding-down behavior
> > doesn't seem to have been explicitly chosen.
> > 
> >> These _updateLogDateTime call seems to be a lot of complexity for no
> >> benefit. After all time.time() is very fast on Linux, why cache the log
> >> time strings?> 
> > Perhaps it is not the time() call but the cost of converting a time value
> > to a string that is being avoided here? Sometimes the calendar/date
> > calculations are expensive.
> I believe it dates back to a time when syscalls were surprisingly expensive,
> and gettimeofday() or equivalent was still a syscall.  I remember looking
> at profiles Back In The Day where getting the current time dominated. It
> certainly predates the zero-overhead VDSO method available today.
> > It seems to me that HTTPFactory could be implemented more efficiently by
> > only caching the string on demand, and then setting a timer to clear the
> > cache (reset _logDateTime to None) at the next 1-second mark. On a
> > heavily-loaded server, that would have the same properties as the current
> > code; but on a lightly-loaded server it would avoid running that callback
> > unneccessarily. And overall I don't think it's any more complicated than
> > the current implementation.
> The current algorithm is really just silly.  In the absence of a compelling
> benchmark where it can be shown to hurt performance (we should check with
> the ones on speed.twistedmatrix.com), I think we could forego the caching
> entirely, and just do the simple thing where the string is computed as
> needed.

I'll find time to put up a pull request to simplify and allow for on/off 
config of the logging for you to consider.

Barry

> 
> -glyph
> _______________________________________________
> Twisted-Python mailing list
> Twisted-Python at twistedmatrix.com
> https://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python