[Twisted-Python] doPoll timeout problems

Glyph glyph at twistedmatrix.com
Tue Oct 16 23:59:07 MDT 2018



> On Oct 16, 2018, at 7:28 PM, Wim Lewis <wiml at hhhh.org> wrote:
> 
> Scott, Barry wrote:
>> We are seeing a problem with poll being called very often and think that the problem is in the timeout calculation.
> 
> I think you're right and that changing that line to use ceil() is a good fix. It does mean that some timeouts might happen up to a millisecond later than they would otherwise, but I think that's better than having the reactor spin-wait to consume time.
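
Right, and to spell out the failure mode for anyone skimming: the reactor hands poll() an integer number of milliseconds, and truncating a sub-millisecond timeout down to zero turns the blocking call into a busy loop. A rough sketch of the conversion (an illustrative helper only, not the exact code in pollreactor's doPoll):

    import math

    def timeoutToMilliseconds(timeout):
        # 'timeout' is the reactor's timeout in float seconds, or None
        # meaning "block until a descriptor is ready".
        if timeout is None:
            return -1   # a negative value makes poll() block indefinitely
        # Current behaviour: int() truncates, so e.g. 0.0009s becomes 0ms,
        # poll() returns immediately, and the reactor spins until the
        # delayed call is actually due:
        #     return int(timeout * 1000)
        # Proposed fix: round up instead, so we sleep at most about one
        # millisecond longer rather than busy-waiting.
        return int(math.ceil(timeout * 1000))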
> 
> This probably only happens with the poll() reactor, because all the other APIs seem to have higher resolution timeouts --- select, epoll, kqueue, ppoll all have microsecond or even nanosecond timeout parameters.
> 
> If there is code that actually needs sub-millisecond timeout resolution it might be possible for Twisted to implement that using setitimer() or similar, but that seems like a lot of work to support an exotic use case that could be handled more efficiently by switching to epoll etc.
> 
> I checked the history of that line of code and it dates back all the way to the first poll()-based event loop implementation in 2001. Perhaps computers were slow enough back then that a call to poll() would take most of a millisecond :) but in any case the rounding-down behavior doesn't seem to have been explicitly chosen.
> 
>> This _updateLogDateTime call seems to be a lot of complexity for no benefit. After all, time.time() is very fast on Linux, so why cache the log time strings?
> 
> Perhaps it is not the time() call but the cost of converting a time value to a string that is being avoided here? Sometimes the calendar/date calculations are expensive.

I believe it dates back to a time when syscalls were surprisingly expensive, and gettimeofday() or equivalent was still a syscall.  I remember looking at profiles Back In The Day where getting the current time dominated. It certainly predates the zero-overhead VDSO method available today.
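
These days time.time() itself is close to free, so if anything is still worth avoiding it's the string formatting. Anyone who wants to check can time the two pieces separately with something like the snippet below (the format string only approximates what the HTTP log actually emits, and the numbers will obviously vary by machine):

    import timeit

    # Cost of just reading the clock:
    print(timeit.timeit("time.time()", setup="import time", number=1000000))

    # Cost of formatting a timestamp into a log-style string:
    print(timeit.timeit(
        "time.strftime('[%d/%b/%Y:%H:%M:%S +0000]', time.gmtime(t))",
        setup="import time; t = time.time()",
        number=1000000,
    ))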

> It seems to me that HTTPFactory could be implemented more efficiently by only caching the string on demand, and then setting a timer to clear the cache (reset _logDateTime to None) at the next 1-second mark. On a heavily-loaded server, that would have the same properties as the current code; but on a lightly-loaded server it would avoid running that callback unnecessarily. And overall I don't think it's any more complicated than the current implementation.
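
If I'm reading that right, it would look roughly like the sketch below. The class and method names are made up for illustration, the formatter would presumably be the helper HTTPFactory already uses (datetimeToLogString), and only reactor.seconds() and reactor.callLater() are real reactor APIs. Untested:

    import math

    class OnDemandLogTimeCache(object):
        """
        Format the log timestamp only when a request actually asks for it,
        and schedule a single call to drop the cached string at the next
        whole-second boundary.
        """
        def __init__(self, reactor, formatTime):
            self._reactor = reactor
            self._formatTime = formatTime   # e.g. twisted.web.http.datetimeToLogString
            self._cached = None

        def logDateTime(self):
            if self._cached is None:
                now = self._reactor.seconds()
                self._cached = self._formatTime(now)
                # Expire at the next 1-second mark; an idle server never
                # schedules anything because nothing asks for the string.
                delay = math.ceil(now) - now
                if delay == 0:
                    delay = 1.0
                self._reactor.callLater(delay, self._expire)
            return self._cached

        def _expire(self):
            self._cached = None

Under load that behaves like the current code (one formatting pass and one callLater per second); on an idle server it does nothing at all.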

The current algorithm is really just silly.  In the absence of a compelling benchmark where removing it can be shown to hurt performance (we should check with the ones on speed.twistedmatrix.com), I think we could forgo the caching entirely and just do the simple thing where the string is computed as needed.

-glyph


