[Twisted-Python] Some things I've learned: safer callbacks, better t.p.context

Glyph Lefkowitz glyph at twistedmatrix.com
Wed Oct 19 12:55:24 MDT 2016


> On Oct 18, 2016, at 7:09 PM, Kevin Conway <kevinjacobconway at gmail.com> wrote:
> 
> > making such aggressive use of private APIs that it could win a contest about how to ensure that you break on every new release of Twisted :)
> 
> We're very aware of that! It's one of the reasons we have the test matrix set up to run multiple versions of Python and Twisted. I have not started on 16.X compatibility yet.
> 
>  >  I imagine that it's a non-trivial impact to performance, so it would be worthwhile to track that.
> 
> We put this through extensive benchmarks and testing to measure the performance impact. The details are logged in a commit message, but, for example: we initially implemented the @inlineCallbacks extension as a coroutine wrapper. However, we found that the way t.p.Failure tries to serialize itself, and its local+global scopes, to a dictionary caused enormous memory and CPU consumption when triggered, because of the added objects in those scopes. The negative impact grew exponentially with the depth of nested coroutines. Very bad day.

What are you referring to as a "coroutine" here?  A generator?  And exponential growth, you say?  That sounds very surprising.
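
If it helps pin this down: the only frame serialization I'm aware of in t.p.Failure is the captureVars path, where cleanFailure() repr()s every local and global of every captured frame.  A minimal sketch of that cost (the names and the oversized local are made up for illustration):

    from twisted.python.failure import Failure

    def leaf():
        # A large local; captureVars=True snapshots it into the list
        # of frames that the Failure carries around.
        big_local = list(range(1000000))
        raise ValueError("boom")

    try:
        leaf()
    except ValueError:
        f = Failure(captureVars=True)
        # cleanFailure() serializes the captured frames: every local
        # and global in every frame gets repr()'d into a string.  The
        # cost scales with frames x variables -- linear in nesting
        # depth, which is why "exponential" surprises me.
        f.cleanFailure()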

> Once we pivoted to a small fork of @inlineCallbacks, we measured the overall performance hit to be negligible in our services. I'll dig around to see if I can find where we documented the actual numbers we saw. At a macro level, our service-wide stats showed no meaningful growth in runtime or memory consumption.

Does this mean you only get context tracking against inlineCallbacks, and not other usages of Deferred?
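
For anyone following along at home: stock t.p.context is just dynamic scoping over the synchronous call stack, so a binding evaporates as soon as the reactor turns over -- which is exactly why you need hooks in Deferred or inlineCallbacks to carry it across callbacks.  A quick illustration, with a made-up key:

    from twisted.python import context

    def show():
        # Visible only while an enclosing context.call() is on the
        # stack; nothing ties the binding to a Deferred.
        return context.get("request-id", "no context")

    print(context.call({"request-id": "abc123"}, show))  # abc123
    print(show())  # "no context" -- the binding ended with the call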

> > digression on "Don't Switch In The Core"
> 
> I was surprised at how expensive the switching in this context implementation was when we put it in the lower-level read/write callbacks. Each of our services processes a large amount of continually streaming data, and our profiles show, IIRC, that one of the top 5 consumers of CPU time was calling the read/write callbacks. When we added this to those paths it increased overall CPU usage by double-digit percentage points. If this feature were available as an opt-in reactor extension then providers could capacity-plan around the performance hit. We found it more valuable to move the switching closer to application protocol code, where switches happen less frequently.

Maybe "switching" is more expensive than I realized.  Where is this implemented?

>   > I also take it from the performance notes that you're not using PyPy?
> 
> We're still on CPython. PyPy is something we've talked about before but haven't invested much time in yet. I don't know to what extent PyPy might change the performance characteristics of the project.

As I always tell people - if you care about performance, PyPy should be step zero.  Optimizing for CPython looks like adding weird implementation-specific hacks that might stop working or be backwards in the next version; optimizing for PyPy means making the code simpler and more readable so the JIT can figure out what to do ;).  So the pressure that optimizing for PyPy exerts on your code is generally a lot healthier.

-glyph