Opened 9 years ago

Last modified 9 years ago

#6278 enhancement new

The current implementation of inlineCallbacks has suboptimal performance on PyPy

Reported by: Alex Gaynor Owned by:
Priority: normal Milestone:
Component: core Keywords: performance
Cc: Alex Gaynor Branch:
Author:

Description

This is because it calls sys.exc_info() whenever defer.returnValue is called in order to verify that it really was the @inlineCallbacks function which called returnValue.

One proposed solution is to make it do this check only when in deferred debug mode, however Glyph notes that debugging this is sufficiently awful that it should really never be possible for this to occur.

Change History (13)

comment:1 Changed 9 years ago by Itamar Turner-Trauring

Another possible solution - custom exception per inlineCallbacks-function, rather than having a global returnValue:

@inlineCallbacks
def foo(x):
    result = yield doSomething(x)
    foo.returnValue(result)

comment:2 in reply to:  1 Changed 9 years ago by Glyph

Replying to itamar:

Another possible solution - custom exception per inlineCallbacks-function, rather than having a global returnValue:

Pro: relatively simple change; less stuff to import.

Con: have to change every single piece of inlineCallbacks using code ever in a not-quite-backwards-compatible way to get the performance benefit.

Another option is to have @inlineCallbacks rewrite the decorated function somehow so that references to returnValue will resolve to a returnValue specific to that invocation.

comment:3 Changed 9 years ago by Glyph

Ooh, and I just thought of another down-side of the simpler solution. It re-introduces a new way for the bug to occur.

The way this bug generally hit me in the past was when attempting to refactor code; copy/pasting some inlineCallbacks code into a non-inlineCallbacks function. I'd cut some code that called returnValue, paste it into another function (or a new function) and be baffled at the resulting non-local exit, because I'd forgotten to decorate it with inlineCallbacks.

With decoratedFunction.returnValue, if you move some code that returns a value between two functions that both use inlineCallbacks, if bar calls foo.returnValue, you still get a wacky non-local exit that doesn't point at the foo.returnValue line.

comment:4 Changed 9 years ago by Alex Gaynor

Cc: Alex Gaynor added

Glyph: that case can easily be handled: all of the returnValue exceptions have a common parent, if one is raised through inlineCallbacks which is not the correct one, that's an error.

comment:5 Changed 9 years ago by Jean-Paul Calderone

There should be a benchmark for this.

comment:6 Changed 9 years ago by Jean-Paul Calderone

Glyph: that case can easily be handled: all of the returnValue exceptions have a common parent, if one is raised through inlineCallbacks which is not the correct one, that's an error

It will be the correct one (at least sometimes), because decoratedFunction is the one decorated with inlineCallbacks, therefore decoratedFunction.returnValue will raise the correct exception to return from decoratedFunction, even if you use it from any other function that is not decoratedFunction (but is called by decoratedFunction).

If you refactored the decoratedFunction.returnValue-using code into a new function that is called by multiple different functions decorated with inlineCallbacks, then the error would only pass silently in some cases, but that's still worse than the current behavior.

comment:7 Changed 9 years ago by Alex Gaynor

Hmm, good point. I'm not sure there's a way to do it along this patch (except maybe rewriting the function's bytecode so that the returnValue reference is really only visible locally). That's obviously completely awful so twisted shouldn't do it.

comment:8 Changed 9 years ago by Jean-Paul Calderone

That's obviously completely awful so twisted shouldn't do it.

Thanks for the sanity. :)

comment:9 in reply to:  8 Changed 9 years ago by Glyph

Replying to exarkun:

That's obviously completely awful so twisted shouldn't do it.

Thanks for the sanity. :)

I think there are ways to do this without rewriting bytecode, but I suppose I'll have to wait to make any stronger assertions until I've had time to cook up a prototype.

comment:10 Changed 9 years ago by Jean-Paul Calderone

I think there are ways to do this without rewriting bytecode, but I suppose I'll have to wait to make any stronger assertions until I've had time to cook up a prototype.

Write a benchmark instead! Maybe using sys.exc_info() isn't actually a noticeable performance hit at all and optimizing this is just a waste of time?

comment:11 Changed 9 years ago by fijal

While normally I would be the first one to complain about a lack of benchmarks, usage of sys._getframe or sys.exc_info on pypy is a bit out of question. It makes stuff 10x+ slower everywhere around the usage (so the callstack where you fish too) and not just in the place you use it.

On the other hand, I have a hope that my branch will make things better. I'm working on a branch that keeps optimized assembler frames on the heap instead of processor stack. The benefit is that we can keep them around for longer, so storing them is not such a big deal. We would also need a way to relatively easy read data out of them, so this is another bit of work.

However, I would like to point out that this usage is a bit hackish and does not always work. For example if you're inside a callback from C (say via ctypes) tb_next will be None (or if you're one down tb_next.tb_next will be None) and you won't see the warning anyway. So having a saner solution (which I can't really think about, maybe some exec?) would make it work better too.

comment:12 in reply to:  11 Changed 9 years ago by Glyph

Replying to fijal:

While normally I would be the first one to complain about a lack of benchmarks, usage of sys._getframe or sys.exc_info on pypy is a bit out of question. It makes stuff 10x+ slower everywhere around the usage (so the callstack where you fish too) and not just in the place you use it.

Nevertheless, we do also care about CPython performance, so we ought to have a benchmark that runs on both.

On the other hand, I have a hope that my branch will make things better. I'm working on a branch that keeps optimized assembler frames on the heap instead of processor stack. The benefit is that we can keep them around for longer, so storing them is not such a big deal. We would also need a way to relatively easy read data out of them, so this is another bit of work.

... and this is yet another good reason to have a benchmark, to see how much of a difference this optimization makes to this particular use-case when you commit it :).

However, I would like to point out that this usage is a bit hackish and does not always work. For example if you're inside a callback from C (say via ctypes) tb_next will be None (or if you're one down tb_next.tb_next will be None) and you won't see the warning anyway. So having a saner solution (which I can't really think about, maybe some exec?) would make it work better too.

Hmm. This is an interesting edge case, although it seems pretty unlikely that someone would be trying to emulate inlineCallbacks or calling returnValue from C. Maybe, uh, a mis-paste into some Cython code? I've never seen this problem, so it's not a super high priority to fix, I don't think. Can you provide a more detailed example of how this might happen?

comment:13 Changed 9 years ago by Glyph

Note: See TracTickets for help on using tickets.