[Twisted-Python] Waiting for a contended resource

Ilya Skriblovsky ilyaskriblovsky at gmail.com
Mon Mar 12 13:52:43 MDT 2018


Hi, Richard,

I've used a class like this to cache the result of an expensive calculation:

from twisted.internet import defer

class DeferredCache:
    pending = None
    result = None
    failure = None

    def __init__(self, expensive_func):
        self.expensive_func = expensive_func

    def __call__(self):
        if self.pending is None:
            # First call: start the expensive function and remember its
            # outcome, whether it succeeds or fails.
            def on_ready(result):
                self.result = result

            def on_fail(failure):
                self.failure = failure

            self.pending = defer.maybeDeferred(
                self.expensive_func).addCallbacks(on_ready, on_fail)

        # Every caller chains onto the same Deferred; _return_result
        # ignores the chained value and hands back the stored outcome.
        return self.pending.addCallback(self._return_result)

    def _return_result(self, _):
        # Returning a Failure from a callback switches the caller's
        # chain into the errback path, so errors propagate too.
        return self.failure or self.result

Using it, you can get rid of the DeferredLocks:

    deferred_cache = DeferredCache(do_expensive_calculation)

    def getResource():
        return deferred_cache()

It will start `expensive_func` on the first call. The second and
subsequent calls will return Deferreds that resolve with the result when
`expensive_func` is done. If you call it after the result is already
available, it will return an already-fired Deferred.
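
For example, a minimal sketch of the behaviour (`slow_lookup` is a made-up
stand-in for the real calculation):

    from twisted.internet import defer, reactor, task

    def slow_lookup():
        # Stand-in for the expensive calculation: fires after 2 seconds.
        return task.deferLater(reactor, 2.0, lambda: "the answer")

    deferred_cache = DeferredCache(slow_lookup)

    def main(_reactor):
        # 200 concurrent callers; slow_lookup itself runs only once.
        ds = [deferred_cache() for _ in range(200)]
        return defer.gatherResults(ds)

    task.react(main)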

Of course, it will require some more work if you need to pass arguments to
`expensive_func` and memoize results per set of argument values.
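
One rough way to do that, assuming all arguments are hashable (an untested
sketch, just to show the shape):

    class KeyedDeferredCache:
        def __init__(self, expensive_func):
            self.expensive_func = expensive_func
            self._caches = {}   # argument key -> DeferredCache

        def __call__(self, *args, **kwargs):
            key = (args, tuple(sorted(kwargs.items())))
            if key not in self._caches:
                # Bind the arguments so each key gets its own cache entry.
                self._caches[key] = DeferredCache(
                    lambda: self.expensive_func(*args, **kwargs))
            return self._caches[key]()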

-- ilya

On Mon, 12 Mar 2018 at 22:38, L. Daniel Burr <ldanielburr at me.com> wrote:

> Hi Richard,
>
> On March 12, 2018 at 1:49:41 PM, Richard van der Hoff (richard at matrix.org)
> wrote:
>
> Hi folks,
>
> I thought I'd poll the list on the best way to approach a problem in
> Twisted.
>
> The background is that we have a number of resources which can be
> requested by a REST client, and which are calculated on demand. The
> calculation is moderately expensive (can take multiple seconds), so the
> results of the calculation are cached so multiple lookups of the same
> resource are more efficient.
>
> The problem comes in trying to handle multiple clients requesting the
> same resource at once. Obviously if 200 clients all request the same
> resource at the same time, we don't want to fire off 200 calculation
> requests.
>
> The approach we adopted was, effectively, to maintain a lock for each
> resource:
>
> > lock = defer.DeferredLock()
> > cached_result = None
> >
> > @defer.inlineCallbacks
> > def getResource():
> >     yield lock.acquire()
> >     try:
> >         if cached_result is None:
> >             cached_result = yield do_expensive_calculation()
> >         defer.returnValue(cached_result)
> >     finally:
> >         lock.release()
>
> (Of course one can optimise the above to avoid getting the lock if we
> already have the cached result - I've omitted that for simplicity.)
>
> That's all very well, but it falls down when we get more than about 200
> requests for the same resource: once the calculation completes, we can
> suddenly serve all the requests, and the Deferreds returned by
> DeferredLock end up chaining together in a way that overflows the stack.
>
> I reported this as http://twistedmatrix.com/trac/ticket/9304 and, at the
> time, worked around it by adding a call to reactor.callLater(0) into our
> implementation. However, Jean-Paul's comments on that bug implied that
> we were approaching the problem in completely the wrong way, and instead
> we should be avoiding queuing up work like this in the first place.
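>
> For illustration, the workaround was roughly of this shape (a sketch,
> not the exact implementation; task.deferLater(reactor, 0, ...) is just
> reactor.callLater(0) wrapped in a Deferred):
>
>     from twisted.internet import defer, reactor, task
>
>     @defer.inlineCallbacks
>     def getResource():
>         global cached_result
>         yield lock.acquire()
>         try:
>             if cached_result is None:
>                 cached_result = yield do_expensive_calculation()
>             # Bounce through the reactor so each waiter resumes from
>             # a fresh stack frame instead of one deep recursive chain.
>             yield task.deferLater(reactor, 0, lambda: None)
>             defer.returnValue(cached_result)
>         finally:
>             lock.release()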
>
>
> You mention using callLater to solve this problem, so I’m guessing that
> instead of using a lock you are re-scheduling the call to getResource if
> there is no cached_result value.  I’ve used this solution plenty of times
> across multiple projects, and have found it both simple and reliable.  Is
> there some reason why this solution is not desirable in your case?
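>
> A minimal sketch of that rescheduling pattern (names are illustrative,
> with do_expensive_calculation assumed as before):
>
>     from twisted.internet import defer, reactor, task
>
>     cached_result = None
>     calculating = False
>
>     def getResource():
>         global calculating
>         if cached_result is not None:
>             return defer.succeed(cached_result)
>         if not calculating:
>             calculating = True
>             def store(result):
>                 global cached_result
>                 cached_result = result
>                 return result
>             d = defer.maybeDeferred(do_expensive_calculation)
>             return d.addCallback(store)
>         # A calculation is already in flight: try again shortly.
>         return task.deferLater(reactor, 0.1, getResource)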
>
> It's worth reiterating that the requests arrive from REST clients which
> we have no direct control over. We *could* keep track of the number of
> waiting clients, and make the API respond with a 5xx error or similar if
> that number gets too high, with the expectation that the client retries
> - but one concern would be that the load from the additional HTTP
> traffic would outweigh any efficiency gained by not stacking up Deferreds.
>
>
> Have you validated this concern through load-testing?  You may find that
> there is no meaningful negative impact to this approach.
>
> So, I'd welcome any advice on better ways to approach the problem.
>
> Richard
>
> Hope this helps,
>
> L. Daniel Burr
> _______________________________________________
> Twisted-Python mailing list
> Twisted-Python at twistedmatrix.com
> https://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
>

