[Twisted-Python] Waiting for a contended resource

Jean-Paul Calderone exarkun at twistedmatrix.com
Mon Mar 12 14:00:08 MDT 2018


On Mon, Mar 12, 2018 at 3:52 PM, Ilya Skriblovsky <ilyaskriblovsky at gmail.com> wrote:

> Hi, Richard,
>
> I've used class like this to cache the result of Expensive Calculation:
>
> class DeferredCache:
>     pending = None
>     result = None
>     failure = None
>
>     def __init__(self, expensive_func):
>         self.expensive_func = expensive_func
>
>     def __call__(self):
>         if self.pending is None:
>             def on_ready(result):
>                 self.result = result
>             def on_fail(failure):
>                 self.failure = failure
>
>             self.pending = defer.maybeDeferred(
>                 self.expensive_func).addCallbacks(on_ready, on_fail)
>
>         return self.pending.addCallback(self._return_result)
>
>
This seems like basically a correct answer to me.  However, I suggest one
small change.

You probably want to create and return a new Deferred for each result.  If
you don't, then your internal `pending` Deferred is now reachable by
application code.

As written, an application might (very, very reasonably) do:

    d = getResource()
    d.addCallback(long_async_operation)

Now `pending` has `long_async_operation` as a callback on its chain.  This
will prevent anyone else from getting a result until `long_async_operation`
is done.

You can fix this by:

    result = Deferred()
    self.pending.addCallback(self._return_result).chainDeferred(result)
    return result

Now the application can only reach `result`.  Nothing they do to `result`
will make much difference to `pending` because `chainDeferred` only puts
`callback` (and `errback`) onto `pending`'s callback chain.  `callback` and
`errback` don't wait on anything.

You have to be a little careful with `chainDeferred` because it doesn't
have the recursion-avoidance logic that implicit chaining has.  However,
that doesn't matter in this particular case because the chain depth is
fixed at two (`pending` and `result`).  The problems only arise if you
extend the chain out in this direction without bound.

Jean-Paul



>     def _return_result(self, _):
>         return self.failure or self.result
>
> Using it you can get rid of DeferredLocks:
>
>     deferred_cache = DeferredCache(do_expensive_calculation)
>
>     def getResource():
>         return deferred_cache()
>
> It will start `expensive_func` on the first call. The second and
> subsequent calls will return Deferreds that resolve with the result when
> `expensive_func` is done. If you call it when the result is already
> available, it will return an already-fired Deferred.
>
> Of course, it will require some more work if you need to pass arguments to
> `expensive_func` and memoize results per arguments values.
>
> -- ilya
>
> On Mon, 12 Mar 2018 at 22:38, L. Daniel Burr <ldanielburr at me.com> wrote:
>
>> Hi Richard,
>>
>> On March 12, 2018 at 1:49:41 PM, Richard van der Hoff (richard at matrix.org)
>> wrote:
>>
>> Hi folks,
>>
>> I thought I'd poll the list on the best way to approach a problem in
>> Twisted.
>>
>> The background is that we have a number of resources which can be
>> requested by a REST client, and which are calculated on demand. The
>> calculation is moderately expensive (can take multiple seconds), so the
>> results of the calculation are cached so multiple lookups of the same
>> resource are more efficient.
>>
>> The problem comes in trying to handle multiple clients requesting the
>> same resource at once. Obviously if 200 clients all request the same
>> resource at the same time, we don't want to fire off 200 calculation
>> requests.
>>
>> The approach we adopted was, effectively, to maintain a lock for each
>> resource:
>>
>> > lock = defer.DeferredLock()
>> > cached_result = None
>> >
>> > @defer.inlineCallbacks
>> > def getResource():
>> >     global cached_result
>> >     yield lock.acquire()
>> >     try:
>> >         if cached_result is None:
>> >             cached_result = yield do_expensive_calculation()
>> >         defer.returnValue(cached_result)
>> >     finally:
>> >         lock.release()
>>
>> (Of course one can optimise the above to avoid getting the lock if we
>> already have the cached result - I've omitted that for simplicity.)
>>
>> That's all very well, but it falls down when we get more than about 200
>> requests for the same resource: once the calculation completes, we can
>> suddenly serve all the requests, and the Deferreds returned by
>> DeferredLock end up chaining together in a way that overflows the stack.
>>
>> I reported this as http://twistedmatrix.com/trac/ticket/9304 and, at the
>> time, worked around it by adding a call to reactor.callLater(0) into our
>> implementation. However, Jean-Paul's comments on that bug implied that
>> we were approaching the problem in completely the wrong way, and instead
>> we should be avoiding queuing up work like this in the first place.
>>
>>
>> You mention using callLater to solve this problem, so I’m guessing that
>> instead of using a lock you are re-scheduling the call to getResource if
>> there is no cached_result value.  I’ve used this solution plenty of times
>> across multiple projects, and have found it both simple and reliable.  Is
>> there some reason why this solution is not desirable in your case?
>>
>> It's worth reiterating that the requests arrive from REST clients which
>> we have no direct control over. We *could* keep track of the number of
>> waiting clients, and make the API respond with a 5xx error or similar if
>> that number gets too high, with the expectation that the client retries
>> - but one concern would be that the load from the additional HTTP
>> traffic would outweigh any efficiency gained by not stacking up Deferreds.
>>
>>
>> Have you validated this concern through load-testing?  You may find that
>> there is no meaningful negative impact to this approach.
>>
>> So, I'd welcome any advice on better ways to approach the problem.
>>
>> Richard
>>
>> Hope this helps,
>>
>> L. Daniel Burr
>> _______________________________________________
>> Twisted-Python mailing list
>> Twisted-Python at twistedmatrix.com
>> https://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
>>
>
>