[Twisted-Python] Waiting for a contended resource

Ilya Skriblovsky ilyaskriblovsky at gmail.com
Mon Mar 12 15:31:25 MDT 2018


Thanks for the correction, Jean-Paul, you're absolutely right.

On Mon, Mar 12, 2018 at 23:00, Jean-Paul Calderone <exarkun at twistedmatrix.com> wrote:

> On Mon, Mar 12, 2018 at 3:52 PM, Ilya Skriblovsky <
> ilyaskriblovsky at gmail.com> wrote:
>
>> Hi, Richard,
>>
>> I've used class like this to cache the result of Expensive Calculation:
>>
>> class DeferredCache:
>>     pending = None
>>     result = None
>>     failure = None
>>
>>     def __init__(self, expensive_func):
>>         self.expensive_func = expensive_func
>>
>>     def __call__(self):
>>         if self.pending is None:
>>             def on_ready(result):
>>                 self.result = result
>>             def on_fail(failure):
>>                 self.failure = failure
>>
>>             self.pending = defer.maybeDeferred(
>>                 self.expensive_func).addCallbacks(on_ready, on_fail)
>>
>>         return self.pending.addCallback(self._return_result)
>>
>>
> This seems like basically a correct answer to me.  However, I suggest one
> small change.
>
> You probably want to create and return a new Deferred for each result.  If
> you don't, then your internal `pending` Deferred is now reachable by
> application code.
>
> As written, an application might (very, very reasonably):
>
>     d = getResource()
>     d.addCallback(long_async_operation)
>
> Now `pending` has `long_async_operation` as a callback on its chain.  This
> will prevent anyone else from getting a result until `long_async_operation`
> is done.
>
> You can fix this by:
>
>     result = Deferred()
>     self.pending.addCallback(self._return_result).chainDeferred(result)
>     return result
>
> Now the application can only reach `result`.  Nothing they do to `result`
> will make much difference to `pending` because `chainDeferred` only puts
> `callback` (and `errback`) onto `pending`'s callback chain.  `callback` and
> `errback` don't wait on anything.
>
> You have to be a little careful with `chainDeferred` because it doesn't
> have the recursion-avoidance logic that implicit chaining has.  However,
> that doesn't matter in this particular case because the chain depth is
> fixed at two (`pending` and `result`).  The problems only arise if you
> extend the chain out in this direction without bound.
>
> Jean-Paul
>
>
>
>>     def _return_result(self, _):
>>         return self.failure or self.result
>>
>> Using it you can get rid of DeferredLocks:
>>
>>     deferred_cache = DeferredCache(do_expensive_calculation)
>>
>>     def getResource():
>>         return deferred_cache()
>>
>> It will start `expensive_func` on the first call. The second and
>> subsequent calls will return Deferreds that resolve with the result when
>> `expensive_func` is done. If you call it when the result is already
>> available, it will return an already-fired Deferred.
>>
>> Of course, it will require some more work if you need to pass arguments
>> to `expensive_func` and memoize results per arguments values.
>>
>> -- ilya
>>
>> On Mon, Mar 12, 2018 at 22:38, L. Daniel Burr <ldanielburr at me.com> wrote:
>>
>>> Hi Richard,
>>>
>>> On March 12, 2018 at 1:49:41 PM, Richard van der Hoff (
>>> richard at matrix.org) wrote:
>>>
>>> Hi folks,
>>>
>>> I thought I'd poll the list on the best way to approach a problem in
>>> Twisted.
>>>
>>> The background is that we have a number of resources which can be
>>> requested by a REST client, and which are calculated on demand. The
>>> calculation is moderately expensive (can take multiple seconds), so the
>>> results of the calculation are cached so multiple lookups of the same
>>> resource are more efficient.
>>>
>>> The problem comes in trying to handle multiple clients requesting the
>>> same resource at once. Obviously if 200 clients all request the same
>>> resource at the same time, we don't want to fire off 200 calculation
>>> requests.
>>>
>>> The approach we adopted was, effectively, to maintain a lock for each
>>> resource:
>>>
>>> > lock = defer.DeferredLock()
>>> > cached_result = None
>>> >
>>> > @defer.inlineCallbacks
>>> > def getResource():
>>> >     yield lock.acquire()
>>> >     try:
>>> >         if cached_result is None:
>>> >             cached_result = yield do_expensive_calculation()
>>> >         defer.returnValue(cached_result)
>>> >     finally:
>>> >         lock.release()
>>>
>>> (Of course one can optimise the above to avoid getting the lock if we
>>> already have the cached result - I've omitted that for simplicity.)
>>>
>>> That's all very well, but it falls down when we get more than about 200
>>> requests for the same resource: once the calculation completes, we can
>>> suddenly serve all the requests, and the Deferreds returned by
>>> DeferredLock end up chaining together in a way that overflows the stack.
>>>
>>> I reported this as http://twistedmatrix.com/trac/ticket/9304 and, at the
>>> time, worked around it by adding a call to reactor.callLater(0) into our
>>> implementation. However, Jean-Paul's comments on that bug implied that
>>> we were approaching the problem in completely the wrong way, and instead
>>> we should be avoiding queuing up work like this in the first place.
>>>
>>>
>>> You mention using callLater to solve this problem, so I’m guessing that
>>> instead of using a lock you are re-scheduling the call to getResource if
>>> there is no cached_result value.  I’ve used this solution plenty of times
>>> across multiple projects, and have found it both simple and reliable.  Is
>>> there some reason why this solution is not desirable in your case?
>>>
>>> It's worth reiterating that the requests arrive from REST clients which
>>> we have no direct control over. We *could* keep track of the number of
>>> waiting clients, and make the API respond with a 5xx error or similar if
>>> that number gets too high, with the expectation that the client retries -
>>> but one concern would be that the load from the additional HTTP traffic
>>> would outweigh any efficiency gained by not stacking up Deferreds.
>>>
>>>
>>> Have you validated this concern through load-testing?  You may find that
>>> there is no meaningful negative impact to this approach.
>>>
>>> So, I'd welcome any advice on better ways to approach the problem.
>>>
>>> Richard
>>>
>>> Hope this helps,
>>>
>>> L. Daniel Burr
>>> _______________________________________________
>>> Twisted-Python mailing list
>>> Twisted-Python at twistedmatrix.com
>>> https://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
>>>
>>

