[Twisted-Python] Waiting for a contended resource

L. Daniel Burr ldanielburr at me.com
Mon Mar 12 13:36:06 MDT 2018


Hi Richard,
On March 12, 2018 at 1:49:41 PM, Richard van der Hoff (richard at matrix.org) wrote:

Hi folks,

I thought I'd poll the list on the best way to approach a problem in 
Twisted.

The background is that we have a number of resources which can be 
requested by a REST client, and which are calculated on demand. The 
calculation is moderately expensive (can take multiple seconds), so the 
results of the calculation are cached so multiple lookups of the same 
resource are more efficient.

The problem comes in trying to handle multiple clients requesting the 
same resource at once. Obviously if 200 clients all request the same 
resource at the same time, we don't want to fire off 200 calculation 
requests.

The approach we adopted was, effectively, to maintain a lock for each 
resource:

> from twisted.internet import defer
>
> lock = defer.DeferredLock()
> cached_result = None
>
> @defer.inlineCallbacks
> def getResource():
>     global cached_result
>     yield lock.acquire()
>     try:
>         # Only the first caller runs the expensive calculation;
>         # everyone queued behind it gets the cached value.
>         if cached_result is None:
>             cached_result = yield do_expensive_calculation()
>         defer.returnValue(cached_result)
>     finally:
>         lock.release()

(Of course one can optimise the above to avoid getting the lock if we 
already have the cached result - I've omitted that for simplicity.)
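
For reference, the optimised version is only a few lines longer; a
sketch, reusing the names above:

> @defer.inlineCallbacks
> def getResource():
>     global cached_result
>     # Fast path: once the result is cached, skip the lock entirely.
>     if cached_result is not None:
>         defer.returnValue(cached_result)
>     yield lock.acquire()
>     try:
>         # Re-check under the lock: another waiter may have filled
>         # the cache while we were queued.
>         if cached_result is None:
>             cached_result = yield do_expensive_calculation()
>         defer.returnValue(cached_result)
>     finally:
>         lock.release()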

That's all very well, but it falls down when we get more than about 200 
requests for the same resource: once the calculation completes, we can 
suddenly serve all the requests, and the Deferreds returned by 
DeferredLock end up chaining together in a way that overflows the stack.

I reported this as http://twistedmatrix.com/trac/ticket/9304 and, at the 
time, worked around it by adding a call to reactor.callLater(0) into our 
implementation. However, Jean-Paul's comments on that bug implied that 
we were approaching the problem in completely the wrong way, and that we 
should instead avoid queuing up work like this in the first place.
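
For the curious, the workaround amounts to bouncing the result through 
the reactor before handing it to each waiter, so each callback fires 
from a fresh stack frame instead of from the tail of one long 
synchronous chain. A minimal sketch of the idea (the `bounce` helper is 
illustrative, not our actual code):

> from twisted.internet import defer, reactor
>
> def bounce(value):
>     # Hand `value` back via callLater(0): the caller's callback then
>     # runs from the reactor loop rather than being chained
>     # synchronously onto the previous waiter's callback.
>     d = defer.Deferred()
>     reactor.callLater(0, d.callback, value)
>     return d

In getResource() above, one would yield bounce(cached_result) just 
before returning it.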


You mention using callLater to solve this problem, so I’m guessing that instead of using a lock you are re-scheduling the call to getResource if there is no cached_result value.  I’ve used this solution plenty of times across multiple projects, and have found it both simple and reliable.  Is there some reason why this solution is not desirable in your case?
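
Concretely, I mean something along these lines; a sketch, where the 
calculation_started flag and the 0.1-second retry interval are 
illustrative:

> from twisted.internet import defer, reactor, task
>
> cached_result = None
> calculation_started = False
>
> @defer.inlineCallbacks
> def getResource():
>     global cached_result, calculation_started
>     if not calculation_started:
>         calculation_started = True
>         cached_result = yield do_expensive_calculation()
>     # Poll until the in-flight calculation has filled the cache.
>     # Each retry goes through the reactor, so nothing chains up.
>     while cached_result is None:
>         yield task.deferLater(reactor, 0.1, lambda: None)
>     defer.returnValue(cached_result)

No lock, and no queue of Deferreds to overflow the stack.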

It's worth reiterating that the requests arrive from REST clients which 
we have no direct control over. We *could* keep track of the number of 
waiting clients, and make the API respond with a 5xx error or similar if 
that number gets too high, with the expectation that the client retries 
- but one concern would be that the load from the additional HTTP 
traffic would outweigh any efficiency gained by not stacking up Deferreds.
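
Concretely, I imagine that would look something like the following 
sketch, where MAX_WAITERS is an arbitrary threshold and getResource() 
is assumed to fire with a bytes body:

> from twisted.web import resource, server
>
> MAX_WAITERS = 200  # arbitrary threshold, for illustration
> waiters = 0
>
> class CachedResource(resource.Resource):
>     isLeaf = True
>
>     def render_GET(self, request):
>         global waiters
>         if waiters >= MAX_WAITERS:
>             # Shed load: tell the client to retry instead of
>             # queueing yet another Deferred behind the lock.
>             request.setResponseCode(503)
>             request.setHeader(b"Retry-After", b"2")
>             return b"busy, try again shortly\n"
>         waiters += 1
>         def done(body):
>             global waiters
>             waiters -= 1
>             request.write(body)
>             request.finish()
>         def failed(failure):
>             global waiters
>             waiters -= 1
>             request.setResponseCode(500)
>             request.finish()
>         getResource().addCallbacks(done, failed)
>         return server.NOT_DONE_YET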


Have you validated this concern through load-testing?  You may find that there is no meaningful negative impact to this approach.

So, I'd welcome any advice on better ways to approach the problem.

Richard
Hope this helps,

L. Daniel Burr