[Twisted-Python] In memory cache in twisted

Maarten ter Huurne maarten at treewalker.org
Thu Sep 26 21:39:59 MDT 2019


On Friday, 27 September 2019 04:38:46 CEST Waqar Khan wrote:
> Hi,
>   What's a good way to use a simple dictionary as a cache in the
> twisted framework?
> Basically, I have this callback chain where I ultimately make a REST
> call (in a non-blocking way, using treq) to fetch some data. But before
> I make the call, I use a dictionary to see if the value is
> available or not. But I have noticed that the event loop gets pretty
> busy (sometimes things get stuck and the twisted server stops) as soon
> as I add this logic, which is pretty much:
> 
> @defer.inlineCallbacks
> def fetch(key):
>     if key in cache:
>         return cache[key]
>     # else call back to treq to fetch the value
>     cache[key] = value
>     return value
> 
> This dict can grow to around 50k entries. What's a good way to solve
> this issue?

If it gets stuck, the cause is probably in the part of the code you 
omitted, so it would help if you elaborated on exactly how the value is 
fetched.

I can see two other problems with this caching mechanism though:

1. Items are never removed from the cache, so unless there is a limit 
to the number of different keys that can be used, the cache can grow 
indefinitely. You might want something like an LRU cache rather than a 
plain dictionary (a minimal sketch follows the link below).

https://docs.python.org/3/library/functools.html#functools.lru_cache
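
For illustration, here is a minimal bounded LRU mapping built on
collections.OrderedDict. This is only a sketch: the class name and the
50000 limit are placeholders (the limit mirrors the size you mentioned),
and functools.lru_cache from the link above works just as well if you
can restructure the lookup as a decorated function.

from collections import OrderedDict

class LRUCache:
    """Dict-like cache that evicts the least recently used entry."""

    def __init__(self, maxsize=50000):
        self._data = OrderedDict()
        self._maxsize = maxsize

    def __contains__(self, key):
        return key in self._data

    def __getitem__(self, key):
        value = self._data[key]
        # Mark the entry as most recently used.
        self._data.move_to_end(key)
        return value

    def __setitem__(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self._maxsize:
            # Evict the least recently used entry.
            self._data.popitem(last=False)

With that, cache = LRUCache() can replace the plain dict in your fetch()
without changing the lookup code.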

2. If a lot of clients are requesting the same thing, you won't see any 
benefits from caching until the first request completes. So you could 
get a pattern like this:

T=0: key A requested, A is not cached, start fetch #1 of A
T=1: key A requested, A is not cached, start fetch #2 of A
T=2: key A requested, A is not cached, start fetch #3 of A
T=3: key A requested, A is not cached, start fetch #4 of A
T=4: key A requested, A is not cached, start fetch #5 of A
T=5: fetch #1 of A completes and is added to the cache
T=6: key A requested, A is cached, return value immediately

In this example, the value for A is fetched 5 times despite the caching 
mechanism. If a fetch takes long compared to the interval between 
incoming requests, this effect gets worse at a quadratic rate: every 
request that arrives while the first fetch is still running starts a 
redundant fetch of its own, so the total time spent fetching is the 
number of requests that come in during the first fetch times the 
duration of a fetch. (With requests arriving at rate r and a fetch 
taking time d, that is about r*d fetches of duration d each, for a 
total of about r*d^2.)

To avoid this, you could put a Deferred for the fetch operation in the 
cache or in a separate dictionary, and if you get another request for 
the same key before the fetch completes, wait for that Deferred instead 
of starting another fetch (sketched below).
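
Here is a minimal sketch of that idea, assuming a hypothetical
fetch_from_service(key) that returns a Deferred wrapping the treq call
(the names are illustrative, not from your code). Note that handing the
same Deferred object to several callers is fragile, because their
callbacks would all be chained onto it and could alter its result, so
each waiter gets a fresh Deferred here:

from twisted.internet import defer

cache = {}     # key -> cached value
_waiters = {}  # key -> list of Deferreds awaiting an in-flight fetch

@defer.inlineCallbacks
def fetch(key):
    if key in cache:
        return cache[key]
    if key in _waiters:
        # A fetch for this key is already in flight: wait for its
        # result instead of starting a second fetch.
        d = defer.Deferred()
        _waiters[key].append(d)
        value = yield d
        return value
    # First request for this key: do the actual fetch.
    _waiters[key] = []
    try:
        value = yield fetch_from_service(key)  # hypothetical treq wrapper
    except Exception:
        # Propagate the failure to everyone who queued up meanwhile.
        for d in _waiters.pop(key):
            d.errback()
        raise
    cache[key] = value
    for d in _waiters.pop(key):
        d.callback(value)
    return value

Combined with a bounded cache like the LRU sketch above, this removes
both the unbounded growth and the duplicated fetches.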

Bye,
		Maarten
