Opened 6 years ago

Last modified 4 years ago

#5126 enhancement new

Add http cache support to Agent

Reported by: Christian Kampka Owned by: therve
Priority: normal Milestone:
Component: web Keywords:
Cc: jknight, Tom Most Branch: branches/caching-agent-5126-2
branch-diff, diff-cov, branch-cov, buildbot
Author: therve


The web.Agent should support satisfying requests from a (local) cache using the HTTP caching headers. This patch starts adding support for this using the etag / last-modified / if-modified-since / if-match / if-none-match headers. I intend to add features /headers as needed, but these should cover the most useful.

I would appreciate some pointers.

Attachments (1)

caching_agent.patch (13.4 KB) - added by Christian Kampka 6 years ago.

Download all attachments as: .zip

Change History (19)

comment:1 Changed 6 years ago by DefaultCC Plugin

Cc: jknight added

Changed 6 years ago by Christian Kampka

Attachment: caching_agent.patch added

comment:2 Changed 6 years ago by therve

Owner: set to therve

comment:3 Changed 6 years ago by therve

Author: therve
Branch: branches/caching-agent-5126

(In [32010]) Branching to 'caching-agent-5126'

comment:4 Changed 6 years ago by therve

(In [32011]) Apply patch

Refs #5126

comment:5 Changed 6 years ago by therve

(In [32012]) Cosmetics

Refs #5126

comment:6 Changed 6 years ago by Itamar Turner-Trauring

The current cache API is problematic, since it assumes cached content will fit in memory, or even won't use a lot of memory. What happens when you get cache headers for a 500MB file?

comment:7 Changed 6 years ago by therve

Keywords: review removed
Owner: changed from therve to Christian Kampka

Thanks a lot for the patch, it looks very promising. Meta-comment: there were a lot of small aesthetic changes to do: please read the coding standard more carefully, I won't do the changes myself the next time :).

  1. All your test methods need docstrings.
  1. As dumb as it is, you should have some specific testing for MemoryCache.
  1. I've been wondering if the methods of IHTTPCache shouldn't return Deferred instead. It would allow to integrate easily with the rest of Twisted protocols (Memcache for example).
  1. The call to delete the cache is not strictly necessary, as you call put just after.
  1. _CachingProtocol should use a list or a StringIO for buffer, not a string.
  1. It feels that cache entries should be more structured than a simple dict.
  1. As Itamar mentioned, the fact that "content" is a simple string is problematic. I'm not particularly inspired, but I think that using a file-like object should fix it.

comment:8 Changed 6 years ago by Jean-Paul Calderone

(In [32014]) Spelling, whitespace; interface methods do not have self.

refs #5126

comment:9 Changed 6 years ago by therve

#612 was closed as a duplicate of this one.

comment:10 Changed 4 years ago by therve

Owner: changed from Christian Kampka to therve

comment:11 Changed 4 years ago by therve

Branch: branches/caching-agent-5126branches/caching-agent-5126-2

(In [37440]) Branching to 'caching-agent-5126-2'

comment:12 Changed 4 years ago by therve

Keywords: review added
Owner: therve deleted

OK, this is ready for another round of review. Most of my changes were against the cache interface: I made get return a Deferred, removed put/delete, and made content management more callbacks based so that it would be more flexible.

The content is now managed by an external protocol cache. One thing I'm wondering is whether the interface between the cache and its protocol should formalized, the current branch making it up to the implementation.

comment:13 Changed 4 years ago by Tom Prince

Keywords: review removed
Owner: set to therve
  1. test_agent.MemoryCacheAgent
    • missing docstring
    • MemoryCacheAgentTests?
    • dataReceived without put seems odd
    • self.assertIdentical(None, self.successResultOf(protocol.made))

can be

Error: Failed to load processor pythonS
No macro or processor named 'pythonS' found

successResultOf is probably new enough that the definitive style isn't determined. But the assertion is testing a detail of SimpleAgentProtocol that isn't relevant here.

  1. IHTTPCache
    • The interface seems too narrow. There is no way to tell the cache that the body of an entry is complete.
    • The use of dataReceived and deliverBody with unusual signatures feels strange to me. But it is perhaps reasonable. The other option might be to have get and put return something that had those methods (with their usual signature), but that seems like it would require more code for little or now benefit.
    • It lacks a way to remove cached entries/bodies. (Incidentally, instead of updating the cache with new contents, it concatenates all the cached responses.
  2. Once the body has been cached, we can store the length.
  3. I haven't looked at the relevant RFCs, but should we be using the cached headers, with the cached response, or is the server required to generate all the appropriate response headers, if we have a cache hit.

comment:14 Changed 4 years ago by Tom Prince


comment:15 Changed 4 years ago by Tom Most

Cc: Tom Most added

comment:16 Changed 4 years ago by Tom Prince

(apperntly, I didn't review the tip of the branch)

  • 1.3 and 2.1 is fixed
  1. The parenthetical in 2.3 is fixed (but there are no tests of this)
  2. Maybe use a tuple or dict, instead of having parallel _storage and _cache dicts.
  3. getProtocol is confusingly named, since it is used to put a body.
  4. Is the url enough, as a key? I would think that Accept-* headers, among other things, will affect the response. What do RFCs and proxies/browsers do about this?
  5. The ticket description mentions adding other features/headers. Is IHTTPCache enough for those? I think the answer to 8 will answer this question as well.
  6. Might a client want some way to find out when/whether a cache entry has successfully been written? This is perhaps outside the scope of IHTTPCache.

comment:17 Changed 4 years ago by Tom Prince

<glyph> tomprince: contentStorageProtocol?        
<glyph> Needs a better docstring, too.  "Manage received content"?

comment:18 Changed 4 years ago by therve

HTTP/1.1 specifies the Vary header which seems to help caches differentiate how to store the response. But my understanding is that it's mostly useful if you write a complete cache, ie not asking the server for 304 response for relying on expire fields, which we don't. In our case, the different headers would generate a different response, so at worse we would uselessly cache responses that wouldn't be used if you change accept headers between requests.

Note: See TracTickets for help on using tickets.