Opened 4 years ago

Last modified 22 months ago

#5126 enhancement new

Add http cache support to Agent

Reported by: chris- Owned by: therve
Priority: normal Milestone:
Component: web Keywords:
Cc: jknight, tom.most@… Branch: branches/caching-agent-5126-2
(diff, github, buildbot, log)
Author: therve Launchpad Bug:

Description

The web.Agent should support satisfying requests from a (local) cache using the HTTP caching headers.
This patch starts adding support for this using the etag / last-modified / if-modified-since / if-match / if-none-match headers. I intend to add features /headers as needed, but these should cover the most useful.

I would appreciate some pointers.

Attachments (1)

caching_agent.patch (13.4 KB) - added by chris- 4 years ago.

Download all attachments as: .zip

Change History (19)

comment:1 Changed 4 years ago by DefaultCC Plugin

  • Cc jknight added

Changed 4 years ago by chris-

comment:2 Changed 4 years ago by therve

  • Owner set to therve

comment:3 Changed 4 years ago by therve

  • Author set to therve
  • Branch set to branches/caching-agent-5126

(In [32010]) Branching to 'caching-agent-5126'

comment:4 Changed 4 years ago by therve

(In [32011]) Apply patch

Refs #5126

comment:5 Changed 4 years ago by therve

(In [32012]) Cosmetics

Refs #5126

comment:6 Changed 4 years ago by itamar

The current cache API is problematic, since it assumes cached content will fit in memory, or even won't use a lot of memory. What happens when you get cache headers for a 500MB file?

comment:7 Changed 4 years ago by therve

  • Keywords review removed
  • Owner changed from therve to chris-

Thanks a lot for the patch, it looks very promising. Meta-comment: there were a lot of small aesthetic changes to do: please read the coding standard more carefully, I won't do the changes myself the next time :).

  1. All your test methods need docstrings.
  1. As dumb as it is, you should have some specific testing for MemoryCache.
  1. I've been wondering if the methods of IHTTPCache shouldn't return Deferred instead. It would allow to integrate easily with the rest of Twisted protocols (Memcache for example).
  1. The call to delete the cache is not strictly necessary, as you call put just after.
  1. _CachingProtocol should use a list or a StringIO for buffer, not a string.
  1. It feels that cache entries should be more structured than a simple dict.
  1. As Itamar mentioned, the fact that "content" is a simple string is problematic. I'm not particularly inspired, but I think that using a file-like object should fix it.

comment:8 Changed 4 years ago by exarkun

(In [32014]) Spelling, whitespace; interface methods do not have self.

refs #5126

comment:9 Changed 3 years ago by therve

#612 was closed as a duplicate of this one.

comment:10 Changed 22 months ago by therve

  • Owner changed from chris- to therve

comment:11 Changed 22 months ago by therve

  • Branch changed from branches/caching-agent-5126 to branches/caching-agent-5126-2

(In [37440]) Branching to 'caching-agent-5126-2'

comment:12 Changed 22 months ago by therve

  • Keywords review added
  • Owner therve deleted

OK, this is ready for another round of review. Most of my changes were against the cache interface: I made get return a Deferred, removed put/delete, and made content management more callbacks based so that it would be more flexible.

The content is now managed by an external protocol cache. One thing I'm wondering is whether the interface between the cache and its protocol should formalized, the current branch making it up to the implementation.

comment:13 Changed 22 months ago by tom.prince

  • Keywords review removed
  • Owner set to therve
  1. test_agent.MemoryCacheAgent
    • missing docstring
    • MemoryCacheAgentTests?
    • dataReceived without put seems odd
    • self.assertIdentical(None, self.successResultOf(protocol.made))
      

can be

Error: Failed to load processor pythonS
No macro or processor named 'pythonS' found

successResultOf is probably new enough that the definitive style isn't determined. But the assertion is testing a detail of SimpleAgentProtocol that isn't relevant here.

  1. IHTTPCache
    • The interface seems too narrow. There is no way to tell the cache that the body of an entry is complete.
    • The use of dataReceived and deliverBody with unusual signatures feels strange to me. But it is perhaps reasonable. The other option might be to have get and put return something that had those methods (with their usual signature), but that seems like it would require more code for little or now benefit.
    • It lacks a way to remove cached entries/bodies. (Incidentally, instead of updating the cache with new contents, it concatenates all the cached responses.
  2. Once the body has been cached, we can store the length.
  3. I haven't looked at the relevant RFCs, but should we be using the cached headers, with the cached response, or is the server required to generate all the appropriate response headers, if we have a cache hit.

comment:14 Changed 22 months ago by tom.prince

self.successResultOf(protocol.made)

comment:15 Changed 22 months ago by twm

  • Cc tom.most@… added

comment:16 Changed 22 months ago by tom.prince

(apperntly, I didn't review the tip of the branch)

  • 1.3 and 2.1 is fixed
  1. The parenthetical in 2.3 is fixed (but there are no tests of this)
  2. Maybe use a tuple or dict, instead of having parallel _storage and _cache dicts.
  3. getProtocol is confusingly named, since it is used to put a body.
  4. Is the url enough, as a key? I would think that Accept-* headers, among other things, will affect the response. What do RFCs and proxies/browsers do about this?
  5. The ticket description mentions adding other features/headers. Is IHTTPCache enough for those? I think the answer to 8 will answer this question as well.
  6. Might a client want some way to find out when/whether a cache entry has successfully been written? This is perhaps outside the scope of IHTTPCache.

comment:17 Changed 22 months ago by tom.prince

<glyph> tomprince: contentStorageProtocol?        
<glyph> Needs a better docstring, too.  "Manage received content"?

comment:18 Changed 22 months ago by therve

HTTP/1.1 specifies the Vary header which seems to help caches differentiate how to store the response. But my understanding is that it's mostly useful if you write a complete cache, ie not asking the server for 304 response for relying on expire fields, which we don't. In our case, the different headers would generate a different response, so at worse we would uselessly cache responses that wouldn't be used if you change accept headers between requests.

Note: See TracTickets for help on using tickets.