
Caddy 2 caching layer and API endpoints #2820

Closed
lpellegr opened this issue Oct 18, 2019 · 9 comments
Labels
feature ⚙️ New feature or request help wanted 🆘 Extra attention is needed
Comments

lpellegr commented Oct 18, 2019

1. What would you like to have changed?

The Caddy 2 caching layer is currently a WIP. It seems to be the right time to suggest a cache invalidation API that supports cache tags.

2. Why is this feature a useful, necessary, and/or important addition to this project?

A caching layer improves throughput by reducing bandwidth, latency, and workload. It can apply to infrequently updated content such as a blog, or to more critical components such as an API. The on-demand TLS feature coupled with a great caching and invalidation API would be a set of killer features for SaaS companies and, I think, would bring a lot of new adopters.

There are three main methods of cache invalidation:

  1. By URL: requires a map data structure to identify matching entries to purge.
  2. By URL and recursive: requires a prefix tree.
  3. By surrogate key/cache tag: requires a multimap.

The third method is the most interesting since it supports everything from basic to advanced use cases while not requiring a lot of extra work.
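To make the multimap concrete, here is a minimal sketch of a tag index for surrogate-key purging. All names here are hypothetical illustrations, not part of any Caddy API, and a real implementation would need locking and persistence:

```go
package main

import "fmt"

// TagIndex is the multimap needed for surrogate-key purging: each tag
// maps to the set of cache keys labeled with it. Not safe for concurrent
// use; a real implementation would add a mutex.
type TagIndex struct {
	byTag map[string]map[string]struct{}
}

func NewTagIndex() *TagIndex {
	return &TagIndex{byTag: make(map[string]map[string]struct{})}
}

// Tag associates a cache key with one or more surrogate keys/cache tags.
func (t *TagIndex) Tag(key string, tags ...string) {
	for _, tag := range tags {
		if t.byTag[tag] == nil {
			t.byTag[tag] = make(map[string]struct{})
		}
		t.byTag[tag][key] = struct{}{}
	}
}

// Purge returns every cache key labeled with tag and forgets the tag,
// so the caller can evict those entries from the actual cache store.
func (t *TagIndex) Purge(tag string) []string {
	keys := make([]string, 0, len(t.byTag[tag]))
	for k := range t.byTag[tag] {
		keys = append(keys, k)
	}
	delete(t.byTag, tag)
	return keys
}

func main() {
	idx := NewTagIndex()
	idx.Tag("/products/42", "product-42", "catalog")
	idx.Tag("/products", "catalog")
	fmt.Println(len(idx.Purge("catalog"))) // prints 2: both keys carry "catalog"
}
```

One purge by tag can then evict any number of URLs at once, which is what makes this method cover the per-URL cases too.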

It would be really convenient to have a REST API exposed by Caddy for cache invalidation. Requests could be formulated using the commonly accepted HTTP PURGE method.
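For illustration only, a purge request against such an endpoint might look like the following; the path and header are made-up examples, not an existing Caddy API:

```
PURGE /cache/purge HTTP/1.1
Host: localhost:2019
Surrogate-Key: product-42
```

PURGE is not a standard HTTP method, but it is the de facto convention used by Varnish, Fastly, and similar caches.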

3. What alternatives are there, or what are you doing in the meantime to work around the lack of this feature?

Using a solution other than Caddy for caching.

4. Please link to any relevant issues, pull requests, or other discussions.

Here are some resources for an introduction about surrogate keys/cache tags:

@lpellegr lpellegr added the feature ⚙️ New feature or request label Oct 18, 2019
@mholt mholt added the v2 label Oct 18, 2019
@mholt mholt added this to the 2.0 milestone Oct 18, 2019
@mholt mholt added the help wanted 🆘 Extra attention is needed label Oct 18, 2019
mholt (Member) commented Oct 18, 2019

Yep, this is a good idea; it definitely needs to be part of Caddy's cache module.

Right now, the cache key is just the request URI:

// TODO: groupcache has no explicit cache eviction, so we need to embed
// all information related to expiring cache entries into the key; right
// now we just use the request URI as a proof-of-concept
key := r.RequestURI
(keep in mind that the current implementation is just a PoC)

Since groupcache has no explicit cache invalidation features, all we need to do is encode the information related to cache expiration into the key.

And yes, an admin endpoint would be needed. These are pluggable, as admin endpoints are themselves Caddy modules:

caddy/dynamicconfig.go

Lines 29 to 54 in 19e834c

func init() {
	RegisterModule(router{})
}

type router []AdminRoute

// CaddyModule returns the Caddy module information.
func (router) CaddyModule() ModuleInfo {
	return ModuleInfo{
		Name: "admin.routers.dynamic_config",
		New: func() Module {
			return router{
				{
					Pattern: "/" + rawConfigKey + "/",
					Handler: http.HandlerFunc(handleConfig),
				},
				{
					Pattern: "/id/",
					Handler: http.HandlerFunc(handleConfigID),
				},
			}
		},
	}
}

func (r router) Routes() []AdminRoute { return r }

The on-demand TLS feature coupled with a great caching and invalidation API would be a set of killer features for SaaS companies and, I think, would bring a lot of new adopters.

Agreed! We've talked to some CDNs who would go crazy for this combination.

Anyone want to post an end-to-end example of something being added to the cache (including what the key is), an API request that invalidates it, and then the next version being added to the cache and used in its place?

@mholt mholt changed the title Caddy 2 cache invalidation API Caddy 2 caching layer and API endpoints Oct 18, 2019
mholt (Member) commented Oct 18, 2019

Also looping @maruel in on this conversation

maruel commented Oct 19, 2019

My 2¢ from having worked on some fairly large distributed systems.

For scalable infrastructures, requiring cache invalidation is generally an anti-pattern. I'd recommend using one of the following in the key:

  • time (e.g. rounded to the hour, for data that can accept various levels of staleness).
  • a "generation" value, which gets bumped every time the data from the previous generation should be discarded.

Using these patterns removes the need for cache coherence between nodes in the caching infrastructure. A good example of what failed for us:

  • We use memcache to cache DB entries, a few billion RPCs per day.
  • Roughly 0.001% of the time, a memcache write RPC (like a PUT or DELETE) fails.
  • When that happens, the object in cache becomes stale, causing all sorts of issues.

Sure, you could use a TTL (my preference) and/or explicit cache invalidation to work around those failures once they are detected, but the whole system becomes eventually consistent, especially since all the nodes in the system must have a coherent view of the cache data.
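The generation pattern described above could be sketched like this (hypothetical names; a real deployment would store the generation somewhere all caching nodes can read, rather than in a process-local counter):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// generation is bumped instead of deleting entries: keys minted under an
// older generation are simply never looked up again and age out of the LRU.
var generation atomic.Uint64

func genKey(uri string) string {
	return fmt.Sprintf("g%d|%s", generation.Load(), uri)
}

// invalidateAll logically discards every cached entry at once, with no
// cross-node coordination beyond sharing the counter's value.
func invalidateAll() {
	generation.Add(1)
}

func main() {
	before := genKey("/products")
	invalidateAll()
	after := genKey("/products")
	fmt.Println(before, after, before != after) // keys differ after the bump
}
```

No delete RPC can fail here, which is exactly the failure mode the memcache example above runs into.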

lpellegr (Author) commented

@maruel Interesting thoughts and experience.

My original message did not mention TTL-based eviction since it seems to be a mandatory feature for supporting HTTP Cache-Control headers on an HTTP server. Do you plan to rely on the cache module to handle Cache-Control headers?

What about a basic abstraction layer (maybe one for eviction policies and one for invalidation) with different implementations (e.g. local disk-based, local memory-based, distributed)? Not all use cases require a distributed setup, nor coordination between nodes. It would be interesting to get more feedback from Caddy users about their use cases. Maybe you already have some?

Just to emphasize: my main motivation for creating this issue was a cache invalidation API with modern, actionable purge methods (i.e. surrogate keys). That's a really common and recurring need when working with multi-tenant architectures in the SaaS world.

maruel commented Oct 19, 2019

Even supporting max-age correctly is tricky: How long should the proxy cache the entry? Should the max-age value be reduced as time passes?

I'm not familiar with proxy implementations, but I tend to favor safety over performance: cache for a small amount of time (like 1% of the origin's max-age value) so we don't need to answer the questions above.

@mholt mholt removed the v2 label Mar 23, 2020
varun06 commented Mar 26, 2020

Not sure if you folks have already looked at https://github.com/mailgun/groupcache

It is a fork of groupcache and supports TTL-based eviction and a way to remove entries (sort of).

mholt (Member) commented Mar 26, 2020

@varun06 I did see that once upon a time, but its theoretical guarantees are weaker. I think if we went that route we'd have to do some careful profiling and benchmarking before deciding to go with it: it has to be better than simply encoding expiration into the keys themselves. Also, this issue is a little unsettling (a common gotcha that I've made myself, and something easy to fix, but let's see what the response time is): mailgun/groupcache#14

In the meantime, I've moved the cache handler to a separate repo for further development: https://github.com/caddyserver/cache-handler

I might transfer this issue as well. (or just link to it)

varun06 commented Mar 27, 2020

Thanks for the explanation @mholt. I am working on a similar exercise (a distributed cache, although we offload much of the caching to the CDN layer) for a reverse proxy used by a big retailer. It is in a very early phase. If I learn something that might help Caddy, I will contribute or let you folks know.

mholt (Member) commented Apr 1, 2020

Moving the discussion to a new repo dedicated to the caching layer: caddyserver/cache-handler#1

Would be happy to have people work on it!

@mholt mholt closed this as completed Apr 1, 2020