
Cache support #105


Closed
vicb opened this issue Oct 15, 2024 · 19 comments · Fixed by #320
Comments

@vicb
Contributor

vicb commented Oct 15, 2024

Cache support.

This is a big chunk of work that should be split into multiple sub-tasks when we start the work.

References:

AWS uses multiple components for the cache:

  • an S3 bucket for the content
  • DynamoDB for the tags
  • a queue to serialize & dedupe access

This task is about providing a cache (incremental + tag) on the Cloudflare infrastructure.

We have started to explore multiple possible solutions using:

  • KV
  • Queues
  • Durable Objects

TODO:

Support for Cache Invalidation by Tag (#91)

--

The second thing we should tackle with this issue is the errors when running e.g. the middleware example on the experimental branch with browser cache enabled. Fixed in opennextjs/opennextjs-aws#683

--

Edit: the main branch has basic cache support (KV + assets).
It's missing revalidation.

@vicb vicb converted this from a draft issue Oct 15, 2024
@conico974
Collaborator

If you plan on reusing the AWS cache handler as it is, I can give some advice.
The 3 components we have can easily be overridden.

S3 storage

For the S3 storage (the incremental cache and the fetch cache), this should be the easiest part, and the obvious choice seems to be KV.
Here is the AWS implementation for context: https://github.com/opennextjs/opennextjs-aws/blob/main/packages/open-next/src/cache/incremental/s3.ts.
You'll just need to provide something that implements the IncrementalCache interface.
The cache also needs to be populated at build time.
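
For illustration, a minimal sketch of what a KV-backed implementation could look like. The interface shape below only approximates OpenNext's IncrementalCache contract, and the entry layout is made up, so treat it as a starting point rather than adapter code:

```ts
// Types like KVNamespace are the ambient ones from @cloudflare/workers-types.
// The entry shape and method set only approximate the real IncrementalCache interface.
interface CacheEntry {
  value: unknown;
  lastModified: number;
}

export function createKvIncrementalCache(kv: KVNamespace) {
  return {
    name: "cf-kv-incremental-cache",
    async get(key: string): Promise<CacheEntry | null> {
      // A KV miss returns null; entries are stored as JSON blobs.
      return kv.get<CacheEntry>(key, "json");
    },
    async set(key: string, value: unknown): Promise<void> {
      await kv.put(key, JSON.stringify({ value, lastModified: Date.now() }));
    },
    async delete(key: string): Promise<void> {
      await kv.delete(key);
    },
  };
}
```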

Queue

The role of the queue is ISR revalidation: when a page is marked as stale, the queue component will be called.
For this one there is an easy implementation and a more complex but better one.

For the easy one, you don't need any other component; you just do exactly the same thing as the dev/queue.ts component in the run-locally example: https://opennext.js.org/aws/contribute/local_run
This solution has a drawback: it doesn't handle deduplication. Multiple users reaching a stale page would trigger multiple revalidations.

For the more complex solution, I'm not sure, but it seems that Durable Objects could be the way to go (see the sketch below).
The message that you get includes a MessageDeduplicationId. It is computed so that there is a finite number of ids (defined by the MAX_REVALIDATE_CONCURRENCY env variable, default 10), and a given route will always produce the same id.
This could allow spawning MAX_REVALIDATE_CONCURRENCY Durable Objects that would each handle deduplication and trigger the revalidation in the same way as the easy one.
Not sure if a Cloudflare Queue could be used this way.
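
For illustration only, a very rough sketch of that Durable Object idea. The message shape, the in-memory dedup set, and the "x-isr" header are all placeholders standing in for whatever the server actually expects, not a documented contract:

```ts
// Types (DurableObjectState, DurableObjectNamespace, ...) are the ambient ones
// from @cloudflare/workers-types. All names below are illustrative.
export class RevalidationDurableObject {
  // Routes currently being revalidated by this instance.
  private inFlight = new Set<string>();

  constructor(private state: DurableObjectState) {}

  async fetch(request: Request): Promise<Response> {
    const { host, url } = await request.json<{ host: string; url: string }>();
    const key = `${host}${url}`;
    if (this.inFlight.has(key)) return new Response("deduped");

    this.inFlight.add(key);
    try {
      // Ask the server to re-render the stale route, as the "easy" queue would.
      await fetch(`https://${host}${url}`, { method: "HEAD", headers: { "x-isr": "1" } });
    } finally {
      this.inFlight.delete(key);
    }
    return new Response("revalidated");
  }
}

// Caller side: the MessageDeduplicationId picks which of the
// MAX_REVALIDATE_CONCURRENCY instances handles the message.
export function sendRevalidation(
  ns: DurableObjectNamespace,
  dedupId: string,
  host: string,
  url: string,
) {
  const stub = ns.get(ns.idFromName(dedupId));
  return stub.fetch("https://do.internal/revalidate", {
    method: "POST",
    body: JSON.stringify({ host, url }),
  });
}
```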

Tag cache (DynamoDB)

It is only used for revalidateTag or revalidatePath.

Really not sure about this one; it is heavily geared toward DynamoDB to take full advantage of it.
If used as it is now, the only component I could see working for this would be D1.
For reference, here is the default AWS implementation: https://github.com/opennextjs/opennextjs-aws/blob/main/packages/open-next/src/cache/tag/dynamodb.ts
It also needs to be prepopulated at build time.
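
For illustration, a rough sketch of what a D1-backed tag cache could look like, assuming a hypothetical tags table with (tag, path, revalidatedAt) columns created and seeded at build time; the table layout and function names are made up:

```ts
// D1Database is the ambient type from @cloudflare/workers-types.
export async function getPathsByTag(db: D1Database, tag: string): Promise<string[]> {
  // Look up every path associated with a tag (used by revalidateTag/revalidatePath).
  const { results } = await db
    .prepare("SELECT path FROM tags WHERE tag = ?")
    .bind(tag)
    .all<{ path: string }>();
  return results.map((row) => row.path);
}

export async function writeTags(db: D1Database, tag: string, paths: string[]): Promise<void> {
  const stmt = db.prepare("INSERT INTO tags (tag, path, revalidatedAt) VALUES (?, ?, ?)");
  // batch() sends all inserts in a single round trip.
  await db.batch(paths.map((path) => stmt.bind(tag, path, Date.now())));
}
```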

I also plan to offer an alternative way to query the tag cache (one that is closer to the Next implementation in the default handler), but that means having to query a bunch of tags for every GET request.

These 3 are also used by the experimental cache interceptor, which avoids having to reach the server to serve ISR/SSG routes.

@vicb
Contributor Author

vicb commented Oct 17, 2024

Thank you so much for the super valuable info on these GitHub issues/tasks.
❤️

@IgorMinar
Contributor

I suspect that the easiest way to go about this is to use KV for storage and serving. In terms of latency, it's on par with the Workers Cache API, but unlike the Cache API, KV is globally distributed.

Queue can be used to dedupe revalidation requests so that we don't overwhelm the DB.

DO is likely not a great fit here as it could get overwhelmed by the thundering herd when the cache is empty.

One more thing to figure out is the tag-based invalidation. We might need some kind of tag-to-paths mapping to support that.

There is one more thing that I don't see mentioned here at all. This issue is about the serving cache, but Next also has a separate concept of a data cache, which is used to cache data requests. @dario-piotrowicz and @james-elicx built out a solution for that in next-on-pages: https://github.com/cloudflare/next-on-pages/blob/main/packages/next-on-pages/docs/caching.md Is this cache already supported, or do we need to create a separate tracking issue for it?

@vicb
Contributor Author

vicb commented Dec 10, 2024

but Next also has a separate concept of a data cache, which is used to cache data requests

The Next fetch cache shares the same infra - there are different value types in a common store.

@ha1fstack
Contributor

Can the Workers Cache API also be supported? KV is great, but I would use the Workers cache when I don't need global distribution since it costs less.

@IgorMinar
Contributor

Yes, ideally the implementation is replaceable.

We could even follow the precedent set in next-on-pages and provide a Workers cache implementation by default, offering KV + Queues as a more scalable, opt-in alternative.
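
For the "replaceable" part, something along these lines could work in the project config. This is a hypothetical shape loosely modeled on the OpenNext overrides, not the adapter's actual API, and the import paths are placeholders:

```ts
// open-next.config.ts - hypothetical shape, for illustration only.
const config = {
  default: {
    override: {
      // Each entry points at one of the interchangeable implementations
      // discussed in this thread (module paths are placeholders).
      incrementalCache: () => import("./cache/kv-incremental-cache"),
      tagCache: () => import("./cache/d1-tag-cache"),
      queue: () => import("./cache/durable-object-queue"),
    },
  },
};

export default config;
```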

@conico974
Collaborator

About the cache api, there might be some issues with fallback: false types of routes, as they need to be present at runtime. (If not, Next will throw an exception, and there is no easy way to bypass this behaviour.)
Another thing about using the cache api is that (unless I'm mistaken and there is a way to preload the local cache everywhere) pages will have to be re-rendered on every cache miss (even for SSG routes).

We might need some kind of tag-to-paths mapping to support that.

OpenNext already provides that, but for it to work properly, the tag cache needs to be initialized at build time.
There is a function that can do that automatically if properly set up.

@james-elicx
Collaborator

About the cache api, there might be some issues with fallback: false types of routes, as they need to be present at runtime. (If not, Next will throw an exception, and there is no easy way to bypass this behaviour.) Another thing about using the cache api is that (unless I'm mistaken and there is a way to preload the local cache everywhere) pages will have to be re-rendered on every cache miss (even for SSG routes).

The seed files don't necessarily have to be put into the cache; they could be stored as part of the deployment instead and be served from there internally - similar to what is currently done in the main branch (it was removed from the experimental branch). Then it would be a bit friendlier for cache stores that aren't as easy to seed.

Unsure how that would work with what the AWS adapter has, as I haven't really looked at it much since Victor is leading that.

@conico974
Collaborator

The seed files don't necessarily have to be put into the cache; they could be stored as part of the deployment instead and be served from there internally - similar to what is currently done in the main branch (it was removed from the experimental branch)

So if the files are not there in the cache, it relies on the seed files?
Wouldn't this cause issues with ISR or On-Demand revalidation then? If a file has been revalidated in one region and requested later from another one, you'll get the old seed file (or a different revalidated version).

I guess what would make more sense would be to separate the routing/middleware into a different worker and rely on the cache control set by the Next server for the cache api (effectively shielding most of the requests with the cache api).
And with this, why not use R2 instead? It's a lot cheaper than KV on reads and would provide what's needed for ISR and On-Demand revalidation.

@james-elicx
Collaborator

The seed files don't necessarily have to be put into the cache; they could be stored as part of the deployment instead and be served from there internally - similar to what is currently done in the main branch (it was removed from the experimental branch)

So if the files are not there in the cache, it relies on the seed files? Wouldn't this cause issues with ISR or On-Demand revalidation then? If a file has been revalidated in one region and requested later from another one, you'll get the old seed file (or a different revalidated version)

It tries to serve from the cache, but if there is no file in the cache, it serves the seed file from the deployment (which would be correct if a page hasn't been revalidated before). It works in the same way, just falling back to a different location if you're not able to seed the cache.
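
A rough sketch of that fallback order, assuming a KV-backed incremental cache and an assets binding; the binding names and seed path layout below are illustrative only:

```ts
// KVNamespace and Fetcher are the ambient types from @cloudflare/workers-types.
interface CacheEnv {
  NEXT_INC_CACHE_KV: KVNamespace;
  ASSETS: Fetcher; // static assets bundled with the deployment
}

export async function readCacheEntry(env: CacheEnv, key: string): Promise<string | null> {
  // 1. Entry written (or revalidated) at runtime.
  const fromKv = await env.NEXT_INC_CACHE_KV.get(key, "text");
  if (fromKv !== null) return fromKv;

  // 2. Fall back to the build-time seed file shipped with the deployment.
  const res = await env.ASSETS.fetch(`https://assets.local/cache/${key}.json`);
  return res.ok ? res.text() : null;
}
```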

@dario-piotrowicz
Contributor

Note: unstable_cache is another thing that we need to make sure is supported.

But I think that it should just work once caching is supported (based on a quick glance at its source code).

@james-elicx james-elicx linked a pull request Feb 2, 2025 that will close this issue
@james-elicx
Collaborator

james-elicx commented Feb 2, 2025

Big thanks to conico's comment above - super helpful in figuring out where to get started with this.

I've put up two PRs to enable ISR and revalidation:

And a docs PR - opennextjs/docs#66

We'll probably need additional tickets for the following as future pieces of work, as I'm not considering them part of this one:

  • Cloudflare Queues adapter for the queue.
  • Cache API adapter for the incremental cache.
  • R2 adapter for the incremental cache.
  • Cache API adapter for the tag cache.
  • Durable Object adapter for the tag cache.

@conico974
Collaborator

The cache api alone cannot work for the incremental cache or the tag cache, at least not without a big warning that it may break or not work as expected.
Because the cache api is regional, if an entry is revalidated in one place, it will not be reflected in other regions.
And if it uses the same strategy as KV with the seed data, you may end up serving stale initial data in some regions and updated data in others (and revalidating again while some regions might already have the correct updated data).
In the worst case, you could end up with as many different versions of the page data as there are Cloudflare cache locations.

And On-Demand revalidation would require purging the cache for these values. This could help, but I'm not even sure it's doable globally while using the cache api from Workers.

@james-elicx
Collaborator

The cache api alone cannot work for the incremental cache or the tag cache, at least not without a big warning that it may break or not work as expected. Because the cache api is regional, if an entry is revalidated in one place, it will not be reflected in other regions. And if it uses the same strategy as KV with the seed data, you may end up serving stale initial data in some regions and updated data in others (and revalidating again while some regions might already have the correct updated data). In the worst case, you could end up with as many different versions of the page data as there are Cloudflare cache locations.

And On-Demand revalidation would require purging the cache for these values. This could help, but I'm not even sure it's doable globally while using the cache api from Workers

There's nothing necessarily wrong with it being regional - Vercel's data cache for data fetching is regional, for what it's worth. Just as long as it's communicated to users. From what I recall, I don't think anyone ever ran into issues using the cache api in next-on-pages.

@conico974
Collaborator

The issue is not really with it being regional, but rather that the same call from different locations may return different data.
The ISR/SSG cache works in a stale-while-revalidate way: it will return stale data while revalidating in the background.

Correct me if I'm wrong, but next-on-pages doesn't use Next's incremental cache for SSG/ISR routes, only for the fetch cache, right?
Given how it works, it's also unlikely that users will notice any issue unless they're actively looking for it. Unless you use a VPN (and change location while testing), you'll likely always get data from the same region.
As an example (I take the worst case scenario here on purpose): let's say you have a website that's only visited from one region, with revalidate = 3600. Now someone accesses it from another region 6 months later; in that region the cache is empty, so it will serve the initial page even though it's 6 months old.
And you still have the issue of On-Demand revalidation not working properly. revalidateTag could work if the tag cache does not suffer from this issue, but res.revalidate will not.

With a big warning explaining this, I think it's fine, but it should be made clear that this is a possibility and that it may not work as expected.

@ibobo

ibobo commented Feb 3, 2025

You're right, Vercel's cache is regional, BUT revalidation is global. We already discussed this same topic for next-on-pages here. The important part is in the Vercel docs (emphasis mine):

On-demand revalidation: Any data can be triggered for revalidation on-demand, regardless of the revalidation interval. The revalidation propagates to all regions within 300ms.

Having a regional cache AND regional revalidation makes it impossible to forcibly reset the cache.

@james-elicx
Collaborator

Yeah, it's a fair point, and a big warning explaining it like conico mentioned makes sense. It's better to give people the option instead of having no free option IMO, but if others disagree we don't have to do it.

@conico974
Collaborator

Technically it would be possible to make it work like on Vercel with everything we have in place in OpenNext, but it kind of defeats the purpose of a free/cheap option:

  • It would require using Cache Tags to purge the regional cache when needed (only available on the Enterprise plan)
  • Automatic CDN invalidation would need to be implemented to purge the cache for On-Demand revalidation
  • The cache should be split into 2 tiers (i.e. the regional cache api and a single source of truth, which could be R2 for example) so that when the regional cache is empty it retrieves the correct data - see the sketch after this list
  • A custom queue should be used that would trigger a purge of the regional cache once revalidation is done
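
For the two-tier point, a rough sketch of the read path only (the cache key layout, TTL, and bindings below are assumptions, not how the adapter works today):

```ts
// R2Bucket and ExecutionContext are the ambient types from @cloudflare/workers-types.
export async function twoTierGet(
  bucket: R2Bucket,
  key: string,
  ctx: ExecutionContext,
): Promise<string | null> {
  const cacheKey = new Request(`https://inc-cache.internal/${key}`);

  // Tier 1: the regional Cache API.
  const regional = await caches.default.match(cacheKey);
  if (regional) return regional.text();

  // Tier 2 (regional miss): read the authoritative copy from R2 and
  // repopulate the local cache in the background.
  const object = await bucket.get(key);
  if (object === null) return null;
  const body = await object.text();
  ctx.waitUntil(
    caches.default.put(
      cacheKey,
      new Response(body, { headers: { "cache-control": "max-age=31536000" } }),
    ),
  );
  return body;
}
```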

@vicb vicb moved this from Todo to In Progress in opennext-cloudflare - aws "merge" Feb 5, 2025
@bookernath

Was the option of doing fetch or ISR revalidation using waitUntil() evaluated? Maybe it doesn't map nicely to Next.js's existing revalidation architecture, and it's potentially vulnerable to thundering-herd issues, but it would be pretty easy to reason about and would avoid having to use additional products.

Just a thought, feel free to toss it out if it doesn't match the vision here.
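
For reference, the simplest form of that idea would look something like the sketch below: no dedup, so concurrent stale hits would each trigger a re-render, and the "x-isr" header stands in for whatever internal marker the server would actually expect:

```ts
// ExecutionContext is the ambient type from @cloudflare/workers-types.
export function scheduleRevalidation(ctx: ExecutionContext, host: string, path: string): void {
  // Kick off the revalidation request and let it complete after the
  // response has been sent, without involving a queue.
  ctx.waitUntil(
    fetch(`https://${host}${path}`, {
      method: "HEAD",
      headers: { "x-isr": "1" }, // placeholder marker, not a documented contract
    }),
  );
}
```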
