Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow option to ignore failures in cache #1400

Closed
dmathieu opened this issue Apr 8, 2024 · 6 comments · Fixed by #1403
Closed

Allow option to ignore failures in cache #1400

dmathieu opened this issue Apr 8, 2024 · 6 comments · Fixed by #1403
Labels
enhancement New feature or request

Comments

@dmathieu
Copy link
Contributor

dmathieu commented Apr 8, 2024

We're looking into caching our lychee results, as we're checking a lot of URLs, and we're frequently getting 429s from some services.
Unfortunately right now, lychee only skips caching for excluded, unsupported and unknown checks.
This means we can only store our cache if a run fully passed.

If there was an option such as --ignore-cache-failures, we would be able to store the cache every time, even when a run failed, but with only the success checks.
Then, a subsequent run would only check the URLs that previously failed and reduce the amount of checks we need to run.

Before looking into adding that option, I wanted to raise this issue to ensure I didn't miss anything and there wasn't already an alternative, or a good reason for this not to exist.

@mre
Copy link
Member

mre commented Apr 8, 2024

Yeah, good idea.
I wonder about the implementation part.
Maybe we should support caching by status code.
So --cache-status 429 or something.
Initially I was a bit conservative with the implementation. A 429 is a transient error, so retrying on the next run sounded like the logical thing to me. I do see your use-case, though, so perhaps overwriting the behavior would make sense.

@mre
Copy link
Member

mre commented Apr 8, 2024

Actually, reading this again, I think what you were asking for is slightly different: the cache should be stored even if we encounter 429 status codes. Is that correct?
If so, I wonder if this could be the default. I don't see any reason why we would not want to store the cache in that case. We can simply leave the 429s out and cache the rest. It would in fact actually help avoid 429s in subsequent runs.

@dmathieu
Copy link
Contributor Author

dmathieu commented Apr 8, 2024

No, the cache should not be stored if we get a 429.
When that happens, I would expect a subsequent run to skip every previous 200, and to only retry the 429s.

@mre
Copy link
Member

mre commented Apr 9, 2024

Sorry, I think that's what I meant. So we don't cache the 429s; just the rest. This way on the next run when we load the cache again, we see all the 200s and can skip them.
Does that sound correct? (Just to clarify.)

@dmathieu
Copy link
Contributor Author

dmathieu commented Apr 9, 2024

Yes, that is accurate.
Thinking about it, I think it could be extended to a few more, as (for my usage), the cache would need to ignore everything that is not actionable from the context of the repository that executes lychee.
With tha thinking, 5xx errors are not actionable, since they indicate an issue on the server. So they should also be ignored from the cache?

@mre
Copy link
Member

mre commented Apr 9, 2024

Here's the way I see it right now:

  • 429 indicates that the error is on the client-side, namely lychee made too many requests. The mitigation is to wait a bit and try again. This is a good case for excluding it from the cache, because we can expect a different response after a short cooldown period
  • 5xx indicates that the error is on the server-side. Retrying might not help unless things are fixed server-side. This is a good case for adding it to the cache, because we should not expect a different result after a short cooldown period.

In summary, 429 implies that requests can be handled again after some time.
In the case of 5xx, one could argue that they are often temporary hiccups and fixed quickly, but there is no guarantee. The question is when it's reasonable to try again, and that question is hard to answer.
The best we can do is make the cache duration configurable with --max-cache-age.

@mre mre added the enhancement New feature or request label Apr 16, 2024
@mre mre closed this as completed in #1403 Oct 14, 2024
mre pushed a commit that referenced this issue Oct 14, 2024
This introduces an option `--cache-exclude-status`, which allows specifying a range of HTTP status codes which will be ignored from the cache.

Closes #1400.
mre pushed a commit that referenced this issue Oct 27, 2024
This introduces an option `--cache-exclude-status`, which allows specifying a range of HTTP status codes which will be ignored from the cache.

Closes #1400.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants