Unbound Serve expired; cache hit rate reducing with time #1115

Open
sirizake opened this issue Aug 1, 2024 · 7 comments

Comments

@sirizake

sirizake commented Aug 1, 2024

Hi
I have installed Unbound 1.20.0 on a FreeBSD 14 server. It was working fine until the server lost connectivity to the upstream internet provider. Before that, the average cache hit rate on the server was 99.0%, with only 1% recursive replies.
Part of my unbound.conf file is shown below:

server:
    prefetch: yes
    serve-expired: yes
    # serve-expired-ttl: 0
    # serve-expired-ttl-reset: no

Since the loss of connectivity (which has still not been restored), the average cache hit rate has dropped to 14%, while recursive queries are showing 86%.
My expectation is that the caching server should continue to serve expired records and keep the cache hit rate high, because serve-expired-ttl is left at its default (meaning it should continue serving cached content until upstream is restored).
My observation is the opposite. Is there anything I am missing? How can I ensure that the caching server will continue serving cached data for several days after upstream internet is lost?
Regards
Isaac

@wcawijngaards
Member

If this is measured with the cachemiss or cachehits counters, note that the code counts serve-expired refresh attempts as a cachemiss. So the cachemiss counter is increased when a query comes in, gets an expired answer, and recursion then takes place to refresh the data item. The counter for the number of expired answers is incremented as well. So perhaps the measurement reading is due to the statistics counters, not really the server response behaviour itself.
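
To compare these counters side by side, one option is to read them on demand with `unbound-control stats_noreset`. The fragment below is a minimal sketch, assuming remote control is not yet enabled on the server; depending on the setup, certificates from `unbound-control-setup` may also be needed. The interpretation that follows is based on the explanation above and is not a statement about the exact accounting in every version.

```
server:
    # Do not print statistics to the log at intervals; read them on demand instead.
    statistics-interval: 0
    # Keep counters cumulative since start, so that printing them does not clear them.
    statistics-cumulative: yes

remote-control:
    # Needed for unbound-control to query the running server.
    control-enable: yes
```

With that in place, `total.num.cachehits`, `total.num.cachemiss` and `total.num.expired` can be read together. If expired replies are counted as cache misses as described, then cachehits plus expired, divided by `total.num.queries`, is probably a closer estimate of the share of queries answered from cache than the plain hit rate.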

@wcawijngaards
Member

[ Edited the issue post to put the config in a code block, that shows the commented out entries with #. ]

@sirizake
Author

sirizake commented Aug 1, 2024

@wcawijngaards thanks
But is there a way to measure the server response behaviour itself in this circumstance?

@wcawijngaards
Member

That would need a test to see if there is an expired answer at that time. The recursion to refresh is not so much the problem, I would think. If the cache size were too small, expired answers from cache could also fail.
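
One way to run such a test, sketched here as an assumption rather than a recommendation made in this thread, is to have Unbound mark stale answers with an Extended DNS Error so that a client such as `dig` can tell them apart from fresh ones. The cache sizes shown are illustrative values only, not tuned for any workload.

```
server:
    serve-expired: yes
    # Attach Extended DNS Errors (RFC 8914) to replies.
    ede: yes
    # Mark replies served from expired cache entries with EDE code 3 (Stale Answer).
    ede-serve-expired: yes
    # Illustrative sizes only: larger caches make it less likely that
    # expired entries are evicted before they can be served.
    rrset-cache-size: 256m
    msg-cache-size: 128m
```

Querying a name that was cached before the outage and checking the reply for the Stale Answer code would then show whether an expired answer is actually being returned.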

@sirizake
Author

sirizake commented Aug 6, 2024

I see that the total entries cached and the total cache memory values have also remained the same over time.

@andylemin

Hi, adding support for this, as per conversations on the email lists.
We have experienced a very similar issue when the upstream forwarder has problems (responding with negative responses, often due to community block-list churn or otherwise), and also when the internet has simply failed.

From our experience it appears that the forwarder/recurse/prefetch logic paths are able to poison cache records with negative responses.

We are looking for configuration options to make Unbound never cache negative responses. We are aware that you can limit the TTL of negative cache entries, which provides some of this. However, poisoning the cache works against the serve-stale and RFC 8767 goodness.

Will try to post some explicit examples to reproduce when I find time. Thanks again for all the great work :)
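
The negative-TTL limit mentioned above corresponds to the `cache-max-negative-ttl` option. A minimal sketch follows; the value of 60 seconds is an arbitrary illustration, and this only shortens how long negative answers are cached, it does not stop them from being cached.

```
server:
    # Upper bound, in seconds, on how long negative responses stay in the cache.
    cache-max-negative-ttl: 60
```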

@wcawijngaards
Member

Unbound generates negative responses from cache when using aggressive-nsec and harden-below-nxdomain. Both can be turned off. harden-below-nxdomain regularly turns up as a problem with locally used domains whose upper labels receive an NXDOMAIN answer from the forwarder or recursor. Not sure if that is the issue, but it is perhaps worth looking at. Another thing that complicates recursor logic and could emit negatives is query minimisation, qname-minimisation, which creates queries for upper labels. It can also be turned off by setting it to no.
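
As a hedged sketch of the options named above, the fragment below turns all three off. Whether any of them is the cause here is not established, and each one gives up some hardening or privacy, so it may be better to change them one at a time while testing.

```
server:
    # Do not synthesise negative answers from cached NSEC records.
    aggressive-nsec: no
    # Do not answer NXDOMAIN for names below a cached NXDOMAIN.
    harden-below-nxdomain: no
    # Send the full query name upstream instead of querying upper labels first.
    qname-minimisation: no
```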
