
CoreDNS daemonset #6247

Closed
dudicoco opened this issue Jul 28, 2023 · 6 comments

@dudicoco

Hi,

I see that you are deploying coredns as a daemonset: https://github.com/zalando-incubator/kubernetes-on-aws/blob/25e3ac810c8cc80e0a4b91fe6d0d732ed9aa2dd7/cluster/manifests/coredns-local/daemonset-coredns.yaml

There are ongoing discussions on the benefits of using a coredns deployment + nodelocal dns architecture vs using a coredns daemonset:
kubernetes/dns#594
coredns/helm#86 (comment)

I would like to hear about your experience with using coredns as a daemonset and if it is something you would recommend over a coredns deployment + nodelocal dns architecture.

Thanks

@szuecs

szuecs commented Aug 1, 2023

We use coredns as a daemonset with a dnsmasq cache in front of the coredns container. We measured that this setup, combined with a local conntrack bypass on the node, is more reliable. DNS is a very important service for us, so adding a little resource overhead is fine compared with an incident. The fix in coredns was reverted (I reported it, and I also reported the issue in the Go issue tracker).

So yes, to sum it up:
I recommend running dnsmasq + coredns + a conntrack bypass, so that pods on node A won't request DNS from any node other than the local one (besides the upstream requests coredns itself makes).
It's much better than nodelocaldns, which as far as I remember uses coredns as a library, and which will easily OOM with a higher number of nodejs pods that have a retry flood (which causes a lot of DNS traffic).
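The layout described above could be sketched as a daemonset pod spec like the following. This is a minimal illustration, not Zalando's actual manifest (that one is linked earlier in the thread): the container names, image references, the 169.254.20.10 listen address, and the port numbers are all assumptions.

```yaml
# Illustrative daemonset: dnsmasq cache in front of coredns on each node.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: coredns-local
spec:
  selector:
    matchLabels: {app: coredns-local}
  template:
    metadata:
      labels: {app: coredns-local}
    spec:
      hostNetwork: true              # serve DNS on a node-local address
      containers:
      - name: dnsmasq                # cache layer, answers pod queries first
        image: dnsmasq:example       # hypothetical image
        args:
        - --listen-address=169.254.20.10   # node-local DNS IP (illustrative)
        - --server=127.0.0.1#10053         # forward cache misses to coredns
        - --cache-size=10000
      - name: coredns                # resolves cluster DNS behind the cache
        image: coredns:example       # hypothetical image
        args: ["-dns.port", "10053"]
```

With this shape, pod queries hit the local dnsmasq cache, and only misses reach the coredns container on the same node.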

@dudicoco

dudicoco commented Aug 2, 2023

Thanks for the info @szuecs!

Regarding the concern raised by some people that running coredns as a daemonset could cause a high load on the API server due to the different watches - is this something that you've experienced? (if you are able to share your scale of nodes + service endpoints that would be helpful to understand the scale we're talking about)

Another possible issue that I raised as a concern: if the dnsmasq container starts before coredns, you could get DNS resolution errors. Is this something you've experienced?

@mikkeloscar

> Regarding the concern raised by some people that running coredns as a daemonset could cause a high load on the API server due to the different watches - is this something that you've experienced? (if you are able to share your scale of nodes + service endpoints that would be helpful to understand the scale we're talking about)

This setup is used for peak loads of up to 1000-1500 nodes in the biggest clusters. More typical load is up to 500 nodes with about 6,000-7,000 pods.

> Another possible issue that I raised as a concern: if the dnsmasq container starts before coredns, you could get DNS resolution errors. Is this something you've experienced?

We have a special node startup setup: nodes start with a taint so that pods won't be scheduled, and only when all daemonset pods are running and ready is the node marked ready by removing this taint. This way we ensure things are up and running before workloads depend on them. Of course it doesn't help if dnsmasq or coredns crashes later on, but that is not an issue we have experienced.
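The startup gating described above can be sketched as a registration taint plus a matching toleration on the DNS daemonset. The taint key here is hypothetical (the actual key used internally isn't stated in the thread), and the taint removal is done by some controller once all daemonset pods on the node report Ready:

```yaml
# Node registered with a startup taint (illustrative key), e.g. via kubelet:
#   --register-with-taints=node.example.com/not-ready=:NoSchedule
#
# The DNS daemonset tolerates the taint so it starts before regular workloads:
tolerations:
- key: node.example.com/not-ready   # hypothetical taint key
  operator: Exists
  effect: NoSchedule
# A controller removes the taint once all daemonset pods on the node are Ready,
# which is what marks the node schedulable for normal workloads.
```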

We have a lot of details in internal design documents which we unfortunately can't share in the current state because of many internal references, and I don't have time at the moment to clean it up to be shareable. Some things to highlight:

  • We compared CoreDNS vs. dnsmasq as the cache and found that dnsmasq can sustain much higher load without blowing up its resource usage. IIRC there was a similar test for nodelocalDNS where dnsmasq was also shown to be better.
    • We also found that unbound performed even better than dnsmasq, but we saw occasional lockups at runtime, so we reverted to dnsmasq and never followed up on whether this was fixed in newer versions :/
  • We run kubelet with the --cluster-dns flag set to the IP of the local node. This way all pods use the local node (CoreDNS daemonset) as their DNS server.
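The --cluster-dns point above can also be expressed as a kubelet configuration fragment. The address below is an assumption for illustration; the thread doesn't state which node-local IP Zalando actually uses:

```yaml
# KubeletConfiguration fragment (equivalent to the --cluster-dns flag):
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
clusterDNS:
- 169.254.20.10   # illustrative node-local IP served by the DNS daemonset pod
```

Pods then get this address written into /etc/resolv.conf, so all lookups stay on the local node.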

@dudicoco

dudicoco commented Aug 2, 2023

Thanks for the detailed information @mikkeloscar.

Regarding the load on the API server - how many resources and instances are you dedicating to the API server?
Did you notice any throttling/rate limiting errors within the API server's logs?

@szuecs

szuecs commented Aug 11, 2023

@dudicoco we use dedicated node pools for control plane nodes and a separate etcd cluster. We normally run 2 control plane nodes (3 during updates) and a 5-node etcd cluster (m5.large; sometimes we have to increase it). Control plane node size depends on the cluster: one of the big clusters has c6g.2xlarge, and one heavy on postgres (created by postgres-operator, ~250 postgres clusters in one Kubernetes cluster) has c6g.16xlarge control plane nodes.

@szuecs

szuecs commented Jan 23, 2024

I guess we can close the issue.

@szuecs szuecs closed this as completed Jan 23, 2024