
CoreDNS daemonset #6247

Closed
dudicoco opened this issue Jul 28, 2023 · 6 comments

@dudicoco

Hi,

I see that you are deploying coredns as a daemonset: https://github.com/zalando-incubator/kubernetes-on-aws/blob/25e3ac810c8cc80e0a4b91fe6d0d732ed9aa2dd7/cluster/manifests/coredns-local/daemonset-coredns.yaml

There are ongoing discussions on the benefits of using a coredns deployment + nodelocal dns architecture vs using a coredns daemonset:
kubernetes/dns#594
coredns/helm#86 (comment)

I would like to hear about your experience with using coredns as a daemonset and if it is something you would recommend over a coredns deployment + nodelocal dns architecture.

Thanks

@szuecs

szuecs commented Aug 1, 2023

We use coredns as a daemonset with a dnsmasq cache in front of the coredns container. We measured that this setup, combined with a local conntrack bypass on the node, is more reliable. DNS is a very important service for us, so adding a little resource overhead is fine compared with an incident. The fix in coredns was reverted (I reported it, and I also reported the issue in the Go issue tracker).

So yes, to sum it up:
I recommend running dnsmasq + coredns + a conntrack bypass, so that pods on node A won't request DNS from any node other than the local one (besides the upstream requests coredns itself makes).
It's much better than nodelocaldns, which as far as I remember uses coredns as a library, and which will easily OOM with a higher number of nodejs pods that have a retry flood (which causes a lot of DNS traffic).
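The layout described above could be sketched as a daemonset pod spec like the following. This is a minimal illustration, not Zalando's actual manifest (that one is linked earlier in the thread): the container names, image references, the 169.254.20.10 listen address, and the port numbers are all assumptions.

```yaml
# Illustrative daemonset: dnsmasq cache in front of coredns on each node.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: coredns-local
spec:
  selector:
    matchLabels: {app: coredns-local}
  template:
    metadata:
      labels: {app: coredns-local}
    spec:
      hostNetwork: true              # serve DNS on a node-local address
      containers:
      - name: dnsmasq                # cache layer, answers pod queries first
        image: dnsmasq:example       # hypothetical image
        args:
        - --listen-address=169.254.20.10   # node-local DNS IP (illustrative)
        - --server=127.0.0.1#10053         # forward cache misses to coredns
        - --cache-size=10000
      - name: coredns                # resolves cluster DNS behind the cache
        image: coredns:example       # hypothetical image
        args: ["-dns.port", "10053"]
```

With this shape, pod queries hit the local dnsmasq cache, and only misses reach the coredns container on the same node.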

@dudicoco

dudicoco commented Aug 2, 2023

Thanks for the info @szuecs!

Regarding the concern raised by some people that running coredns as a daemonset could cause a high load on the API server due to the different watches - is this something that you've experienced? (if you are able to share your scale of nodes + service endpoints that would be helpful to understand the scale we're talking about)

Another possible issue that I raised as a concern: if the dnsmasq container starts before coredns, you could get DNS resolution errors. Is this something you've experienced?

@mikkeloscar

> Regarding the concern raised by some people that running coredns as a daemonset could cause a high load on the API server due to the different watches - is this something that you've experienced? (if you are able to share your scale of nodes + service endpoints that would be helpful to understand the scale we're talking about)

This setup is used for peak loads of up to 1000-1500 nodes in the biggest clusters. More typical load is up to 500 nodes with about 6,000-7,000 pods.

> Another possible issue that I raised as a concern: if the dnsmasq container starts before coredns, you could get DNS resolution errors. Is this something you've experienced?

We have a special node startup setup: nodes start with a taint so that pods won't be scheduled, and only when all daemonset pods are running and ready is the node marked ready by removing this taint. This way we ensure things are up and running before workloads depend on them. Of course it doesn't help if dnsmasq or coredns crashes later on, but that is not an issue we have experienced.
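The startup gating described above can be sketched as a registration taint plus a matching toleration on the DNS daemonset. The taint key here is hypothetical (the actual key used internally isn't stated in the thread), and the taint removal is done by some controller once all daemonset pods on the node report Ready:

```yaml
# Node registered with a startup taint (illustrative key), e.g. via kubelet:
#   --register-with-taints=node.example.com/not-ready=:NoSchedule
#
# The DNS daemonset tolerates the taint so it starts before regular workloads:
tolerations:
- key: node.example.com/not-ready   # hypothetical taint key
  operator: Exists
  effect: NoSchedule
# A controller removes the taint once all daemonset pods on the node are Ready,
# which is what marks the node schedulable for normal workloads.
```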

We have a lot of details in internal design documents which we unfortunately can't share in the current state because of many internal references, and I don't have time at the moment to clean it up to be shareable. Some things to highlight:

  • We compared CoreDNS vs. dnsmasq as the cache and found that dnsmasq can sustain much higher load without blowing up its resource usage. IIRC there was a similar test for nodelocalDNS where dnsmasq was also shown to be better.
    • We also found that unbound performed even better than dnsmasq, but we saw occasional lockups at runtime, so we reverted to dnsmasq and never followed up on whether this was fixed in newer versions :/
  • We run kubelet with the --cluster-dns flag set to the IP of the local node. This way all pods use the local node (CoreDNS daemonset) as their DNS server.
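The --cluster-dns point above can also be expressed as a kubelet configuration fragment. The address below is an assumption for illustration; the thread doesn't state which node-local IP Zalando actually uses:

```yaml
# KubeletConfiguration fragment (equivalent to the --cluster-dns flag):
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
clusterDNS:
- 169.254.20.10   # illustrative node-local IP served by the DNS daemonset pod
```

Pods then get this address written into /etc/resolv.conf, so all lookups stay on the local node.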

@dudicoco

dudicoco commented Aug 2, 2023

Thanks for the detailed information @mikkeloscar.

Regarding the load on the API server - how many resources and instances are you dedicating to the API server?
Did you notice any throttling/rate limiting errors within the API server's logs?

@szuecs

szuecs commented Aug 11, 2023

@dudicoco we use dedicated node pools for control plane nodes and a separate etcd cluster. We normally run 2 control plane nodes (3 during updates) and a 5-node etcd cluster (m5.large; sometimes we have to increase it). Control plane node size depends on the cluster: one of the big clusters has c6g.2xlarge, and one heavy on postgres (created by postgres-operator, ~250 postgres clusters in one Kubernetes cluster) has c6g.16xlarge control plane nodes.

@szuecs

szuecs commented Jan 23, 2024

I guess we can close the issue.

@szuecs szuecs closed this as completed Jan 23, 2024