feat: implement node selector for nodeport svc #11202
tanujd11 wants to merge 2 commits into projectcalico:master
Conversation
Signed-off-by: tanujd11 <dwiveditanuj41@gmail.com>
Pull Request Overview
This PR implements selective NodePort service exposure based on node labels, enabling multi-tenant clusters where tenants have dedicated nodes. Services can now use the projectcalico.org/nodeSelector annotation to control which nodes expose their NodePort services.
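A hypothetical Service manifest using the annotation might look like the sketch below. The selector syntax shown (`tenant == 'team-a'`, modeled on Calico's general selector syntax) is an assumption for illustration and is not confirmed by this PR:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: tenant-a-app
  annotations:
    # Assumed syntax: only nodes matching this selector program the NodePort.
    projectcalico.org/nodeSelector: "tenant == 'team-a'"
spec:
  type: NodePort
  selector:
    app: tenant-a-app
  ports:
    - port: 80
      targetPort: 8080
      nodePort: 30080
```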
Key Changes:
- Adds node selector evaluation logic that checks node labels against service annotations before programming NodePort entries
- Implements node label tracking and update callbacks in the BPF dataplane components
- Extends the BPF proxy syncer to support node label-based filtering of NodePort services
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| felix/dataplane/linux/int_dataplane.go | Registers node labels update callback with BPF route manager |
| felix/dataplane/linux/bpf_route_mgr.go | Adds node labels tracking and callback mechanism for label updates |
| felix/bpf/proxy/syncer_test.go | Adds comprehensive test coverage for node selector functionality |
| felix/bpf/proxy/syncer_bench_test.go | Updates benchmark tests to pass nil for new nodeLabels parameter |
| felix/bpf/proxy/syncer.go | Implements core node selector matching logic and extends NewSyncer signature |
| felix/bpf/proxy/proxy_bench_test.go | Updates benchmark to pass nil for new nodeLabels parameter |
| felix/bpf/proxy/proxy.go | Adds NodeSelector annotation constant and service annotation interface |
| felix/bpf/proxy/lb_src_range_test.go | Updates tests to pass nil for new nodeLabels parameter |
| felix/bpf/proxy/kube-proxy.go | Implements node label fetching and resync logic on label changes |
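The core matching step in syncer.go can be sketched as follows. This is a simplified, self-contained stand-in (the function name and the comma-separated `key=value` selector subset are assumptions for illustration, not the PR's actual implementation, which would use Kubernetes label-selector parsing):

```go
package main

import (
	"fmt"
	"strings"
)

// matchesNodeSelector reports whether the node's labels satisfy every
// requirement in a comma-separated "key=value" selector string. A real
// implementation would parse full Kubernetes label-selector syntax; this
// sketch covers only simple equality requirements.
func matchesNodeSelector(selector string, nodeLabels map[string]string) bool {
	if selector == "" {
		return true // no selector: every node exposes the NodePort
	}
	for _, req := range strings.Split(selector, ",") {
		kv := strings.SplitN(strings.TrimSpace(req), "=", 2)
		if len(kv) != 2 {
			return false // malformed requirement: fail closed
		}
		if nodeLabels[kv[0]] != kv[1] {
			return false
		}
	}
	return true
}

func main() {
	labels := map[string]string{"tenant": "team-a", "zone": "us-east-1"}
	fmt.Println(matchesNodeSelector("tenant=team-a", labels)) // true
	fmt.Println(matchesNodeSelector("tenant=team-b", labels)) // false
}
```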
Signed-off-by: tanujd11 <dwiveditanuj41@gmail.com>
caseydavenport left a comment
Just doing a first pass here - I think the main thing is that we need to get rid of the explicit Get() calls to the API server and instead rely on syncer data.
    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()

    node, err := kp.k8s.CoreV1().Nodes().Get(ctx, kp.hostname, metav1.GetOptions{})
I think we shouldn't be performing any Get requests from within the Felix code. Felix runs as a DaemonSet on every node in the cluster, and for very large clusters this can result in problems with scale.
We will need to feed this via the Syncer which is a Watch API fed by Typha for scale purposes. I believe Felix is already receiving node labels over that API (or at least has the ability to without much fuss)
    }

    // NewK8sServicePortWithSelector creates a new k8s ServicePort with a node selector
    func NewK8sServicePortWithSelector(clusterIP net.IP, port int, proto v1.Protocol,
This looks like it is only used in test code, so should probably live in the _test.go file
    // Trigger a resync with the current host IPs
    // This will recreate the syncer with updated node labels
    if len(hostIPs) > 0 {
        kp.OnHostIPsUpdate(hostIPs)
Looks like we are already getting node label updates streamed to us, so I think instead of calling OnHostIPsUpdate here we could introduce a new method for updating the labels that triggers a re-evaluation of Service programming, getting rid of the need for the Get() call mentioned above, right?
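The method proposed here could be sketched as below, following the `OnHostIPsUpdate` pattern. This is a pared-down, illustrative stand-in: the struct shape, field names, and the `resync` placeholder are assumptions, not the actual Felix API:

```go
package main

import (
	"fmt"
	"maps"
	"sync"
)

// KubeProxy is a simplified stand-in for the kube-proxy wrapper in
// felix/bpf/proxy; only the pieces needed for this sketch are present.
type KubeProxy struct {
	lock       sync.Mutex
	nodeLabels map[string]string
	resyncs    int
}

// OnNodeLabelsUpdate stores labels pushed by the syncer (no Get() against
// the API server) and triggers a re-evaluation of Service programming only
// when the labels actually changed.
func (kp *KubeProxy) OnNodeLabelsUpdate(labels map[string]string) {
	kp.lock.Lock()
	changed := !maps.Equal(kp.nodeLabels, labels)
	kp.nodeLabels = labels
	kp.lock.Unlock()
	if changed {
		kp.resync()
	}
}

// resync is a placeholder for recreating the syncer with the new labels.
func (kp *KubeProxy) resync() {
	kp.resyncs++
}

func main() {
	kp := &KubeProxy{}
	kp.OnNodeLabelsUpdate(map[string]string{"tenant": "team-a"})
	kp.OnNodeLabelsUpdate(map[string]string{"tenant": "team-a"}) // unchanged: no resync
	fmt.Println(kp.resyncs) // 1
}
```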
yes, I was going to say that we should perhaps use the same/similar approach as when updating the host IPs. Both is expected to be rather infrequent event.
Follow the OnHostIPsUpdate example. To avoid possibly restarting kube-proxy 2x, it might be the best to move https://github.com/projectcalico/calico/blob/master/felix/dataplane/linux/bpf_route_mgr.go#L590 to https://github.com/projectcalico/calico/blob/master/felix/dataplane/linux/bpf_route_mgr.go#L242 if there was any change and send labels and hostIPs as a single update.
    kp.lock.Unlock()

    // Check if labels actually changed
    labelsChanged := len(oldLabels) != len(labels)
You should be able to use maps.Equal for this to simplify a bit
tomastigera left a comment
Note that this would only exclude external traffic from being handled by the nodeport. It would not exclude a pod from reaching the nodeport via the excluded host's IP. But that should not really matter as there is no forwarding from that IP's node to the node with the backend. Nodeport is always resolved straight to the backend pod IP on the client pod's node.
    defer kp.lock.Unlock()

    kp.lastHostIPs = hostIPs
    kp.nodeLabels = kp.fetchNodeLabels()
you do not need to fetch the labels, you already got them in OnNodeLabelsUpdate()
yes, agreed. I am updating it. I got a bit confused there. Thanks.
Also note that bumping the k8s libs in current master (kubernetes/kubernetes#133059) means we need to incorporate the node manager, so you may get all you need in a much simpler way if you hold off a little bit.
Turns out that the node selector is not that helpful after all 😞 so we would need to send the labels as I proposed. We would like to decouple the callback and the proxy execution so that reprogramming kube-proxy would not block the rest of felix from running.
This PR is stale because it has been open for 60 days with no activity.
@tanujd11 just to follow up: @aaaaaaaalex is now working on some of the base work for this - it did not get completely off our radar.
@tanujd11 what is the exact scenario? Excluding exposing a port on a certain node is sort of like placing a deny policy that would block access to the port on selected nodes. Does your LB already understand which nodes not to use to target the service? I guess you do not want to run the backend pods on these nodes and you have a way to do that. I guess your LB would only target the nodes with the pods. So is this change just to not implement the nodeport on these nodes to keep things clean? Or to leave the nodeport for the use of other services on these nodes?
We even have this use-case in a policy tutorial in the docs: https://docs.tigera.io/calico/latest/network-policy/services/kubernetes-node-ports#allow-ingress-traffic-to-specific-node-ports |
Description

Adds a projectcalico.org/nodeSelector annotation to selectively expose NodePort services on specific nodes based on node labels. Designed for multi-tenant clusters where tenants have dedicated nodes.

Related issues/PRs
fixes #5495
Todos
Release Note
Reminder for the reviewer
Make sure that this PR has the correct labels and milestone set.
Every PR needs one docs-* label:
- docs-pr-required: This change requires a change to the documentation that has not been completed yet.
- docs-completed: This change has all necessary documentation completed.
- docs-not-required: This change has no user-facing impact and requires no docs.

Every PR needs one release-note-* label:
- release-note-required: This PR has user-facing changes. Most PRs should have this label.
- release-note-not-required: This PR has no user-facing changes.

Other optional labels:
- cherry-pick-candidate: This PR should be cherry-picked to an earlier release. For bug fixes only.
- needs-operator-pr: This PR is related to install and requires a corresponding change to the operator.