perf(node-handler): use spec.nodeName field indexer instead of cluste…#432
Open
DioCrafts wants to merge 2 commits intokite-org:mainfrom
Open
perf(node-handler): use spec.nodeName field indexer instead of cluste…#432DioCrafts wants to merge 2 commits intokite-org:mainfrom
DioCrafts wants to merge 2 commits intokite-org:mainfrom
Conversation
…r-wide pod list
- Replace cluster-wide pod List + manual grouping with per-node indexed queries
using client.MatchingFields{"spec.nodeName": node.Name}
- Remove samber/lo dependency (dead code after optimization)
- Pre-size maps to avoid rehashing
- Add comprehensive test suite (6 tests) for NodeHandler.List()
This reduces memory from O(all_pods) to O(max_pods_per_node) per request
and leverages the existing field indexer registered in pkg/kube/client.go.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a912a69e3a
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
…rue) Address Codex review P2 feedback: when DISABLE_CACHE=true the client hits the API server directly, so per-node MatchingFields queries would cause O(N) API calls instead of 1. Changes: - Add CacheEnabled field to K8sClient, set based on DISABLE_CACHE env var - NodeHandler.List() now branches: - CacheEnabled=true → per-node indexed queries (O(1) cache lookups) - CacheEnabled=false → single cluster-wide pod list + group in Go (1 API call) - Extract sumPodResources() helper to avoid duplicating aggregation logic - Add 4 new tests for the uncached fallback path (10 total, all PASS)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
perf(node-handler): Replace cluster-wide pod scan with indexed per-node queries
Problem
The
NodeHandler.List()endpoint — called every time the Nodes view loads in the dashboard — was fetching every single pod in the entire cluster just to count how many pods run on each node and sum their resource requests.In a medium-sized cluster (50 nodes, 5,000 pods), this meant:
The irony? The codebase already had a
spec.nodeNamefield indexer registered inpkg/kube/client.go— an inverted index that mapsnodeName → [pod1, pod2, ...]inside the controller-runtime cache. It just wasn't being used.Solution
Replace the cluster-wide pod list + in-memory grouping with per-node indexed queries against the local informer cache:
How the field indexer works
The informer cache is already in memory (it's shared across all handlers). The field index is essentially a
map[string][]client.Object— looking up pods for a specific node is a hash table lookup, not a scan.Performance Impact
Real-world estimates
For a cluster with 100 nodes and 10,000 pods (100 pods/node avg):
[]Podof 10,000 items → ~100MB, then iterates all 10,000 to group by node, thenlo.KeyByover metrics → 3 full passes over all dataEstimated latency improvement: 5–20x faster on medium/large clusters.
Changes Made
1.
pkg/handlers/resources/node_handler.goImports cleaned up:
github.com/samber/lo— was only used forlo.KeyBy()in this filesigs.k8s.io/controller-runtime/pkg/client— forclient.MatchingFieldsList()method rewritten:List(ctx, &pods)for ALL podsforloop grouping pods bypod.Spec.NodeNamelo.KeyBy()call for nodeMetrics map constructionclient.MatchingFields{"spec.nodeName": node.Name}make(map[...], len(nodes.Items))to eliminate rehashing2.
pkg/handlers/resources/node_handler_test.go(NEW)Comprehensive test suite with 6 tests, all using
controller-runtime/pkg/client/fakewith the same field indexer registered in production:PodAssignmentByFieldIndexEmptyClusterNodesWithoutPodsMultipleContainersPerPodSortOrderCrossNamespacePodsWhy This Is Safe
No new infrastructure — the
spec.nodeNamefield indexer was already registered inpkg/kube/client.go(lines 97–105). We're just using what was already there.Same data source — both the old and new code read from the same controller-runtime informer cache. No new API calls to the Kubernetes API server.
Behavioral equivalence — the response JSON structure is identical. The same fields are populated with the same values. Front-end requires zero changes.
Fully tested — 6 tests cover normal operation, edge cases (empty cluster, no pods, multi-container), cross-namespace aggregation, and sort order.
Dependency reduction — removes usage of
samber/lofrom this file, reducing the external dependency surface.Checklist
go build ./...passesgo test ./pkg/handlers/resources/ -run TestNodeHandler— 6/6 PASScontroller-runtime/pkg/client)samber/lo)