
perf(search): parallelize resource searches with errgroup#433

Open
DioCrafts wants to merge 4 commits into kite-org:main from DioCrafts:perf/search-parallel-remove-bg-refresh

Conversation

@DioCrafts
Contributor

perf(search): Parallelize resource searches with errgroup & remove background refresh

Search downloads full resource lists sequentially

Problem

SearchHandler.Search() iterated over up to 9 resource types sequentially. Each call to a search function triggered a full K8sClient.List() + name filtering round-trip. With 9 resource types, the worst-case latency was O(N × avg_latency) — roughly 9× the cost of a single resource search.

Additionally, GlobalSearch() on a cache hit launched a fire-and-forget background goroutine (go func() { h.Search(copiedCtx, ...) }()) to refresh the cache. This caused:

  • Unbounded goroutine spawning: Every cached search hit spawned a new goroutine
  • Wasted API server bandwidth: 9 sequential List calls per background refresh
  • Race potential: gin.Context.Copy() used to share state with background goroutine
  • Negligible UX benefit: With a 10-minute TTL, stale-while-revalidate provides minimal freshness improvement

Solution 1 — Parallel search with errgroup

Replaced the sequential for name, searchFunc := range searchFuncs loop with errgroup.WithContext() parallel execution:

resultSlices := make([][]common.SearchResult, len(entries))
g, _ := errgroup.WithContext(context.Background())

for i, entry := range entries {
    i, entry := i, entry // per-iteration copies; required before Go 1.22 so each goroutine sees its own values
    g.Go(func() error {
        results, err := entry.fn(c, q, int64(limit))
        if err != nil {
            log.Printf("search: resource %q failed: %v", entry.name, err)
            return nil // non-fatal: one failing type must not abort the whole search
        }
        resultSlices[i] = results
        return nil
    })
}
_ = g.Wait() // every Go func returns nil, so the aggregate error is safe to ignore

Key design decisions:

  • Each goroutine writes to its own pre-allocated index in resultSlices — no mutex needed
  • Errors are logged and skipped (non-fatal) — one failing resource type doesn't break the entire search
  • Results are merged, sorted, and truncated after all goroutines complete
  • gin.Context is read-safe for concurrent access (only MustGet("cluster") is called)

Latency reduction: O(N × avg) → O(max) — up to ~9× faster in the worst case
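The "merged, sorted, and truncated after all goroutines complete" step can be sketched as follows. This is a minimal, self-contained illustration, not the handler's actual code: `SearchResult` here is a hypothetical stand-in for `common.SearchResult`, and the sort key (`Score`) is assumed.

```go
package main

import (
	"fmt"
	"sort"
)

// SearchResult is a stand-in for common.SearchResult (hypothetical fields).
type SearchResult struct {
	Name  string
	Score int
}

// mergeResults flattens the per-resource-type slices written by the
// goroutines, sorts by descending score, and truncates to limit —
// the single-threaded step that runs after g.Wait().
func mergeResults(resultSlices [][]SearchResult, limit int) []SearchResult {
	var merged []SearchResult
	for _, rs := range resultSlices {
		merged = append(merged, rs...)
	}
	sort.Slice(merged, func(i, j int) bool { return merged[i].Score > merged[j].Score })
	if len(merged) > limit {
		merged = merged[:limit]
	}
	return merged
}

func main() {
	slices := [][]SearchResult{
		{{Name: "pod-a", Score: 3}},
		{{Name: "svc-b", Score: 9}, {Name: "svc-c", Score: 1}},
	}
	for _, r := range mergeResults(slices, 2) {
		fmt.Println(r.Name, r.Score) // svc-b 9, then pod-a 3
	}
}
```

Because merging happens after `Wait()`, no goroutine ever reads another's slot, which is why the per-index writes need no mutex.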

Solution 2 — Remove background refresh goroutine

Removed the stale-while-revalidate pattern from GlobalSearch():

// BEFORE (removed):
copiedCtx := c.Copy()
go func() {
    _, _ = h.Search(copiedCtx, query, limit)
}()

// AFTER: just serve cached results
c.JSON(http.StatusOK, response)
return

Cache entries now expire naturally after the 10-minute TTL. The next request after expiry triggers a fresh (now parallel!) search.

Dead code removed

  • sync.Mutex import and variable (not needed with index-based writes)
  • gin.Context.Copy() call (no longer needed without background goroutine)

Tests added (4 new, 8 total — all PASS)

| Test | Validates |
| --- | --- |
| TestSearchParallelExecution | Goroutines run concurrently (timing assertion + atomic max-concurrency counter) |
| TestSearchPartialFailure | One failing resource type doesn't break others |
| TestGlobalSearchCacheDoesNotTriggerBackgroundRefresh | Cache hit does NOT spawn background goroutine (atomic call counter) |
| TestSearchEmptyResourceFuncs | Graceful handling of zero searchable resource types |
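The "atomic max-concurrency counter" technique mentioned for TestSearchParallelExecution can be sketched like this — a hedged illustration of the measurement idea, not the PR's actual test code (function and variable names here are made up):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// maxConcurrency runs n workers that each sleep briefly, tracking the peak
// number running at once: each worker increments a current counter, CAS-loops
// the peak up to it, then decrements on exit.
func maxConcurrency(n int) int64 {
	var cur, peak atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c := cur.Add(1)
			for {
				p := peak.Load()
				if c <= p || peak.CompareAndSwap(p, c) {
					break
				}
			}
			time.Sleep(20 * time.Millisecond) // simulated search latency
			cur.Add(-1)
		}()
	}
	wg.Wait()
	return peak.Load()
}

func main() {
	// With 9 overlapping 20ms workers the observed peak exceeds 1,
	// which is what proves the search funcs ran concurrently.
	fmt.Println(maxConcurrency(9) > 1)
}
```

A sequential implementation would report a peak of exactly 1, so the assertion `peak > 1` distinguishes the two without relying solely on wall-clock timing.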

Files changed

| File | Changes |
| --- | --- |
| pkg/handlers/search_handler.go | Parallel errgroup search, removed bg refresh, dead code cleanup |
| pkg/handlers/search_handler_test.go | 4 new tests |
| go.mod | golang.org/x/sync promoted from indirect to direct dependency |

Performance impact

| Metric | Before | After |
| --- | --- | --- |
| Search latency (9 types) | ~9 × avg API call | ~1 × max API call |
| Background goroutines per cached hit | 1 (unbounded) | 0 |
| API calls per cached hit | 9 (sequential bg refresh) | 0 |
| Memory per cached search | +1 gin.Context.Copy() | 0 |

@zxh326 zxh326 enabled auto-merge (squash) March 22, 2026 08:36
@zxh326 zxh326 changed the title perf(search): parallelize resource searches with errgroup & remove ba… perf(search): parallelize resource searches with errgroup Mar 22, 2026
auto-merge was automatically disabled March 22, 2026 08:47

Head branch was pushed to by a user without write access

@DioCrafts DioCrafts force-pushed the perf/search-parallel-remove-bg-refresh branch from 3f17b35 to 521684d Compare March 22, 2026 08:47
@DioCrafts
Contributor Author

CI failure already fixed @zxh326

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment
💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 521684d2d0

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

perf(search): parallelize resource searches with errgroup & remove background refresh

Solution A: Replace sequential SearchFuncs iteration with errgroup-based
parallel execution. Each resource type (pods, services, deployments, etc.)
is now searched concurrently, reducing latency from O(N * avg_latency) to
O(max_latency) where N is the number of searchable resource types.

- Each goroutine writes to its own pre-allocated index in resultSlices
  (no mutex contention needed)
- Failing resource types are logged and skipped without aborting the search
- Results are merged, sorted, and truncated after all goroutines complete

Solution E: Remove background goroutine that refreshed cache on hits.
The stale-while-revalidate pattern caused unbounded fire-and-forget
goroutines, wasted API server bandwidth, and provided negligible UX
benefit given the 10-minute cache TTL. Cache entries now expire naturally.

Dead code removed:
- Removed unused sync.Mutex import and variable
- Removed gin.Context.Copy() call (no longer needed without bg goroutine)

New tests:
- TestSearchParallelExecution: proves concurrent execution (timing + atomic counter)
- TestSearchPartialFailure: one failing resource doesn't break others
- TestGlobalSearchCacheDoesNotTriggerBackgroundRefresh: validates Solution E
- TestSearchEmptyResourceFuncs: graceful handling of zero search funcs

All 8/8 tests pass.
@DioCrafts DioCrafts force-pushed the perf/search-parallel-remove-bg-refresh branch from 521684d to 86da2bb Compare March 22, 2026 08:53
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment
💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 86da2bb8c1


… results)

- Replace `err = nil` in recover block with `hadPanic.Store(true)` atomic flag
- Skip `cache.Add()` when any goroutine panicked, preventing 10-min stale-while-broken
- Partial results still returned to caller (non-panicked resources remain useful)
- Add TestSearchPanicDoesNotCacheResults: validates panic skips cache + retry works

Addresses review: chatgpt-codex-connector P2 on lines 84-87
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment
💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 17d1ff3801


- Rename hadPanic → hadFailure (atomic.Bool) — set on both panic AND error
- entry.fn returning an error now also prevents cache write
- Transient API failures no longer get cached as valid 200 OK for 10 min
- TestSearchPartialFailure extended: verifies cache is skipped on error

Addresses review: chatgpt-codex-connector P2 on lines 92-95
@DioCrafts DioCrafts requested a review from zxh326 March 22, 2026 09:38
@zxh326 zxh326 enabled auto-merge (squash) March 22, 2026 09:45
auto-merge was automatically disabled March 22, 2026 10:34

Head branch was pushed to by a user without write access

@DioCrafts
Contributor Author

fix lint on test file
