perf: gRPC caching #1272

Open
x0152 wants to merge 10 commits into gonka-ai:upgrade-v0.2.14 from x0152:perf/grpc-rest-caching-v2.14

@x0152 x0152 commented May 28, 2026

Collaborator

Supersedes #542

Summary

Implemented a gRPC query cache in DAPI:

  • Cache keying by (block_height, method, request_hash)
  • Block-pinned reads via x-cosmos-block-height for consistent per-block queries
  • Height hint fallback for unpinned requests
  • Explicit cache bypass for EpochInfo to avoid stale network state
  • Query cache stats support in the client/admin layer
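The keying and height-selection rules listed above could be sketched roughly as follows. This is a minimal illustration and not the PR's code: `cacheKey`, `newCacheKey`, and `resolveHeight` are hypothetical names, SHA-256 is an assumed hash, the method string is a placeholder, and the sketch assumes the request is serialized deterministically before hashing.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
)

// cacheKey is a hypothetical key type mirroring the
// (block_height, method, request_hash) tuple from the summary.
type cacheKey struct {
	Height      int64
	Method      string
	RequestHash string
}

// newCacheKey hashes the serialized request bytes so that identical
// requests at the same height and method map to the same key.
// SHA-256 is an assumption; the PR does not name the hash it uses.
func newCacheKey(height int64, method string, reqBytes []byte) cacheKey {
	sum := sha256.Sum256(reqBytes)
	return cacheKey{
		Height:      height,
		Method:      method,
		RequestHash: hex.EncodeToString(sum[:]),
	}
}

// resolveHeight prefers an explicit pin (the value of the
// x-cosmos-block-height header) and falls back to a height hint
// when the request is unpinned or the pin is not a positive integer.
func resolveHeight(pinned string, hint int64) (height int64, isPinned bool) {
	if h, err := strconv.ParseInt(pinned, 10, 64); err == nil && h > 0 {
		return h, true
	}
	return hint, false
}

func main() {
	// Unpinned request: the height hint (120 here) is used.
	h, pinned := resolveHeight("", 120)
	k1 := newCacheKey(h, "/hypothetical.Query/Models", []byte("req"))
	k2 := newCacheKey(h, "/hypothetical.Query/Models", []byte("req"))
	fmt.Println(pinned, k1 == k2)
}
```

Because the struct contains only comparable fields, it can be used directly as a Go map key, which keeps per-height lookups simple.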

Endpoints

  • GET /admin/v1/cache/stats
  • POST /admin/v1/cache/stats/reset

Load test

Tested under load on a local testnet (~5s blocks). The cache stores results per block, so identical requests within the same block are served from a single backend call; concurrent duplicates are deduplicated rather than sent to the backend.
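For readers unfamiliar with the deduplication described here, the idea can be sketched as below. This is an assumption-laden illustration, not the PR's implementation: `dedupCache`, `call`, and `get` are hypothetical names, the key string is a placeholder, and error handling is reduced to the minimum.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// call tracks one in-flight backend request that other callers can wait on.
type call struct {
	done chan struct{}
	val  []byte
	err  error
}

// dedupCache is a hypothetical cache with in-flight request deduplication.
type dedupCache struct {
	mu       sync.Mutex
	entries  map[string][]byte
	inflight map[string]*call
}

func newDedupCache() *dedupCache {
	return &dedupCache{
		entries:  make(map[string][]byte),
		inflight: make(map[string]*call),
	}
}

// get returns a cached value, joins an in-flight fetch for the same key,
// or starts a new fetch if neither exists.
func (c *dedupCache) get(key string, fetch func() ([]byte, error)) ([]byte, error) {
	c.mu.Lock()
	if v, ok := c.entries[key]; ok {
		c.mu.Unlock()
		return v, nil
	}
	if cl, ok := c.inflight[key]; ok {
		c.mu.Unlock()
		<-cl.done // wait for the leader to finish
		return cl.val, cl.err
	}
	cl := &call{done: make(chan struct{})}
	c.inflight[key] = cl
	c.mu.Unlock()

	// Only the leader reaches the backend; the lock is not held meanwhile.
	cl.val, cl.err = fetch()

	c.mu.Lock()
	if cl.err == nil {
		c.entries[key] = cl.val
	}
	delete(c.inflight, key)
	c.mu.Unlock()
	close(cl.done) // publish the result to waiters
	return cl.val, cl.err
}

func main() {
	c := newDedupCache()
	var backendCalls int64
	var wg sync.WaitGroup
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_, _ = c.get("120|/hypothetical.Query/Models|abc", func() ([]byte, error) {
				atomic.AddInt64(&backendCalls, 1)
				return []byte("models"), nil
			})
		}()
	}
	wg.Wait()
	fmt.Println("backend calls:", atomic.LoadInt64(&backendCalls))
}
```

In this sketch the entry is stored and the in-flight record is removed under the same lock acquisition, so a late caller always sees either the pending call or the stored value and never triggers a second fetch for the same key.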

1000 concurrent requests per endpoint (bypassing the proxy, ~5s blocks, cache memory limit = 1 GiB):

| Endpoint | gRPC calls | Cache hits | Backend calls | Cache size | Wall time (cache ON) | Wall time (cache OFF) |
|---|---|---|---|---|---|---|
| /v1/governance/models | 1000 | 999 | 1 | ~3.5 KB | 45 ms | 223 ms |
| /v1/models | 1000 | 999 | 1 | ~3.6 KB | 37 ms | 196 ms |
| /v1/governance/pricing | 3000 | 2999 | 1 | ~3.6 KB | 37 ms | 536 ms |
| /v1/pricing | 1000 | 1000 | 0 | ~3.6 KB | 35 ms | 223 ms |
| /v1/participants | 1000 | 999 | 1 | ~3.8 KB | 34 ms | 175 ms |

The latency gain is small on the local testnet (tiny state, backend ~150 ms). On prod (bigger state, real load) it will be more noticeable.

Out of scope (follow-up PR)

/v1/status and ABCI queries are not cached: they use CometBFT RPC, not the gRPC layer. Caching them needs a separate cache, so it is left for a follow-up PR.

@x0152 x0152 mentioned this pull request May 28, 2026
@patimen patimen added this to the v0.2.14 milestone May 29, 2026
@x0152 x0152 marked this pull request as ready for review June 4, 2026 16:35
@x0152 x0152 requested review from a-kuprin and patimen June 4, 2026 16:35
patimen commented Jun 10, 2026

Contributor

/run-integration

a-kuprin commented Jun 10, 2026

Collaborator

Performance

Single global mutex on the hot read path.
Every cache hit takes the full sync.Mutex because lookup mutates the LRU list:

func (c *QueryCache) lookup(height int64, key string) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	...
	c.lru.MoveToFront(el)   // write under lock on every read
	return v, true
}

The whole point of this cache is to absorb bursts (the PR tests 1000 concurrent requests/endpoint), but every hit serializes on one lock — including the MoveToFront pointer surgery and a map lookup on lruIndex. An RWMutex won't help because reads write the list.

LRU recency is mostly wasted work

With keepLast = 3 heights and 200k-entry / 1 GiB limits, real-world eviction is almost always height pruning, not LRU. So the per-read MoveToFront cost buys you almost nothing in the common case: it's pure contention with no payoff. Consider dropping read-side reordering altogether.
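One way to picture that suggestion is a cache bucketed by height whose reads never mutate shared state, so an RWMutex becomes effective. The sketch below is only an illustration under assumptions: `heightCache`, `store`, and `maxSeen` are hypothetical names, and it ignores the entry-count and byte limits that the real cache enforces.

```go
package main

import (
	"fmt"
	"sync"
)

// heightCache is a hypothetical cache bucketed by block height.
// Eviction is purely by height, so reads do no bookkeeping.
type heightCache struct {
	mu       sync.RWMutex
	keepLast int64
	maxSeen  int64
	byHeight map[int64]map[string][]byte
}

func newHeightCache(keepLast int64) *heightCache {
	return &heightCache{
		keepLast: keepLast,
		byHeight: make(map[int64]map[string][]byte),
	}
}

// lookup only reads shared state, so concurrent hits share the read lock.
func (c *heightCache) lookup(height int64, key string) ([]byte, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.byHeight[height][key] // indexing a missing bucket is safe
	return v, ok
}

// store inserts a value and drops every bucket older than the newest
// keepLast heights. Values for already-pruned heights are ignored.
func (c *heightCache) store(height int64, key string, val []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if height > c.maxSeen {
		c.maxSeen = height
	}
	cutoff := c.maxSeen - c.keepLast // heights <= cutoff are pruned
	if height <= cutoff {
		return
	}
	bucket := c.byHeight[height]
	if bucket == nil {
		bucket = make(map[string][]byte)
		c.byHeight[height] = bucket
	}
	bucket[key] = val
	for h := range c.byHeight {
		if h <= cutoff {
			delete(c.byHeight, h)
		}
	}
}

func main() {
	c := newHeightCache(3)
	for h := int64(1); h <= 4; h++ {
		c.store(h, "k", []byte(fmt.Sprintf("value@%d", h)))
	}
	_, okOld := c.lookup(1, "k")
	_, okNew := c.lookup(4, "k")
	fmt.Println(okOld, okNew)
}
```

The trade-off is that size limits would need a separate mechanism, for example rejecting inserts once a budget is reached, since there is no recency order left to evict by.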

x0152 commented Jun 10, 2026

Collaborator Author

Good catch, missed it. Updated the table

Also fixed stale reads in the cache

patimen commented Jun 10, 2026

Contributor

@x0152 Did you consider using golang-lru or some other cache library here instead of implementing the cache logic by hand? The block-height logic is specific to Gonka, but the LRU/eviction/concurrency parts look like generic cache machinery.

x0152 commented Jun 11, 2026

Collaborator Author

Yes, agreed. I rewrote the generic cache machinery to use otter/v2 (a better fit here than golang-lru).

I ran the benchmark and got results a bit faster than the hand-written version.
