perf: gRPC caching by x0152 · Pull Request #542 · gonka-ai/gonka

x0152 · 2026-01-10T12:47:53Z

Add gRPC query caching with per-block invalidation. Even with a ~6s block time, it reduces chain node load by deduplicating identical queries within a block.

Cache lookup overhead is ~200ns, versus ~3ms for a gRPC call.
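
A minimal sketch of what per-block invalidation could look like, assuming a single process-wide cache that is told about each new block. `BlockCache`, `OnNewBlock`, and `GetOrFetch` are hypothetical names for illustration, not the types used in this PR.

```go
package main

import "sync"

// BlockCache is a hypothetical per-block query cache. Entries are dropped
// whenever a higher block height is observed, so no TTL is involved.
type BlockCache struct {
	mu      sync.Mutex
	height  int64
	entries map[string][]byte
}

func NewBlockCache() *BlockCache {
	return &BlockCache{entries: make(map[string][]byte)}
}

// OnNewBlock clears all entries once the chain advances past the cached height.
func (c *BlockCache) OnNewBlock(height int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if height > c.height {
		c.height = height
		c.entries = make(map[string][]byte)
	}
}

// GetOrFetch returns the cached response for key, or calls fetch and stores
// the result. Concurrent misses for the same key may each call fetch; this
// sketch does not coalesce in-flight requests.
func (c *BlockCache) GetOrFetch(key string, fetch func() ([]byte, error)) ([]byte, error) {
	c.mu.Lock()
	if v, ok := c.entries[key]; ok {
		c.mu.Unlock()
		return v, nil
	}
	startHeight := c.height
	c.mu.Unlock()

	v, err := fetch() // stands in for the ~3ms gRPC call
	if err != nil {
		return nil, err
	}

	c.mu.Lock()
	// Store only if no new block arrived while the fetch was in flight.
	if c.height == startHeight {
		c.entries[key] = v
	}
	c.mu.Unlock()
	return v, nil
}
```

Within one block, repeated calls with the same key reach the node once. After `OnNewBlock` reports a higher height, the next call goes to the node again.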

x0152 · 2026-01-14T17:59:28Z

The TTL approach seems bad to me; the cache should work on a per-block basis. When we process a new block, we should clear the cache.

With a TTL, the cache can return the same value after we have moved to the next block. This can introduce many possible errors and shouldn't even be tried.

A per-block cache, by contrast, is safe and is what we really want from a cache, since the data does not change before the next block.

You're right.
I was focused on dashboard API latency. Switched to per-block invalidation.

Removed HTTP caching for now; that's a separate topic.
Thanks!
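
The TTL concern quoted above can be shown with a small worked example. It assumes a fixed ~6s block time with block 1 starting at t=0 and uses plain seconds instead of a clock. All function names are hypothetical.

```go
package main

// heightAt returns the chain height at time t (seconds), assuming a fixed
// block time and that block 1 starts at t=0. This is a simplification.
func heightAt(t, blockTime int64) int64 {
	return t/blockTime + 1
}

// ttlServes reports whether a TTL cache would still serve an entry written
// at writtenAt when it is read at now.
func ttlServes(writtenAt, now, ttl int64) bool {
	return now-writtenAt < ttl
}

// blockServes reports whether a per-block cache would serve an entry that
// was written at writtenHeight when the chain is at currentHeight.
func blockServes(writtenHeight, currentHeight int64) bool {
	return writtenHeight == currentHeight
}
```

With a 6s block time and a 5s TTL, an entry written at t=5 (height 1) and read at t=7 (height 2) is still served by the TTL check, even though it belongs to the previous block. The per-block check rejects it.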

patimen

I'm really not sure about this PR.
There is some interaction between clients and caches here that could be a big problem. I called one out, and I fear there may be more.
Generally speaking, we would want the cache as a gRPC interceptor, so that it is tied to a long-lived connection, rather than a global cache that could come back to bite us some time, say, if we have multiple connections.
I do like the idea, and I can see how this could be very useful. But I think we need to step back a bit and think harder about how to do this.
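
A rough sketch of the interceptor-shaped alternative, where each connection gets its own cache instead of sharing a global one. To stay stdlib-only, `Invoker` and `Interceptor` here are simplified stand-ins for `grpc.UnaryInvoker` and `grpc.UnaryClientInterceptor` (no `ClientConn` or `CallOption` parameters, and plain strings instead of proto messages). `Call` and `StaticBackend` are hypothetical helpers, and invalidation is left out.

```go
package main

import (
	"context"
	"sync"
)

// Invoker is a simplified stand-in for grpc.UnaryInvoker.
type Invoker func(ctx context.Context, method string, req string, reply *string) error

// Interceptor is a simplified stand-in for grpc.UnaryClientInterceptor.
type Interceptor func(ctx context.Context, method string, req string, reply *string, next Invoker) error

// NewCachingInterceptor returns an interceptor that owns a private cache.
// Creating one per connection keeps cached replies from leaking between
// connections that may point at different nodes.
func NewCachingInterceptor() Interceptor {
	var mu sync.Mutex
	cache := make(map[[2]string]string)
	return func(ctx context.Context, method string, req string, reply *string, next Invoker) error {
		key := [2]string{method, req}
		mu.Lock()
		v, ok := cache[key]
		mu.Unlock()
		if ok {
			*reply = v
			return nil
		}
		if err := next(ctx, method, req, reply); err != nil {
			return err
		}
		mu.Lock()
		cache[key] = *reply
		mu.Unlock()
		return nil
	}
}

// Call runs one request through an interceptor and returns the reply.
func Call(ic Interceptor, next Invoker, method, req string) (string, error) {
	var reply string
	err := ic(context.Background(), method, req, &reply, next)
	return reply, err
}

// StaticBackend returns an Invoker that always replies with resp and
// counts how often it was called.
func StaticBackend(resp string, calls *int) Invoker {
	return func(_ context.Context, _ string, _ string, reply *string) error {
		*calls++
		*reply = resp
		return nil
	}
}
```

Two interceptors built this way never see each other's entries, which is the property a global cache cannot give when there is more than one connection.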

patimen · 2026-02-05T04:33:26Z

+	return &CachingConn{inner: inner}
+}
+
+func (c *CachingConn) Invoke(ctx context.Context, method string, args any, reply any, opts ...grpc.CallOption) error {


Shouldn't the grpc.CallOptions be passed on to the CachedInvoke?
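
To illustrate the point about options, here is a small stdlib-only sketch. `CallOption`, `callSettings`, `WithMaxRecv`, and `recordingConn` are hypothetical stand-ins (in grpc-go, `CallOption` is an interface, not a func type), and the cache lookup itself is omitted.

```go
package main

// callSettings and CallOption are simplified stand-ins for gRPC's per-call
// options; they exist only for this illustration.
type callSettings struct{ maxRecvBytes int }

type CallOption func(*callSettings)

// WithMaxRecv is a hypothetical option.
func WithMaxRecv(n int) CallOption {
	return func(s *callSettings) { s.maxRecvBytes = n }
}

// Invoker is the part of a connection the wrapper delegates to.
type Invoker interface {
	Invoke(method string, opts ...CallOption) callSettings
}

// recordingConn applies the options it receives and returns them, so a
// caller can observe whether they arrived.
type recordingConn struct{}

func (recordingConn) Invoke(method string, opts ...CallOption) callSettings {
	var s callSettings
	for _, o := range opts {
		o(&s)
	}
	return s
}

// CachingConn wraps an inner connection. On a cache miss the variadic opts
// are forwarded unchanged; dropping them would silently discard per-call
// settings chosen by the caller.
type CachingConn struct{ inner Invoker }

func (c *CachingConn) Invoke(method string, opts ...CallOption) callSettings {
	return c.inner.Invoke(method, opts...)
}
```

One open question this sketch does not answer is whether options that change the result of a call should also become part of the cache key.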

patimen · 2026-02-05T04:54:59Z

+		return "", err
+	}
+	hash := sha256.Sum256(data)
+	return method + ":" + hex.EncodeToString(hash[:16]), nil


Why the truncation?
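
For comparison, a sketch of the same key derivation without the truncation. `hash[:16]` keeps 128 bits of the digest; if nothing depends on the shorter key, using the full 32 bytes avoids having to justify the cut. `cacheKey` is a hypothetical name, and `data` is assumed to be the already-serialized request.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// cacheKey derives a cache key from the method name and the serialized
// request, using the full SHA-256 digest (64 hex characters).
func cacheKey(method string, data []byte) string {
	hash := sha256.Sum256(data)
	return method + ":" + hex.EncodeToString(hash[:])
}
```

The cost is 32 extra characters per key.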

patimen · 2026-02-05T05:54:41Z

 	}

+	// Clear cache only when synced to avoid thrashing during catch-up
+	cosmosclient.ClearCache()


This is clearing the cache AFTER we have tried to get the networkInfo above... meaning that it will be info from the PREVIOUS block, not the new one.

Specifically, take a look at line 170 (queryNetworkInfo)
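
The ordering issue can be sketched as follows, assuming the new-block handler both clears the cache and reads network info. `queryCache` and `onNewBlock` are hypothetical and not the PR's code; the sketch is single-threaded for brevity.

```go
package main

// queryCache is a hypothetical minimal cache keyed by query name.
type queryCache struct{ entries map[string]string }

func newQueryCache() *queryCache {
	return &queryCache{entries: map[string]string{}}
}

func (c *queryCache) clear() { c.entries = map[string]string{} }

// get returns the cached value for key, or fetches and stores it.
func (c *queryCache) get(key string, fetch func() string) string {
	if v, ok := c.entries[key]; ok {
		return v
	}
	v := fetch()
	c.entries[key] = v
	return v
}

// onNewBlock clears the cache first, so the network info read afterwards
// comes from the new block rather than from an entry cached during the
// previous one.
func onNewBlock(c *queryCache, fetchNetworkInfo func() string) string {
	c.clear()
	return c.get("networkInfo", fetchNetworkInfo)
}
```

If the clear ran after the `get`, the handler would return the entry cached during the previous block, which is the behavior described in the comment above.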

patimen · 2026-02-05T06:02:01Z

/run-integration

x0152 · 2026-02-09T21:27:08Z

@patimen Thanks for the review!

The safest cache design is to make it height-scoped and explicit: pass the height in the context, then cache only for that height. That way every request clearly states which block it belongs to, and the developer controls it. Example:

ctx := context.WithValue(ctx, HeightKey, height) // or attach height to request metadata
resp, err := queryClient.Inference(ctx, req)     // cache uses ctx height

But I want to ask about the current solution. I rewrote it into a simpler, less bug-prone version that is easier to read, but it still uses a global height-pinned cache. If we talk to a single node through a single backend URL, what risks do you still see with a shared cache? I only see risks if we ever talk to different nodes or put a load balancer/proxy between the dapi and the node, which is not the case for us right now.

What do you think?
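
The height-scoped idea from this comment could be fleshed out roughly like this. It is a sketch under assumptions: `HeightCache`, `WithHeight`, `HeightFrom`, `QueryAt`, and `Prune` are hypothetical names, an unexported struct type is used as the context key instead of `HeightKey`, and `fetch` is assumed to query the node at the same height that is in the context.

```go
package main

import (
	"context"
	"sync"
)

// heightKey is an unexported context key type, which avoids collisions
// with keys defined in other packages.
type heightKey struct{}

func WithHeight(ctx context.Context, height int64) context.Context {
	return context.WithValue(ctx, heightKey{}, height)
}

func HeightFrom(ctx context.Context) (int64, bool) {
	h, ok := ctx.Value(heightKey{}).(int64)
	return h, ok
}

type entryKey struct {
	height int64
	query  string
}

// HeightCache stores results per (height, query) pair.
type HeightCache struct {
	mu      sync.Mutex
	entries map[entryKey]string
}

func NewHeightCache() *HeightCache {
	return &HeightCache{entries: make(map[entryKey]string)}
}

// Get caches only when the caller put a height into ctx; otherwise it
// bypasses the cache entirely.
func (c *HeightCache) Get(ctx context.Context, query string, fetch func() (string, error)) (string, error) {
	h, ok := HeightFrom(ctx)
	if !ok {
		return fetch()
	}
	k := entryKey{height: h, query: query}
	c.mu.Lock()
	v, hit := c.entries[k]
	c.mu.Unlock()
	if hit {
		return v, nil
	}
	v, err := fetch()
	if err != nil {
		return "", err
	}
	c.mu.Lock()
	c.entries[k] = v
	c.mu.Unlock()
	return v, nil
}

// Prune drops entries for heights below the given one.
func (c *HeightCache) Prune(below int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for k := range c.entries {
		if k.height < below {
			delete(c.entries, k)
		}
	}
}

// QueryAt mirrors the call-site pattern in the comment above: attach the
// height to the context, then query through the cache.
func QueryAt(c *HeightCache, height int64, query string, fetch func() (string, error)) (string, error) {
	return c.Get(WithHeight(context.Background(), height), query, fetch)
}
```

Because the height is part of the key, a request for a new block can never be answered from an older block's entry, even if the cache is shared. The trade-off is that old heights accumulate until something calls `Prune`.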

x0152 · 2026-02-13T15:24:23Z

@akup Thanks, this is a good point.

I agree that we should use x-cosmos-block-height from the response as the source of truth for cache writes, and I will add this to the current implementation. In load-balanced or multi-node setups, some edge cases can still happen.
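
A sketch of what "response height as the source of truth for writes" could mean in practice. It assumes the response metadata is available as a `map[string][]string` (the same shape as gRPC metadata) and that values for older heights are simply refused. `heightCache`, `Store`, and `Load` are hypothetical names.

```go
package main

import (
	"strconv"
	"sync"
)

// blockHeightHeader is the response metadata key named in the comment above.
const blockHeightHeader = "x-cosmos-block-height"

// responseHeight extracts the block height the node reported for a response.
func responseHeight(md map[string][]string) (int64, bool) {
	vals := md[blockHeightHeader]
	if len(vals) == 0 {
		return 0, false
	}
	h, err := strconv.ParseInt(vals[0], 10, 64)
	if err != nil || h <= 0 {
		return 0, false
	}
	return h, true
}

// heightCache holds entries for exactly one height: the highest one seen
// in any response so far.
type heightCache struct {
	mu      sync.Mutex
	height  int64
	entries map[string][]byte
}

// Store writes value under the height reported in md. Responses without a
// usable height, or from an older height, are not cached. A newer height
// resets the cache before the write.
func (c *heightCache) Store(key string, value []byte, md map[string][]string) bool {
	h, ok := responseHeight(md)
	if !ok {
		return false
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	switch {
	case h < c.height:
		return false
	case h > c.height:
		c.height = h
		c.entries = make(map[string][]byte)
	}
	c.entries[key] = value
	return true
}

// Load returns the entry only if it was stored for the requested height.
func (c *heightCache) Load(key string, height int64) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if height != c.height {
		return nil, false
	}
	v, ok := c.entries[key]
	return v, ok
}
```

With several nodes behind a load balancer, responses can arrive for heights that go backwards. This sketch drops those, which keeps the cache consistent but does not address the other edge cases mentioned above.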

P.S.
Also, I suggested Grafana support in PR #721. If we add cache hit/miss and cache-related error metrics, we can clearly see whether this gives real benefit in production
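
As a rough idea of what such metrics could count, here is a stdlib-only sketch with atomic counters. `CacheStats` and its methods are hypothetical, and exporting the values to a metrics backend is deliberately left out.

```go
package main

import "sync/atomic"

// CacheStats counts cache outcomes; safe for concurrent use.
type CacheStats struct {
	hits, misses, errs atomic.Int64
}

func (s *CacheStats) Hit()   { s.hits.Add(1) }
func (s *CacheStats) Miss()  { s.misses.Add(1) }
func (s *CacheStats) Error() { s.errs.Add(1) }

// Errors returns the number of cache-related errors recorded so far.
func (s *CacheStats) Errors() int64 { return s.errs.Load() }

// HitRatio returns hits / (hits + misses), or 0 before any lookup.
func (s *CacheStats) HitRatio() float64 {
	h, m := s.hits.Load(), s.misses.Load()
	if h+m == 0 {
		return 0
	}
	return float64(h) / float64(h+m)
}
```

A hit ratio close to zero in production would suggest the cache is not paying for its complexity.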

* validation scripts and readmes
  Made-with: Cursor
* update benchmark code
* update inference to multiling datasets
* dont keep it in main
* logprobs processed
* update
* rmeove logprobs mode
---------
Co-authored-by: Gleb Morgachev <morgachev.g@gmail.com>
Co-authored-by: Tamaz Gadaev <gadaev.tamaz@gmail.com>

* refactor(subnet): simplify host, user, and state machine internals
  Round 1: remove user-side workarounds (expectedRoots, InferenceResult, SendPrepared, NextDiff, two-pass filtering).
  Round 2: guard warm key snapshot behind storage check, collapse executeJob into subnet.ExecuteRequest, simplify challengeReceiptLocked dedup pattern, remove redundant addressInGroup field from state machine.
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(subnet): use ModifyRequestBody in validation to match execution params
  Validation re-execution used the stored original prompt without applying ModifyRequestBody, so logprobs, seed, and max_tokens were missing. This caused empty validation logits (lengthValidation=0) for every inference.
* fix: avoid re-submitting ConirmStart
* idempotent state machine
* Long SSE?
* response writer
* WIP
* Fixes
* refactor(subnet): replace proxy retry loops with send-once-retry-once
  Under load (50 concurrent inferences), the retry loop DDoSes hosts with 650+ retries over 65s. Replace with two attempts max: send, wait for deadline, retry once, then timeout. Adds catch-up diffs to timeout verification so verifiers know about the inference. Adds RWMutex to StateMachine to prevent concurrent map crashes. Fixes flaky GossipRecovery test that raced with async gossip propagation.
* refactor(subnet): replace filterPendingTxs with state machine best-effort apply
  filterPendingTxs reimplemented state machine validation rules as a parallel filter layer. ApplyLocalBestEffort makes the state machine the single source of truth -- stale txs are skipped by applyTx itself, zero snapshot overhead since all handlers are check-first-mutate-last. Also extracts sendDiffRound/sendCatchUp from Finalize and collapses executeAsync into RunExecution.
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(subnet): map TokenPrice in ChainBridge.GetEscrow
  ChainBridge.GetEscrow was not mapping TokenPrice from the chain query response, causing the server to default to 0. SessionConfigWithPrice treated 0 as 1, diverging from the chain's actual TokenPrice and producing state root mismatches on the first MsgStartInference.
* refactor(subnet): tidy diagnostic logging for state machine and inference handler
  State machine: keep NewStateMachine Info log (fires once) and state root mismatch Error log (fires on errors). Restore ApplyLocalBestEffort to Debug level with enriched fields, drop redundant root hex dump.
  Server: remove per-request Info logs added during debugging (sender, body, receipt, breadcrumbs). Keep Error logs on actual error paths.
* fix(upgrade): set subnet escrow group size to 16 in v0.2.11
* fix(api): prevent blocking on external calls in subnet and validation
  - Use singleflight in HostManager.getOrCreate to avoid holding global lock during gRPC calls (stuck calls would block all session requests)
  - Add 30s timeout to payload retrieval HTTP client
* fix(subnet): reject validation when claimed token counts exceed re-execution usage
* fix(api): return false when event value index is out of bounds
  Previously, when an event in a batched transaction was missing an attribute, getEventValue would silently return the first event's value. This caused wrong metadata to be associated with subsequent records. Now it correctly returns false so callers can detect the missing attribute and fail safely.

Co-authored-by: Gleb Morgachev <morgachev.g@gmail.com>

x0152 · 2026-03-19T10:36:38Z

@x0152 Please clean up the commits in this PR.

Only 3 commits are actually about gRPC caching:
3e41e57 - perf: add LRU cache for gRPC queries (the core feature)
4764bbc - fix: height-pinned query cache
09f1a8a - fix: cache by response block height

There are a lot of already-merged, unrelated commits in this PR, which makes it hard to review.

Done: rebased and cleaned

patimen · 2026-04-28T22:01:11Z

Can we move on this now? It's gone through a good number of iterations and is looking close, and could definitely still help perf.

tcharchian · 2026-04-28T22:58:57Z

@x0152 @akup hey, should this be included in v0.2.13?

x0152 · 2026-04-29T08:59:45Z

Yes, let's take this in v0.2.13

tcharchian · 2026-05-22T01:35:27Z

@x0152 are you ready to open this PR and rebase it for the next upgrade?

x0152 · 2026-05-28T19:03:47Z

#1272

x0152 changed the title ~~Perf/gRPC caching~~ perf/gRPC caching Jan 10, 2026

x0152 force-pushed the perf/grpc-rest-caching branch from d530e40 to 8b2615c Compare January 10, 2026 13:15

x0152 changed the title ~~perf/gRPC caching~~ perf: gRPC caching Jan 10, 2026

x0152 marked this pull request as draft January 14, 2026 15:02

x0152 force-pushed the perf/grpc-rest-caching branch 2 times, most recently from e5b5402 to 3e41e57 Compare January 14, 2026 17:49

x0152 marked this pull request as ready for review January 14, 2026 17:59

patimen added this to the v0.2.10 milestone Jan 29, 2026

patimen changed the base branch from main to upgrade-v0.2.10 February 3, 2026 23:36

patimen suggested changes Feb 5, 2026

patimen modified the milestones: v0.2.10, v0.2.11 Feb 5, 2026

x0152 marked this pull request as draft February 9, 2026 20:46

x0152 marked this pull request as ready for review February 9, 2026 21:27

tcharchian added this to Upgrade v0.2.10 Feb 9, 2026

github-project-automation Bot moved this to Todo in Upgrade v0.2.10 Feb 9, 2026

tcharchian added this to Upgrade v0.2.11 Feb 9, 2026

github-project-automation Bot moved this to Todo in Upgrade v0.2.11 Feb 9, 2026

tcharchian removed this from Upgrade v0.2.10 Feb 9, 2026

x0152 force-pushed the perf/grpc-rest-caching branch from c1a5e71 to 09f1a8a Compare February 13, 2026 17:35

gmorgachev deleted the branch gonka-ai:upgrade-v0.2.14 February 20, 2026 08:48

gmorgachev closed this Feb 20, 2026

github-project-automation Bot moved this from Todo to Done in Upgrade v0.2.11 Feb 20, 2026

gmorgachev reopened this Feb 20, 2026

tcharchian moved this from Done to Needs reviewer in Upgrade v0.2.11 Feb 21, 2026

x0152 mentioned this pull request Feb 21, 2026

[4/4] StartInference and FinishInference #783

Closed

tamazgadaev and others added 12 commits March 16, 2026 22:10

Add porosity and negativity check (gonka-ai#903)

a10292c

Co-authored-by: Tamaz Gadaev <gadaev.tamaz@gmail.com>

upgrade params (gonka-ai#916)

9929217

Only adjust for collateral if it's actually enabled (gonka-ai#918)

1041a67

Co-authored-by: Gleb Morgachev <morgachev.g@gmail.com>

add bounty distribution (gonka-ai#919)

1f130d9

* biunty-distribution * add bounty add one missing bounty for collateral slashing --------- Co-authored-by: Gleb Morgachev <morgachev.g@gmail.com>

README

74f5ff8

feat(upgrade): bump images to 0.2.11 and require explicit postgres st…

54c09b3

…ats opt-in

perf: add LRU cache for gRPC queries

358fe00

fix: height-pinned query cache

f75381d

fix: cache by response block height

7a73c9c

x0152 force-pushed the perf/grpc-rest-caching branch from 09f1a8a to 7a73c9c Compare March 19, 2026 10:35

tcharchian removed this from Upgrade v0.2.12 Mar 21, 2026

tcharchian removed this from the v0.2.12 milestone Mar 21, 2026

tcharchian added this to Triage Apr 24, 2026

github-project-automation Bot moved this to New in Triage Apr 24, 2026

tcharchian moved this from New to Accepted in Triage Apr 28, 2026

tcharchian added this to the v0.2.13 milestone Apr 28, 2026

x0152 marked this pull request as draft May 3, 2026 12:22

x0152 changed the base branch from upgrade-v0.2.11 to upgrade-v0.2.14 May 28, 2026 12:52

x0152 closed this May 28, 2026

github-project-automation Bot moved this from Accepted to Archived / Closed in Triage May 28, 2026

x0152 mentioned this pull request May 28, 2026

perf: gRPC caching #1272

Open


13 participants
