Skip to content

perf: gRPC caching#542

Closed
x0152 wants to merge 35 commits into
gonka-ai:upgrade-v0.2.14from
x0152:perf/grpc-rest-caching
Closed

perf: gRPC caching#542
x0152 wants to merge 35 commits into
gonka-ai:upgrade-v0.2.14from
x0152:perf/grpc-rest-caching

Conversation

@x0152

@x0152 x0152 commented Jan 10, 2026

Copy link
Copy Markdown
Collaborator

Add grpc query caching with per block invalidation. Even with ~6s block time, it reduces chain node load by deduplicating queries within a block.

Cache overhead is ~200ns vs ~3ms for gRPC calls.

@x0152 x0152 changed the title Perf/gRPC caching perf/gRPC caching Jan 10, 2026
@x0152 x0152 force-pushed the perf/grpc-rest-caching branch from d530e40 to 8b2615c Compare January 10, 2026 13:15
@x0152 x0152 changed the title perf/gRPC caching perf: gRPC caching Jan 10, 2026
@x0152 x0152 marked this pull request as draft January 14, 2026 15:02
@x0152 x0152 force-pushed the perf/grpc-rest-caching branch 2 times, most recently from e5b5402 to 3e41e57 Compare January 14, 2026 17:49
@x0152

x0152 commented Jan 14, 2026

Copy link
Copy Markdown
Collaborator Author

ttl approach seams bad to me, and cache should be on block basis. When we process new block, we should clear the cache.

When we are using ttl approach it is possible that cache will return same value when we are in a next block. This can introduce a lot of possible errors and shouldn't be even tryed.

while per-block cache is safe and is what we are really want from cache, as data is not changing before the next block.

You're right
Was focused on dashboard API latency. Switched to per block invalidation.

Removed http caching for now - that's a separate topic
Thanks!

@x0152 x0152 marked this pull request as ready for review January 14, 2026 17:59
@patimen patimen added this to the v0.2.10 milestone Jan 29, 2026
@patimen patimen changed the base branch from main to upgrade-v0.2.10 February 3, 2026 23:36

@patimen patimen left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really not sure about this PR.
There is some interaction between clients and caches here that could be a big problem. I called one out, I fear there may be more.
Generally speaking, we would want a cache as a gRPC Interceptor, so the cache is tied to a long lived connection vs a global cache that could come up and bite us some time, say, if we have multiple connections.
I do like the idea, and I can see how this could be very useful. But I think we need to maybe step back a bit and think harder about how to do this.

return &CachingConn{inner: inner}
}

func (c *CachingConn) Invoke(ctx context.Context, method string, args any, reply any, opts ...grpc.CallOption) error {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the grpc.CallOptions be passed on to the CachedInvoke?

return "", err
}
hash := sha256.Sum256(data)
return method + ":" + hex.EncodeToString(hash[:16]), nil

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the truncation?

}

// Clear cache only when synced to avoid thrashing during catch-up
cosmosclient.ClearCache()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is clearing the cache AFTER we have tried to get the networkInfo above... meaning that it will be info from the PREVIOUS block, not the new one.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically, take a look at line 170 (queryNetworkInfo)

@patimen

patimen commented Feb 5, 2026

Copy link
Copy Markdown
Contributor

/run-integration

@patimen patimen modified the milestones: v0.2.10, v0.2.11 Feb 5, 2026
@x0152 x0152 marked this pull request as draft February 9, 2026 20:46
@x0152

x0152 commented Feb 9, 2026

Copy link
Copy Markdown
Collaborator Author

@patimen Thanks for the review!

The safest cache idea is to make it height‑scoped and explicit: pass the height in context, then cache only for that height. That way every request clearly says which block it belongs to, and the developer controls it. Example:

ctx := context.WithValue(ctx, HeightKey, height) // or attach height to request metadata
resp, err := queryClient.Inference(ctx, req)     // cache uses ctx height

But I want to ask about the current solution - I rewrote it to a simpler and less buggy version that is easier to read, but it still uses a global height‑pinned cache. If we talk to a single node and single backend URL, what risks do you still see with a shared cache? I only see risks if we ever talk to different nodes or have a load balancer/proxy between the dapi and the node, which is not the case for us right now

What do you think?

@x0152 x0152 marked this pull request as ready for review February 9, 2026 21:27
@x0152

x0152 commented Feb 13, 2026

Copy link
Copy Markdown
Collaborator Author

@akup Thanks, this is a good point.

I agree that we should use x-cosmos-block-height from the response as the source of truth for cache writes, and I will add this to current implementation. In load balanced or multi node setups, some edge cases can still happen

P.S.
Also, I suggested Grafana support in PR #721. If we add cache hit/miss and cache-related error metrics, we can clearly see whether this gives real benefit in production

@x0152 x0152 force-pushed the perf/grpc-rest-caching branch from c1a5e71 to 09f1a8a Compare February 13, 2026 17:35
@gmorgachev gmorgachev deleted the branch gonka-ai:upgrade-v0.2.14 February 20, 2026 08:48
@gmorgachev gmorgachev closed this Feb 20, 2026
@github-project-automation github-project-automation Bot moved this from Todo to Done in Upgrade v0.2.11 Feb 20, 2026
@gmorgachev gmorgachev reopened this Feb 20, 2026
@tcharchian tcharchian moved this from Done to Needs reviewer in Upgrade v0.2.11 Feb 21, 2026
tamazgadaev and others added 12 commits March 16, 2026 22:10
* validation scripts and readmes

Made-with: Cursor

* update benchmark code

* update inference to multiling datasets

* dont keep it in main

* logprobs processed

* update

* rmeove logprobs mode

---------

Co-authored-by: Gleb Morgachev <morgachev.g@gmail.com>
Co-authored-by: Tamaz Gadaev <gadaev.tamaz@gmail.com>
* refactor(subnet): simplify host, user, and state machine internals

Round 1: remove user-side workarounds (expectedRoots, InferenceResult,
SendPrepared, NextDiff, two-pass filtering).

Round 2: guard warm key snapshot behind storage check, collapse
executeJob into subnet.ExecuteRequest, simplify challengeReceiptLocked
dedup pattern, remove redundant addressInGroup field from state machine.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(subnet): use ModifyRequestBody in validation to match execution params

Validation re-execution used the stored original prompt without applying
ModifyRequestBody, so logprobs, seed, and max_tokens were missing. This
caused empty validation logits (lengthValidation=0) for every inference.

* fix: avoid re-submitting ConirmStart

* idempotent state machine

* Long SSE?

* response writer

* WIP

* Fixes

* refactor(subnet): replace proxy retry loops with send-once-retry-once

Under load (50 concurrent inferences), the retry loop DDoSes hosts with
650+ retries over 65s. Replace with two attempts max: send, wait for
deadline, retry once, then timeout. Adds catch-up diffs to timeout
verification so verifiers know about the inference. Adds RWMutex to
StateMachine to prevent concurrent map crashes. Fixes flaky
GossipRecovery test that raced with async gossip propagation.

* refactor(subnet): replace filterPendingTxs with state machine best-effort apply

filterPendingTxs reimplemented state machine validation rules as a
parallel filter layer. ApplyLocalBestEffort makes the state machine
the single source of truth -- stale txs are skipped by applyTx itself,
zero snapshot overhead since all handlers are check-first-mutate-last.

Also extracts sendDiffRound/sendCatchUp from Finalize and collapses
executeAsync into RunExecution.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(subnet): map TokenPrice in ChainBridge.GetEscrow

ChainBridge.GetEscrow was not mapping TokenPrice from the chain query
response, causing the server to default to 0. SessionConfigWithPrice
treated 0 as 1, diverging from the chain's actual TokenPrice and
producing state root mismatches on the first MsgStartInference.

* refactor(subnet): tidy diagnostic logging for state machine and inference handler

State machine: keep NewStateMachine Info log (fires once) and state root
mismatch Error log (fires on errors). Restore ApplyLocalBestEffort to
Debug level with enriched fields, drop redundant root hex dump.

Server: remove per-request Info logs added during debugging (sender,
body, receipt, breadcrumbs). Keep Error logs on actual error paths.

* fix(upgrade): set subnet escrow group size to 16 in v0.2.11

* fix(api): prevent blocking on external calls in subnet and validation

- Use singleflight in HostManager.getOrCreate to avoid holding global
  lock during gRPC calls (stuck calls would block all session requests)
- Add 30s timeout to payload retrieval HTTP client

* fix(subnet): reject validation when claimed token counts exceed re-execution usage

* fix(api): return false when event value index is out of bounds

Previously, when an event in a batched transaction was missing an attribute,
getEventValue would silently return the first event's value. This caused
wrong metadata to be associated with subsequent records. Now it correctly
returns false so callers can detect the missing attribute and fail safely.
Co-authored-by: Gleb Morgachev <morgachev.g@gmail.com>
* biunty-distribution

* add bounty

add one missing bounty for collateral slashing

---------

Co-authored-by: Gleb Morgachev <morgachev.g@gmail.com>
@x0152 x0152 force-pushed the perf/grpc-rest-caching branch from 09f1a8a to 7a73c9c Compare March 19, 2026 10:35
@x0152

x0152 commented Mar 19, 2026

Copy link
Copy Markdown
Collaborator Author

@x0152 Please clean the commits of this PR

Only 3 commits are actually about gRPC caching: 3e41e57 — perf: add LRU cache for gRPC queries (the core feature) 4764bbc — fix: height-pinned query cache 09f1a8a — fix: cache by response block height

There are a lot of already merged unrelated commits at this PR and it is hard to review

Done: rebased and cleaned

@tcharchian tcharchian removed this from the v0.2.12 milestone Mar 21, 2026
@github-project-automation github-project-automation Bot moved this to New in Triage Apr 24, 2026
@patimen

patimen commented Apr 28, 2026

Copy link
Copy Markdown
Contributor

Can we move on this now? It's gone through a good number of iterations and is looking close, and could definitely still help perf.

@tcharchian tcharchian moved this from New to Accepted in Triage Apr 28, 2026
@tcharchian tcharchian added this to the v0.2.13 milestone Apr 28, 2026
@tcharchian

Copy link
Copy Markdown
Collaborator

@x0152 @akup hey, should this be included in v0.2.13?

@x0152

x0152 commented Apr 29, 2026

Copy link
Copy Markdown
Collaborator Author

Yes, let's take this in v0.2.13

@x0152 x0152 marked this pull request as draft May 3, 2026 12:22
@tcharchian

Copy link
Copy Markdown
Collaborator

@x0152 are you ready to open this PR and rebase it for the next upgrade?

@x0152 x0152 changed the base branch from upgrade-v0.2.11 to upgrade-v0.2.14 May 28, 2026 12:52
@x0152

x0152 commented May 28, 2026

Copy link
Copy Markdown
Collaborator Author

#1272

@x0152 x0152 closed this May 28, 2026
@github-project-automation github-project-automation Bot moved this from Accepted to Archived / Closed in Triage May 28, 2026
@x0152 x0152 mentioned this pull request May 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Archived / Closed

Development

Successfully merging this pull request may close these issues.