perf: gRPC caching#542
Conversation
d530e40 to
8b2615c
Compare
e5b5402 to
3e41e57
Compare
You're right Removed http caching for now - that's a separate topic |
patimen
left a comment
There was a problem hiding this comment.
I'm really not sure about this PR.
There is some interaction between clients and caches here that could be a big problem. I called one out, I fear there may be more.
Generally speaking, we would want a cache as a gRPC Interceptor, so the cache is tied to a long lived connection vs a global cache that could come up and bite us some time, say, if we have multiple connections.
I do like the idea, and I can see how this could be very useful. But I think we need to maybe step back a bit and think harder about how to do this.
| return &CachingConn{inner: inner} | ||
| } | ||
|
|
||
| func (c *CachingConn) Invoke(ctx context.Context, method string, args any, reply any, opts ...grpc.CallOption) error { |
There was a problem hiding this comment.
Shouldn't the grpc.CallOptions be passed on to the CachedInvoke?
| return "", err | ||
| } | ||
| hash := sha256.Sum256(data) | ||
| return method + ":" + hex.EncodeToString(hash[:16]), nil |
| } | ||
|
|
||
| // Clear cache only when synced to avoid thrashing during catch-up | ||
| cosmosclient.ClearCache() |
There was a problem hiding this comment.
This is clearing the cache AFTER we have tried to get the networkInfo above... meaning that it will be info from the PREVIOUS block, not the new one.
There was a problem hiding this comment.
Specifically, take a look at line 170 (queryNetworkInfo)
|
/run-integration |
|
@patimen Thanks for the review! The safest cache idea is to make it height‑scoped and explicit: pass the height in context, then cache only for that height. That way every request clearly says which block it belongs to, and the developer controls it. Example: ctx := context.WithValue(ctx, HeightKey, height) // or attach height to request metadata
resp, err := queryClient.Inference(ctx, req) // cache uses ctx heightBut I want to ask about the current solution - I rewrote it to a simpler and less buggy version that is easier to read, but it still uses a global height‑pinned cache. If we talk to a single node and single backend URL, what risks do you still see with a shared cache? I only see risks if we ever talk to different nodes or have a load balancer/proxy between the dapi and the node, which is not the case for us right now What do you think? |
|
@akup Thanks, this is a good point. I agree that we should use x-cosmos-block-height from the response as the source of truth for cache writes, and I will add this to current implementation. In load balanced or multi node setups, some edge cases can still happen P.S. |
c1a5e71 to
09f1a8a
Compare
* validation scripts and readmes Made-with: Cursor * update benchmark code * update inference to multiling datasets * dont keep it in main * logprobs processed * update * rmeove logprobs mode --------- Co-authored-by: Gleb Morgachev <morgachev.g@gmail.com>
Co-authored-by: Tamaz Gadaev <gadaev.tamaz@gmail.com>
* refactor(subnet): simplify host, user, and state machine internals Round 1: remove user-side workarounds (expectedRoots, InferenceResult, SendPrepared, NextDiff, two-pass filtering). Round 2: guard warm key snapshot behind storage check, collapse executeJob into subnet.ExecuteRequest, simplify challengeReceiptLocked dedup pattern, remove redundant addressInGroup field from state machine. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(subnet): use ModifyRequestBody in validation to match execution params Validation re-execution used the stored original prompt without applying ModifyRequestBody, so logprobs, seed, and max_tokens were missing. This caused empty validation logits (lengthValidation=0) for every inference. * fix: avoid re-submitting ConirmStart * idempotent state machine * Long SSE? * response writer * WIP * Fixes * refactor(subnet): replace proxy retry loops with send-once-retry-once Under load (50 concurrent inferences), the retry loop DDoSes hosts with 650+ retries over 65s. Replace with two attempts max: send, wait for deadline, retry once, then timeout. Adds catch-up diffs to timeout verification so verifiers know about the inference. Adds RWMutex to StateMachine to prevent concurrent map crashes. Fixes flaky GossipRecovery test that raced with async gossip propagation. * refactor(subnet): replace filterPendingTxs with state machine best-effort apply filterPendingTxs reimplemented state machine validation rules as a parallel filter layer. ApplyLocalBestEffort makes the state machine the single source of truth -- stale txs are skipped by applyTx itself, zero snapshot overhead since all handlers are check-first-mutate-last. Also extracts sendDiffRound/sendCatchUp from Finalize and collapses executeAsync into RunExecution. --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(subnet): map TokenPrice in ChainBridge.GetEscrow ChainBridge.GetEscrow was not mapping TokenPrice from the chain query response, causing the server to default to 0. SessionConfigWithPrice treated 0 as 1, diverging from the chain's actual TokenPrice and producing state root mismatches on the first MsgStartInference. * refactor(subnet): tidy diagnostic logging for state machine and inference handler State machine: keep NewStateMachine Info log (fires once) and state root mismatch Error log (fires on errors). Restore ApplyLocalBestEffort to Debug level with enriched fields, drop redundant root hex dump. Server: remove per-request Info logs added during debugging (sender, body, receipt, breadcrumbs). Keep Error logs on actual error paths. * fix(upgrade): set subnet escrow group size to 16 in v0.2.11 * fix(api): prevent blocking on external calls in subnet and validation - Use singleflight in HostManager.getOrCreate to avoid holding global lock during gRPC calls (stuck calls would block all session requests) - Add 30s timeout to payload retrieval HTTP client * fix(subnet): reject validation when claimed token counts exceed re-execution usage * fix(api): return false when event value index is out of bounds Previously, when an event in a batched transaction was missing an attribute, getEventValue would silently return the first event's value. This caused wrong metadata to be associated with subsequent records. Now it correctly returns false so callers can detect the missing attribute and fail safely.
Co-authored-by: Gleb Morgachev <morgachev.g@gmail.com>
* biunty-distribution * add bounty add one missing bounty for collateral slashing --------- Co-authored-by: Gleb Morgachev <morgachev.g@gmail.com>
09f1a8a to
7a73c9c
Compare
Done: rebased and cleaned |
|
Can we move on this now? It's gone through a good number of iterations and is looking close, and could definitely still help perf. |
|
Yes, let's take this in v0.2.13 |
|
@x0152 are you ready to open this PR and rebase it for the next upgrade? |
Add grpc query caching with per block invalidation. Even with ~6s block time, it reduces chain node load by deduplicating queries within a block.
Cache overhead is ~200ns vs ~3ms for gRPC calls.