fix(inference): refund client escrow for VOTING inferences on timeout (quorum miss) #1275
Open
vitaly-andr wants to merge 1 commit into
expireInferences filtered timed-out inferences by Status == STARTED only. A failing MsgValidation moves an inference to VOTING; if the resulting x/group proposals miss quorum within the voting window, timeout cleanup silently skipped it -- the InferenceTimeout entry was removed, the inference stayed VOTING forever, and the client's escrow was stranded in the inference module account.

Add a VOTING branch: on timeout, mark the inference EXPIRED and refund the client via expireInferenceAndIssueRefund (default-to-refund on quorum miss). The x/group proposals are pruned by x/group EndBlock.

Adds TestExpireInferences_VotingInferenceRefundedOnTimeout, which drives the real expireInferences (via an export_test.go wrapper) and fails on current main (no refund, inference stays VOTING), passing with the fix.

Surfaced by simulation/fuzz testing for inference-chain (gonka-ai#982). Closes gonka-ai#1265.

Signed-off-by: Vitaly Andrianov <vitaly.andr@gmail.com>
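A toy model makes the before/after behaviour of this commit concrete. It is not the x/inference keeper: `Status`, `Inference`, `Module`, `expireAndRefund`, `expireStartedOnly`, `expireWithVotingBranch`, and `run` are hypothetical stand-ins, bank transfers are reduced to integer arithmetic, and the STARTED branch is assumed to use the same refund helper as the new VOTING branch.

```go
package main

import "fmt"

// Status is a toy stand-in for the inference status enum.
type Status int

const (
	Started Status = iota
	Voting
	Expired
)

// Inference keeps only the fields this sketch needs.
type Inference struct {
	ID     string
	Status Status
	Escrow int64 // client funds held in the module account
}

// Module models the inference module account and the refunds paid out of it.
type Module struct {
	Balance int64
	Refunds map[string]int64
}

// expireAndRefund plays the role of expireInferenceAndIssueRefund in this
// model: mark EXPIRED and return the whole escrow to the client
// (no executor payment, no slashing).
func (m *Module) expireAndRefund(inf *Inference) {
	inf.Status = Expired
	m.Balance -= inf.Escrow
	m.Refunds[inf.ID] += inf.Escrow
}

// expireStartedOnly mimics the pre-fix filter: every due timeout entry is
// consumed, but only STARTED inferences are handled.
func (m *Module) expireStartedOnly(infs map[string]*Inference, timeouts map[string]bool) {
	for id := range timeouts {
		delete(timeouts, id) // the entry is removed unconditionally
		if inf, ok := infs[id]; ok && inf.Status == Started {
			m.expireAndRefund(inf)
		}
	}
}

// expireWithVotingBranch adds the VOTING branch: a quorum miss defaults to a
// refund. x/group proposals are not touched here (x/group EndBlock prunes them).
// ASSUMPTION: STARTED uses the same refund helper as VOTING.
func (m *Module) expireWithVotingBranch(infs map[string]*Inference, timeouts map[string]bool) {
	for id := range timeouts {
		delete(timeouts, id)
		inf, ok := infs[id]
		if !ok {
			continue
		}
		switch inf.Status {
		case Started, Voting:
			m.expireAndRefund(inf)
		}
	}
}

// run times out one STARTED and one VOTING inference and returns the VOTING
// inference's final status plus what is left in the module account.
func run(fixed bool) (Status, int64) {
	infs := map[string]*Inference{
		"a": {ID: "a", Status: Started, Escrow: 100},
		"b": {ID: "b", Status: Voting, Escrow: 250},
	}
	timeouts := map[string]bool{"a": true, "b": true}
	m := &Module{Balance: 350, Refunds: map[string]int64{}}
	if fixed {
		m.expireWithVotingBranch(infs, timeouts)
	} else {
		m.expireStartedOnly(infs, timeouts)
	}
	return infs["b"].Status, m.Balance
}

func main() {
	st, left := run(false)
	fmt.Println("before fix: b stuck in VOTING =", st == Voting, "stranded =", left)
	st, left = run(true)
	fmt.Println("after fix: b EXPIRED =", st == Expired, "stranded =", left)
}
```

In the pre-fix model the VOTING inference keeps its status and its 250 units stay in the module account once the timeout entry is gone; with the extra branch both inferences are expired and the account drains to zero.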
vitaly-andr added a commit to vitaly-andr/gonka that referenced this pull request May 29, 2026
…zzing, secondary-op factories

Completes gonka-ai#982 Phase 3 (x/inference hardening) on top of the second-wave ops:

- Custom invariants via RegisterInvariants (bank-backs-positive-balance solvency, no-stuck-voting, effective-epoch-fresh, active-invalidations-ref-live) + unit tests.
- Store decoders for the simulation decode harness (Participant / Inference / KeySet / Uint64 / Int64 / legacy-params, hex fallback) + tests.
- Aggressive parameter-boundary fuzzing (MutateSimParams): economic params to Validate() boundaries, widened EpochParams, corner profiles. Slashing-activation levers held at defaults; active slashing drains the Tokens=1 sim substrate.
- Three real secondary-op factories (SubmitUnitOfComputePriceProposal, SubmitHardwareDiff, SubmitNewUnfundedParticipant) replacing NoOp stubs.
- Data-driven weight tuning; a model-weight transient-cache test helper.
- docs/simulation.md: Phase-3 status, fuzzing design, invariant list, sweep.

The full 500-block sim run is green only with the production fixes it surfaced: gonka-ai#1275 (VOTING-timeout refund), gonka-ai#1276 (revalidation membership guard), and gonka-ai/cosmos-sdk#16 (markValidatorForDeletion). Verified combined: sim-full 500x200 seed=99 PASS, app-hash f76df5cd…. The post-run bank-backs invariant holds in the canonical 500x200 run (transient intra-epoch under-backing heals via debt-recovery by settlement); the asymmetric refund-debit drift it surfaces under shorter/aggressive runs is documented as a design question in gonka-ai#1273 (TestRefundInvalidation_LeavesAccountingDrift).

Signed-off-by: Vitaly Andrianov <vitaly.andr@gmail.com>
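Corner profiles, as used in the parameter-boundary fuzzing described above, can be illustrated with a small standalone sketch. It does not use MutateSimParams or the real x/inference params: `Bound`, `Params`, `Validate`, `CornerProfiles`, and the three parameter names in `main` are hypothetical. Each generated profile pins every parameter to its minimum or maximum, so a validator that is off by one at either edge is exercised directly.

```go
package main

import "fmt"

// Bound is a hypothetical inclusive range enforced by Validate.
type Bound struct {
	Name     string
	Min, Max int64
}

// Params is a toy parameter set keyed by name.
type Params map[string]int64

// Validate rejects missing values and values outside their bound.
func Validate(p Params, bounds []Bound) error {
	for _, b := range bounds {
		v, ok := p[b.Name]
		if !ok {
			return fmt.Errorf("%s: missing", b.Name)
		}
		if v < b.Min || v > b.Max {
			return fmt.Errorf("%s=%d outside [%d, %d]", b.Name, v, b.Min, b.Max)
		}
	}
	return nil
}

// CornerProfiles returns every min/max combination: 2^len(bounds) parameter
// sets, each sitting exactly on the edge of what Validate accepts.
func CornerProfiles(bounds []Bound) []Params {
	profiles := []Params{{}}
	for _, b := range bounds {
		next := make([]Params, 0, 2*len(profiles))
		for _, base := range profiles {
			for _, v := range []int64{b.Min, b.Max} {
				p := make(Params, len(base)+1)
				for k, x := range base {
					p[k] = x
				}
				p[b.Name] = v
				next = append(next, p)
			}
		}
		profiles = next
	}
	return profiles
}

func main() {
	// Hypothetical parameter names and ranges, for illustration only.
	bounds := []Bound{
		{Name: "fee_bps", Min: 0, Max: 10_000},
		{Name: "epoch_length", Min: 1, Max: 1_000},
		{Name: "max_batch", Min: 1, Max: 64},
	}
	profiles := CornerProfiles(bounds)
	for _, p := range profiles {
		if err := Validate(p, bounds); err != nil {
			fmt.Println("unexpected:", err)
		}
	}
	fmt.Println(len(profiles), "corner profiles, all on a Validate() boundary")
}
```

Parameters that should stay fixed (the commit holds the slashing-activation levers at their defaults) would simply be left out of `bounds` and set separately.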
Problem
`expireInferences` (`x/inference/module/module.go`) filtered timed-out inferences by `Status == STARTED` only. A failing `MsgValidation` moves an inference to `VOTING`; if its x/group proposals miss quorum within the voting window, the timeout cleanup silently skips it: the `InferenceTimeout` entry is removed, the inference stays `VOTING` forever, and the client's escrow is stranded in the inference module account. The quorum-miss trigger (network lag, validator restart, split vote) is a routine production condition, not an exotic edge case.
Fix
Add a `VOTING` branch: on timeout, mark the inference `EXPIRED` and refund the client via `expireInferenceAndIssueRefund` (default-to-refund on quorum miss: client made whole, executor unpaid, no slashing). The x/group proposals are left to be pruned by x/group `EndBlock`.
Test
`TestExpireInferences_VotingInferenceRefundedOnTimeout` drives the real `expireInferences` (via an `export_test.go` wrapper, since it is unexported) with one `VOTING` timeout; it fails on current `main` (no refund, inference stays `VOTING`) and passes with the fix.
Context
Discovered while working on #998 (maintenance windows); the reproducer was built with the inference-chain simulation/fuzz harness (#982), where a `NoStuckVoting` invariant flagged an inference still `VOTING` at epoch 10.
Closes #1265.
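A check in the spirit of the `NoStuckVoting` invariant can be sketched in isolation. The harness's actual criterion is not shown here (it flagged an inference still `VOTING` at epoch 10), so this sketch assumes a simpler rule: every `VOTING` inference must still have a pending timeout entry, because without one nothing will ever move it out of `VOTING`. `Status`, `Inference`, and `noStuckVoting` are hypothetical; only the `(message, broken)` result shape is borrowed from Cosmos SDK invariants.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

type Status int

const (
	Started Status = iota
	Voting
	Expired
)

type Inference struct {
	ID     string
	Status Status
}

// noStuckVoting returns (message, broken) like a Cosmos SDK invariant.
// ASSUMED criterion: a VOTING inference must still have a pending timeout
// entry; otherwise nothing will ever resolve it.
func noStuckVoting(infs []Inference, timeouts map[string]bool) (string, bool) {
	var stuck []string
	for _, inf := range infs {
		if inf.Status == Voting && !timeouts[inf.ID] {
			stuck = append(stuck, inf.ID)
		}
	}
	if len(stuck) == 0 {
		return "no-stuck-voting: ok", false
	}
	sort.Strings(stuck) // deterministic message for logs and tests
	return "no-stuck-voting: stuck in VOTING: " + strings.Join(stuck, ", "), true
}

func main() {
	infs := []Inference{{ID: "a", Status: Expired}, {ID: "b", Status: Voting}}
	// b's timeout entry was already consumed, as in the pre-fix cleanup.
	msg, broken := noStuckVoting(infs, map[string]bool{})
	fmt.Println(msg, broken)
}
```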
cc @patimen (#982 author)