Skip to content

fix: use deterministic seed for ClaimRewards reservoir sampling#1254

Open
ouicate wants to merge 3 commits into
gonka-ai:mainfrom
ouicate:fix/claim-rewards-deterministic-sampling
Open

fix: use deterministic seed for ClaimRewards reservoir sampling#1254
ouicate wants to merge 3 commits into
gonka-ai:mainfrom
ouicate:fix/claim-rewards-deterministic-sampling

Conversation

@ouicate

@ouicate ouicate commented May 26, 2026

Copy link
Copy Markdown

Summary

Replace block-hash-seeded RNG with msg.Seed for reservoir sampling in getMustBeValidatedInferences. The block hash changes per block, which allowed retry grinding: a validator could re-submit ClaimRewards in different blocks until a favorable sample under-represented their missed validations and the statistical test passed.

msg.Seed is signed by the validator (verified against the stored SeedSignature) and cannot be changed between retries, making the reservoir sample deterministic per validator per epoch regardless of which block includes the claim transaction.

Test plan

  • go test ./x/inference/keeper/... -run 'ClaimRewards|Reservoir|Sampling'

Supersedes #830 (closed because the head fork was accidentally deleted).

Replace block-hash-seeded RNG with msg.Seed for reservoir sampling in
getMustBeValidatedInferences. The block hash changes per block, which
allowed retry grinding: a validator could re-submit ClaimRewards in
different blocks until a favorable sample under-represented their missed
validations and the statistical test passed.

msg.Seed is signed by the validator (verified against the stored
SeedSignature) and cannot be changed between retries, making the
reservoir sample deterministic per validator per epoch regardless of
which block includes the claim transaction.
@a-kuprin

Copy link
Copy Markdown
Collaborator

Generally ClaimRewards that follows seed reveal stage is subject to be legacy and deprecated in the future.
I'm not sure we should continue fixing it

@ouicate

ouicate commented May 30, 2026

Copy link
Copy Markdown
Author

Of course it should be fixed.

Something being planned for deprecation is not the same as being deprecated.

Saying "we might remove it later" is not a valid reason to leave a known issue unfixed today. If you're moving to a new house in six months but the roof is leaking now, you still fix the roof. The future move doesn't eliminate the current problem.

Until ClaimRewards is actually removed, vulnerabilities in that code path remain relevant and should be treated accordingly.

@tcharchian

Copy link
Copy Markdown
Collaborator

Of course it should be fixed.

Something being planned for deprecation is not the same as being deprecated.

Saying "we might remove it later" is not a valid reason to leave a known issue unfixed today. If you're moving to a new house in six months but the roof is leaking now, you still fix the roof. The future move doesn't eliminate the current problem.

Until ClaimRewards is actually removed, vulnerabilities in that code path remain relevant and should be treated accordingly.

@ouicate I want to clarify the prioritization here.

The point is not that every issue in code planned for future deprecation should automatically be ignored. The point is that prioritization depends on actual impact, likelihood, attacker incentive, and whether fixing this code path is still worth the review and maintenance cost.

For this specific ClaimRewards issue, the impact appears low. As I understand it, the most realistic effect is that an individual participant with a high validation miss rate may be able to improve their own threshold outcome after many attempts. There is no meaningful network-wide effect, no significant payment loss for other participants, and no effect on weight.

The likelihood also appears low. The scenario only becomes relevant when a participant is already in a fairly specific position: roughly between one and two validation-mismatch thresholds. That is a narrow edge case, not a generally exploitable network risk.

And most importantly, the attacker incentive is weak. A dishonest node does not really need this attack path, because it can simply record VALIDATED for everything. So the proposed exploit does not seem to create a meaningful new capability or a materially better attack strategy.

So I do not agree with the framing that “it is a known issue, therefore of course it should be fixed.” That is not how protocol engineering prioritization works. Some known issues are real but still low priority, especially when they affect legacy code, have limited practical impact, low likelihood, and weak attacker incentive.

Given that @a-kuprin already pointed out that this path is likely to be deprecated, I think it is reasonable to question whether this PR is worth prioritizing, reviewing, merging, or rewarding.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants