
fix: gracefully skip unknown voters in signing policy build#8

Open
CosmicData wants to merge 12 commits into flare-foundation:main from CosmicData:fix/graceful-skip-unknown-voters


CosmicData commented Apr 2, 2026

Summary

  • Fixes a KeyError crash in reward_epoch_manager.py that occurs when a voter address in the signing policy is not found in the voter registrations
  • Replaces direct dict key access (vres[spa[voter]]) with .get(); a voter without a registration is skipped and a warning is logged
  • Prevents the observer restart loop and the continuous alert spam on the Coston2 testnet

Problem

KeyError: '0xBf2c777B9Bf58feebe4834f7f603AF7c960E7cFf'
File "/app/observer/reward_epoch_manager.py", line 209, in build
    vre = vres[spa[voter]]

A validator address present in the signing policy but missing from voter registrations caused the observer to crash on every start.
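
The change described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code in reward_epoch_manager.py: the helper name `collect_registered` and the plain-dict shapes of `spa` and `vres` are assumptions; only the lookup `vres[spa[voter]]` is taken from the traceback above.

```python
import logging

logger = logging.getLogger(__name__)


def collect_registered(voters, spa, vres):
    """Return (voter, registration) pairs, skipping voters with no registration.

    Hypothetical helper. Assumes `spa` maps a voter to a signing policy
    address and `vres` maps that address to a registration record.
    """
    found = []
    for voter in voters:
        address = spa.get(voter)
        vre = vres.get(address) if address is not None else None
        if vre is None:
            # The direct access vres[spa[voter]] would raise KeyError here.
            logger.warning("voter %s has no registration, skipping", voter)
            continue
        found.append((voter, vre))
    return found
```

With this shape, an unknown voter produces one warning per build instead of aborting the whole signing policy build.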

Test plan

  • Deploy with Coston2 testnet RPC and verify observer starts without crash
  • Verify warning logs appear for unregistered voters instead of KeyError
  • Confirm registered voters are still processed correctly

🤖 Generated with Claude Code

CosmicData and others added 12 commits April 2, 2026 19:40
Voters present in the signing policy but missing from voter registrations
caused a KeyError crash, leading to a restart loop with continuous alerts.
Use .get() with fallback logging instead of direct key access.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Nodes without full block history fail with BlockNotFound when the
observer tries to fetch a block 1M blocks in the past. 10k blocks
is sufficient for calculating the block production rate.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
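
As a rough illustration of the shorter lookback, the sketch below picks a reference block at most 10k blocks back and derives an average block time from two (number, timestamp) pairs. The function names and the idea of passing timestamps in directly are assumptions made for the example; the observer's actual RPC calls are not shown in this PR page.

```python
def lookback_block(latest_number, lookback=10_000):
    """Pick the reference block number, clamped so it never goes below 0."""
    return max(0, latest_number - lookback)


def block_production_rate(latest_number, latest_ts, past_number, past_ts):
    """Average seconds per block between two known blocks (hypothetical helper)."""
    blocks = latest_number - past_number
    if blocks <= 0:
        raise ValueError("past block must be older than the latest block")
    return (latest_ts - past_ts) / blocks
```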
- Filter out VoterRemoved entries before building entity list
- Add info log with voter counts for debugging registration issues
- Use filtered lists for spa/vres/vries dicts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Nodes with limited block history would crash when searching for
blocks ~2.5h before epoch start. Now uses binary search to find
the earliest available block as fallback, and gracefully handles
pruned blocks during the iterative timestamp refinement.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
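
The binary search fallback mentioned in this commit might look something like the sketch below. It assumes that block availability is monotonic (a pruned node holds every block from some number up to the latest one) and that the latest block is always available. `has_block` is a hypothetical callable standing in for an RPC lookup that reports whether a block can be fetched.

```python
def earliest_available_block(latest, has_block):
    """Find the lowest block number for which has_block(n) is True.

    Assumes has_block is False below some threshold and True from the
    threshold up to `latest` inclusive.
    """
    lo, hi = 0, latest
    while lo < hi:
        mid = (lo + hi) // 2
        if has_block(mid):
            hi = mid  # mid is available, so the answer is mid or lower
        else:
            lo = mid + 1  # mid is pruned, so the answer is above mid
    return lo
```

The search needs on the order of log2(latest) availability checks, which keeps the fallback cheap even on long chains.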
The original 2h30min..1h window was too narrow and missed voter
registration events on some networks. Expanded to 12h before epoch
start through epoch start to reliably capture all registrations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
With the wider scan window, events from adjacent reward epochs can
appear in the results. Filter them out by checking reward_epoch_id
before adding to the builder.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
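
A minimal sketch of the epoch filter, assuming each decoded event exposes a `reward_epoch_id` attribute. The `Event` dataclass and the function name are hypothetical and only serve the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical stand-in for a decoded contract event."""
    name: str
    reward_epoch_id: int


def events_for_epoch(events, reward_epoch_id):
    """Keep only the events that belong to the requested reward epoch."""
    return [e for e in events if e.reward_epoch_id == reward_epoch_id]
```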
On pruned nodes, RandomAcquisitionStarted and VotePowerBlockSelected
events may not be in the available block range. Replace asserts with
warnings so the observer can still build a partial signing policy.
SigningPolicyInitialized remains required as it's essential.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
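
One way to picture the assert-to-warning change is the sketch below: the two optional events only produce warnings when absent, while a missing SigningPolicyInitialized still stops the build. The function name, the set-of-names input, and the use of ValueError are assumptions for illustration.

```python
import logging

logger = logging.getLogger(__name__)

REQUIRED_EVENT = "SigningPolicyInitialized"
OPTIONAL_EVENTS = ("RandomAcquisitionStarted", "VotePowerBlockSelected")


def check_events(seen):
    """Return the optional events missing from `seen`, a set of event names.

    Raises ValueError if the required event is absent (hypothetical choice
    of exception).
    """
    if REQUIRED_EVENT not in seen:
        raise ValueError(f"{REQUIRED_EVENT} event not found")
    missing = [name for name in OPTIONAL_EVENTS if name not in seen]
    for name in missing:
        logger.warning("%s not found, building a partial signing policy", name)
    return missing
```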
With the wider scan window, SigningPolicyInitialized events from
previous epochs appear first. The break must only trigger when the
event was actually accepted by the builder (matching reward_epoch_id).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
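
The loop behaviour described here can be illustrated with a small sketch. `PolicyBuilder`, its `add` method returning a bool, and the tuple-shaped events are all hypothetical; the point is only that the loop stops when the builder accepts a SigningPolicyInitialized event, not when it merely sees one.

```python
class PolicyBuilder:
    """Hypothetical builder that accepts events only for its own epoch."""

    def __init__(self, reward_epoch_id):
        self.reward_epoch_id = reward_epoch_id
        self.initialized = None

    def add(self, name, reward_epoch_id):
        """Return True if the event was accepted, False if it was ignored."""
        if reward_epoch_id != self.reward_epoch_id:
            return False
        if name == "SigningPolicyInitialized":
            self.initialized = (name, reward_epoch_id)
        return True


def scan(events, builder):
    """Feed (name, reward_epoch_id) events to the builder; return how many were read."""
    processed = 0
    for name, epoch in events:
        processed += 1
        accepted = builder.add(name, epoch)
        # Break only on an accepted initialization event, so that an event
        # from a previous epoch does not end the scan early.
        if accepted and name == "SigningPolicyInitialized":
            break
    return processed
```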
Replace all hard dict key accesses for entity lookups with .get()
and guard entity usage with None checks. When the entity is not in
the signing policy (e.g. due to pruned registration events), the
observer continues running in degraded mode instead of crashing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Skip extract_round_for_entity, validate_round, and submit metrics
recording when entity is None to prevent AttributeError crashes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
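
The last two guarded-lookup commits can be sketched together as below. The names `extract_round_for_entity` and `validate_round` come from the commit message, but their real signatures are not shown on this page, so they are passed in as plain callables here. `entities`, `address`, and `process_round` are hypothetical.

```python
def process_round(entities, address, round_data, extract, validate):
    """Run extraction and validation for one entity, or skip if it is unknown.

    `extract` and `validate` stand in for extract_round_for_entity and
    validate_round; their signatures here are assumptions.
    """
    entity = entities.get(address)  # .get() instead of entities[address]
    if entity is None:
        # Degraded mode: nothing to extract or validate for this round.
        return None
    extracted = extract(round_data, entity)
    return validate(extracted, entity)
```

Returning None for an unknown entity lets the caller skip metrics recording for that round without special-casing exceptions.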