feat(driver): do not schedule the same timeout twice#1501

Open
cason wants to merge 7 commits intocirclefin:mainfrom
cason:duplicated-timeouts
Contributor

@cason cason commented Feb 24, 2026

Closes: #1500

As per title.

This can happen for several reasons, but mostly because TimeoutPrecommit is re-scheduled at every round step change.

The solution consists of creating a scheduled_timeouts vector in the driver and only adding a new timeout to it if it is not already present. The vector is cleared upon a new height, and entries from previous rounds are removed upon a new round.
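
A minimal sketch of the approach described above, for readers who want to see the shape of the change. The Timeout, TimeoutKind and Driver types below are simplified, hypothetical stand-ins and not the project's actual definitions; the method names schedule_timeout, move_to_round and move_to_height are assumptions made for illustration.

```rust
// Hypothetical, simplified stand-ins for the driver's types.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TimeoutKind {
    Propose,
    Prevote,
    Precommit,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Timeout {
    pub kind: TimeoutKind,
    pub round: i64,
}

#[derive(Default)]
pub struct Driver {
    round: i64,
    /// Timeouts already emitted for the current height, used to
    /// avoid scheduling the same timeout twice.
    scheduled_timeouts: Vec<Timeout>,
}

impl Driver {
    /// Returns `true` if the timeout should be scheduled, and `false`
    /// if it belongs to an earlier round or was already scheduled.
    pub fn schedule_timeout(&mut self, timeout: Timeout) -> bool {
        if timeout.round < self.round || self.scheduled_timeouts.contains(&timeout) {
            return false;
        }
        self.scheduled_timeouts.push(timeout);
        true
    }

    /// Advances the round and drops entries from earlier rounds.
    pub fn move_to_round(&mut self, round: i64) {
        self.round = round;
        self.scheduled_timeouts.retain(|t| t.round >= round);
    }

    /// Starts a new height with an empty set of scheduled timeouts.
    pub fn move_to_height(&mut self) {
        self.round = 0;
        self.scheduled_timeouts.clear();
    }
}
```

With this sketch, scheduling the same Timeout twice in a round returns true the first time and false the second, which is the deduplication the description refers to.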

@cason cason force-pushed the duplicated-timeouts branch from 652ed50 to 4d031cf Compare February 24, 2026 15:05
@github-actions github-actions bot added the need-triage This issue needs to be triaged label Feb 24, 2026
@github-actions github-actions bot closed this Feb 24, 2026
@romac romac reopened this Feb 24, 2026
@romac romac removed the need-triage This issue needs to be triaged label Feb 24, 2026
if timeout.round < self.round() || self.scheduled_timeouts.contains(&timeout) {
    return;
}
// XXX: test if the driver produces **non**-consensus timeouts
Contributor Author

Should I change this to a warning or something like that? Apparently, we don't panic, at least in the tests that have run.

Contributor

I would perhaps change this to a debug_assert!
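
For illustration, a debug_assert! along these lines could look as follows. This is only a sketch: the TimeoutKind variants, the Rebroadcast variant in particular, and the is_consensus_timeout and lift helpers are hypothetical names, not taken from the code under review.

```rust
// Hypothetical timeout kinds; `Rebroadcast` stands in for any
// timeout that is not part of the consensus state machine.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum TimeoutKind {
    Propose,
    Prevote,
    Precommit,
    Rebroadcast,
}

fn is_consensus_timeout(kind: TimeoutKind) -> bool {
    matches!(
        kind,
        TimeoutKind::Propose | TimeoutKind::Prevote | TimeoutKind::Precommit
    )
}

fn lift(kind: TimeoutKind, outputs: &mut Vec<TimeoutKind>) {
    // Checked in debug builds only; compiled out in release builds.
    debug_assert!(
        is_consensus_timeout(kind),
        "unexpected non-consensus timeout: {:?}",
        kind
    );
    outputs.push(kind);
}
```

The assertion documents the expectation and fails loudly in tests, while release builds keep the current behavior.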

/// The certificate that justifies moving to the `enter_round` specified in the `EnterRoundCertificate`.
pub round_certificate: Option<EnterRoundCertificate<Ctx>>,

scheduled_timeouts: Vec<Timeout>,
Contributor

Let's add a doc comment here to explain this field's purpose

cason and others added 2 commits February 25, 2026 23:24
Co-authored-by: Romain Ruetschi <github@romac.me>
@cason cason requested a review from romac February 25, 2026 22:29
Contributor

@nenadmilosevic95 nenadmilosevic95 left a comment

Looks good! Left two minor comments. In general, I like the driver-level dedup — simple and effective. IMHO the state machine check might be even cleaner, since the Tendermint pseudo-code states "for the first time" and the state machine should align with it, but this works well too.
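
To make the alternative concrete, here is a rough sketch of what a state-machine-level "for the first time" check might look like. The RoundState type, its precommit_timeout_scheduled flag and the on_precommit_any method are hypothetical and only illustrate the idea; they are not the project's state machine.

```rust
/// Hypothetical per-round state: remembers whether the precommit
/// timeout was already scheduled in this round.
#[derive(Debug, Default)]
struct RoundState {
    round: i64,
    precommit_timeout_scheduled: bool,
}

impl RoundState {
    /// A fresh state for `round`, with the flag reset.
    fn new_round(round: i64) -> Self {
        RoundState {
            round,
            precommit_timeout_scheduled: false,
        }
    }

    /// Called when a quorum of precommits for the current round is seen.
    /// Returns the round to schedule a precommit timeout for, but only
    /// the first time it is called in a given round.
    fn on_precommit_any(&mut self) -> Option<i64> {
        if self.precommit_timeout_scheduled {
            return None;
        }
        self.precommit_timeout_scheduled = true;
        Some(self.round)
    }
}
```

Because the flag lives in the per-round state, it resets naturally when a new round starts, and the driver would not need to track scheduled timeouts at all.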

}

fn lift_timeout_output(&mut self, timeout: Timeout, outputs: &mut Vec<Output<Ctx>>) {
    if timeout.round < self.round() || self.scheduled_timeouts.contains(&timeout) {
Contributor

Minor: can timeout.round < self.round() ever be true? Timeouts should only be for the current round. If this is purely defensive, I'd add a warn! here so we notice if it ever triggers.

Contributor Author

Good point.
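
One possible way to act on this suggestion is to split the combined condition, so that only the defensive stale-round branch logs. The sketch below is an assumption about how that could look: classify and Decision are hypothetical names, and eprintln! stands in for the warn! macro of whatever logging crate the project uses.

```rust
#[derive(Debug, PartialEq, Eq)]
enum Decision {
    Schedule,
    Duplicate,
    StaleRound,
}

/// Decides what to do with a timeout, separating the expected case
/// (duplicate) from the unexpected one (timeout for an earlier round).
fn classify(timeout_round: i64, current_round: i64, already_scheduled: bool) -> Decision {
    if timeout_round < current_round {
        // Stand-in for `warn!`: this branch is not expected to trigger.
        eprintln!(
            "ignoring timeout for round {} while in round {}",
            timeout_round, current_round
        );
        return Decision::StaleRound;
    }
    if already_scheduled {
        return Decision::Duplicate;
    }
    Decision::Schedule
}
```

Duplicates stay silent because they are the case this change expects, while a stale round becomes visible in the logs.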


// Remove useless timeouts from previous rounds
self.scheduled_timeouts
    .retain(|timeout| timeout.round >= round);
Contributor

Similarly here: could this just be clear()? It is minor ofc; I just want to understand whether we expect to see any future rounds here, since I don't see how.

Contributor Author

Also a good point. But the concern here is to limit the growth of this set.

Contributor Author

I wonder if we can schedule timeouts for future rounds - otherwise, clear() is the way to go.
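
The difference between the two options can be seen in a small sketch. It tracks only the round numbers of scheduled timeouts, and the function names are hypothetical; it assumes nothing about whether the driver actually produces timeouts for future rounds.

```rust
/// Keeps entries for `round` and later, as in the snippet under review.
fn prune_with_retain(scheduled_rounds: &mut Vec<i64>, round: i64) {
    scheduled_rounds.retain(|r| *r >= round);
}

/// Drops every entry, as suggested in the review.
fn prune_with_clear(scheduled_rounds: &mut Vec<i64>) {
    scheduled_rounds.clear();
}
```

If every entry belongs to a round earlier than the new one, both functions leave the vector empty. They differ only when an entry for the new round or a later round is already present, which is the case in question here.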

Contributor Author

cason commented Feb 27, 2026

IMHO the state machine check might be even cleaner, since the Tendermint pseudo-code states "for the first time" and the state machine should align with it, but this works well too.

I will try to implement this version.

Contributor Author

cason commented Feb 27, 2026

IMHO the state machine check might be even cleaner, since the Tendermint pseudo-code states "for the first time" and the state machine should align with it, but this works well too.

I will try to implement this version.

#1508

Development

Successfully merging this pull request may close these issues.

fix(driver): Prevent scheduling identical timeouts

3 participants