@stablebits
The STREAM_LOAD_EMA_INTERVAL_COUNT constant controls the duration of the EMA smoothing window used to reduce sensitivity to short-lived load spikes at the start of a leader slot. With anza-xyz#9580 in place, throttling is only triggered when saturation is sustained (reaching 95% of max target).

Problem

With the current value of N=10, the smoothing window is too short to absorb a start-of-slot spike (see the simulation results below).

Summary of Changes

The value 40 was chosen based on simulations: at a max target TPS of ~400K, it allows the system to absorb a burst of ~50K transactions over ~40 ms before throttling activates.

There is no magic about N=40; the value should be tuned based on the size and duration of spikes we want to tolerate.

How N affects smoothing: the smoothing factor alpha in the EMA update (new_ema = alpha * latest + (1 - alpha) * ema) is approximately 2/(N+1), where N is STREAM_LOAD_EMA_INTERVAL_COUNT.
The larger N is, the slower the EMA grows (i.e., the larger the burst it can absorb). With N=10 (current code), alpha ≈ 0.18. For example, here is the EMA growth under a sustained load of 1K streams per 5 ms interval.

N=10 (alpha ≈ 0.18)

        step  load_in_5ms          ema
           0         1000          181
           1         1000          329
           2         1000          450
           3         1000          549
           4         1000          630
           5         1000          697
           6         1000          752
           7         1000          797
           8         1000          833
           9         1000          863

N=40 (alpha ≈ 0.047)

        step  load_in_5ms          ema
           0         1000           47
           1         1000           92
           2         1000          135
           3         1000          176
           4         1000          215
           5         1000          252
           6         1000          287
           7         1000          321
           8         1000          353
           9         1000          383
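
The two tables above can be reproduced with a short integer-arithmetic sketch. This is not the code from stream_throttle.rs: the function names and the fixed-point scale of 1024 are assumptions, chosen because they reproduce the tabulated values exactly (integer truncation is also why the N=40 column starts at 47 rather than 1000 * 2/41 ≈ 48.8).

```rust
// Assumed fixed-point scale; not stated in the text above, picked because
// it reproduces the tabulated EMA values.
const EMA_SCALE: u64 = 1024;

/// One EMA step: new_ema = alpha * latest + (1 - alpha) * ema,
/// with alpha = 2 / (n + 1) held as an integer fraction of EMA_SCALE.
fn ema_step(ema: u64, latest: u64, n: u64) -> u64 {
    let alpha_scaled = 2 * EMA_SCALE / (n + 1);
    (latest * alpha_scaled + ema * (EMA_SCALE - alpha_scaled)) / EMA_SCALE
}

/// EMA after each interval for a sequence of per-interval loads,
/// starting from an EMA of 0.
fn ema_series(loads: &[u64], n: u64) -> Vec<u64> {
    let mut ema = 0;
    loads
        .iter()
        .map(|&load| {
            ema = ema_step(ema, load, n);
            ema
        })
        .collect()
}

fn main() {
    // Sustained load of 1000 streams per 5 ms interval, as in the tables.
    let loads = [1000u64; 10];
    for n in [10, 40] {
        println!("N={n}: {:?}", ema_series(&loads, n));
    }
}
```

After 10 intervals (50 ms) the N=10 EMA has already reached about 86% of the input load, while the N=40 EMA is still below 40%.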

Below is a simulated ingestion of ~60K transactions over 100 ms with a spike at the beginning, roughly corresponding to a pattern we recently saw on mds1 (mainnet) but at about 10x the traffic.
Note: throttling is activated at 95% of the target (500K TPS) load and deactivated at 90%. A quota of 40K effectively means unthrottled.

N=10

Running `target/debug/ema_sim 5000 15000 1000 3000 4000 7000 5000 5000 3000 5000 1000 2000 1000 1000 1000 1000 1000 1000 1000 1000 --stakes 1,10,100 --total-stake 10000`
# max_streams_per_ms=500 max_unstaked_connections=500 max_staked_load_in_throttling_window=40000 max_unstaked_load_in_throttling_window=20 throttling_on_threshold=1900
        step  load_in_5ms          ema  quota_0.01%   quota_0.1%     quota_1%
           0         5000          908        40000        40000        40000
           1        15000         3467           21           40          400
           2         1000         3018           21           40          400
           3         3000         3014           21           40          400
           4         4000         3193           21           40          400
           5         7000         3884           21           40          400
           6         5000         4086           21           40          400
           7         5000         4252           21           40          400
           8         3000         4024           21           40          400
           9         5000         4201           21           40          400
          10         1000         3619           21           40          400
          11         2000         3324           21           40          400
          12         1000         2901           21           40          400
          13         1000         2555           21           40          400
          14         1000         2272           21           40          400
          15         1000         2040           21           40          400
          16         1000         1851           21           40          400
          17         1000         1696        40000        40000        40000
          18         1000         1569        40000        40000        40000
          19         1000         1465        40000        40000        40000

N=40

# max_streams_per_ms=500 max_unstaked_connections=500 max_staked_load_in_throttling_window=40000 max_unstaked_load_in_throttling_window=20 throttling_on_threshold=1900
        step  load_in_5ms          ema  quota_0.01%   quota_0.1%     quota_1%
           0         5000          239        40000        40000        40000
           1        15000          945        40000        40000        40000
           2         1000          947        40000        40000        40000
           3         3000         1045        40000        40000        40000
           4         4000         1186        40000        40000        40000
           5         7000         1464        40000        40000        40000
           6         5000         1633        40000        40000        40000
           7         5000         1794        40000        40000        40000
           8         3000         1851        40000        40000        40000
           9         5000         2001           21           40          400
          10         1000         1953           21           40          400
          11         2000         1955           21           40          400
          12         1000         1909           21           40          400
          13         1000         1865           21           40          400
          14         1000         1823           21           40          400
          15         1000         1783        40000        40000        40000
          16         1000         1745        40000        40000        40000
          17         1000         1709        40000        40000        40000
          18         1000         1675        40000        40000        40000
          19         1000         1642        40000        40000        40000

With N=40, we can absorb ~50K transactions (with a spike) over ~40ms before throttling gets activated.
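
The on/off behaviour in the two simulation tables can be sketched as a small hysteresis loop over the same load sequence. This is an illustration, not the ema_sim tool: the fixed-point scale of 1024, the function names, and the off threshold of 1800 are assumptions (1800 is derived as 90% of the 2000-per-5ms target implied by throttling_on_threshold=1900 being 95%).

```rust
const EMA_SCALE: u64 = 1024; // assumed fixed-point scale
const THROTTLE_ON: u64 = 1900; // throttling_on_threshold from the simulator header
const THROTTLE_OFF: u64 = 1800; // assumed: 90% of the implied 2000-per-5ms target

// Per-5ms loads passed to the simulator in the run above.
const LOADS: [u64; 20] = [
    5000, 15000, 1000, 3000, 4000, 7000, 5000, 5000, 3000, 5000,
    1000, 2000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000,
];

/// One integer EMA step with alpha = 2 / (n + 1).
fn ema_step(ema: u64, latest: u64, n: u64) -> u64 {
    let alpha_scaled = 2 * EMA_SCALE / (n + 1);
    (latest * alpha_scaled + ema * (EMA_SCALE - alpha_scaled)) / EMA_SCALE
}

/// Steps during which throttling is active, using on/off hysteresis.
fn throttled_steps(loads: &[u64], n: u64) -> Vec<usize> {
    let mut ema = 0;
    let mut throttled = false;
    let mut steps = Vec::new();
    for (step, &load) in loads.iter().enumerate() {
        ema = ema_step(ema, load, n);
        if ema >= THROTTLE_ON {
            throttled = true;
        } else if ema < THROTTLE_OFF {
            throttled = false;
        }
        if throttled {
            steps.push(step);
        }
    }
    steps
}

/// Total load in the intervals before the first throttled one.
fn absorbed_before_throttling(loads: &[u64], n: u64) -> u64 {
    let first = throttled_steps(loads, n).first().copied().unwrap_or(loads.len());
    loads[..first].iter().sum()
}

fn main() {
    for n in [10, 40] {
        println!(
            "N={n}: throttled steps {:?}, absorbed before throttling {}",
            throttled_steps(&LOADS, n),
            absorbed_before_throttling(&LOADS, n)
        );
    }
}
```

Under these assumptions N=10 throttles from step 1 through step 16, while N=40 throttles only from step 9 through step 14, after 48,000 transactions have been ingested in the first nine intervals (45 ms).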

Fixes #

brooksprumo and others added 26 commits January 14, 2026 19:36
* Add wfsm metric. Add trace logging for peers.

* Remove trace logging, since peers are already logged by gossip

* Remove wrong_shred_stake from wfsm_gossip metric. This will always be 0 and the associated code will be cleaned up in a future PR
* Split update_index function into two: one for cached accounts and the other for frozen

* Updated comment and added debug asserts

* Remove unneeded type declaration
alpenglow: upstream votor & votor-messages as of December
* Make flushing of unrooted slots explicit

* Rename flush_unrooted_cache_slot to flush_unrooted_slot_cache

* Checking for unrooted slots

Removing changes to tests that are not using new function

* Inlining flush_slot_cache to flush_accounts_cache_slot_for_tests for DCOU issue

* Resolving unused function issue
* bump `bls-signatures` to v3.0

* update vote program with the new syntax

* update genesis-utils with the new syntax

* update `clap-utils` tests

* update `keygen` tests

* update genesis tests

* update votor

* update votor tests

* update `epoch_stakes`
…ams/sbf (anza-xyz#10029)

chore(deps): bump solana-program-memory in /programs/sbf

Bumps [solana-program-memory](https://github.com/anza-xyz/solana-sdk) from 3.0.0 to 3.1.0.
- [Release notes](https://github.com/anza-xyz/solana-sdk/releases)
- [Commits](https://github.com/anza-xyz/solana-sdk/compare/sdk@v3.0.0...cpi@v3.1.0)

---
updated-dependencies:
- dependency-name: solana-program-memory
  dependency-version: 3.1.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* chore(deps): bump chrono from 0.4.42 to 0.4.43

Bumps [chrono](https://github.com/chronotope/chrono) from 0.4.42 to 0.4.43.
- [Release notes](https://github.com/chronotope/chrono/releases)
- [Changelog](https://github.com/chronotope/chrono/blob/main/CHANGELOG.md)
- [Commits](chronotope/chrono@v0.4.42...v0.4.43)

---
updated-dependencies:
- dependency-name: chrono
  dependency-version: 0.4.43
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update all Cargo files

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…yz#10041)

* chore(deps): bump solana-system-interface from 2.0.0 to 3.0.0

Bumps [solana-system-interface](https://github.com/anza-xyz/solana-sdk) from 2.0.0 to 3.0.0.
- [Release notes](https://github.com/anza-xyz/solana-sdk/releases)
- [Commits](https://github.com/anza-xyz/solana-sdk/compare/address@v2.0.0...sdk@v3.0.0)

---
updated-dependencies:
- dependency-name: solana-system-interface
  dependency-version: 3.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update all Cargo files

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…z#10002)

* epoch stakes in thread

* Add comment and asserts for versioned_epoch_stakes
…-bins (anza-xyz#10049)

chore(deps): bump solana-system-interface in /dev-bins

Bumps [solana-system-interface](https://github.com/anza-xyz/solana-sdk) from 2.0.0 to 3.0.0.
- [Release notes](https://github.com/anza-xyz/solana-sdk/releases)
- [Commits](https://github.com/anza-xyz/solana-sdk/compare/address@v2.0.0...sdk@v3.0.0)

---
updated-dependencies:
- dependency-name: solana-system-interface
  dependency-version: 3.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…nza-xyz#10048)

decrease QUIC_MAX_TIMEOUT from 60s to 30s

A 60s timeout might be too high due to NAT timeouts. 30s is a safe default idle timeout and is the default used in quinn.
scale RX window and max_streams with BDP
anza-xyz#9580)

Simulations with the existing EMA-based load metric (stream_throttle.rs) showed that
very low-stake staked connections (~0.01% of total stake) could end up with
streams-per-100ms quotas similar to unstaked connections even under near-zero load.

Data collected on mds1 (mainnet) over a few leader slots also showed low-stake connections
being throttled under effectively idle conditions:
[2025-12-04T22:56:59.929547468Z ERROR solana_streamer::nonblocking::stream_throttle] Throttling tpu stream from 3.66.188.50:8016, peer type: Staked(30314578869242),
current_load: 11, total_stake: 415746706271632896, max_streams_per_interval: 28, read_interval_streams: 28, throttle_duration: 99.948899ms

In observed cases, effective load was near 0 (3–25 streams per 5ms) while affected connections had quotas of 28–64 streams per 100ms and stakes of ~0.007–0.016% of
total stake.

Also:
- Fix update_ema() catch-up behavior so missed slots do not re-apply the same
  accumulated load.
- available_load_capacity_in_throttling_duration() mixed load values in streams/5ms and streams/50ms. Replaced it with a simpler
  stake-only quota under load.
* Prepopulate zero lamport accounts in store_for_tests

* Update accounts-db/src/accounts_db.rs

Co-authored-by: Brooks <brooks@prumo.org>

---------

Co-authored-by: Brooks <brooks@prumo.org>
* bump `zk-sdk` to `v5.0`

* update zk-elgamal-proof tests

* re-key feature `reenable_zk_elgamal_proof_program`
* chore(deps): bump flate2 from 1.0.31 to 1.1.8 in /programs/sbf

Bumps [flate2](https://github.com/rust-lang/flate2-rs) from 1.0.31 to 1.1.8.
- [Release notes](https://github.com/rust-lang/flate2-rs/releases)
- [Commits](rust-lang/flate2-rs@1.0.31...1.1.8)

---
updated-dependencies:
- dependency-name: flate2
  dependency-version: 1.1.8
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update all Cargo files

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [js-sys](https://github.com/wasm-bindgen/wasm-bindgen) from 0.3.83 to 0.3.85.
- [Release notes](https://github.com/wasm-bindgen/wasm-bindgen/releases)
- [Changelog](https://github.com/wasm-bindgen/wasm-bindgen/blob/main/CHANGELOG.md)
- [Commits](https://github.com/wasm-bindgen/wasm-bindgen/commits)

---
updated-dependencies:
- dependency-name: js-sys
  dependency-version: 0.3.85
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
This constant controls the duration of the EMA smoothing window used to
reduce sensitivity to short-lived load spikes at the start of a leader
slot. Throttling is only triggered when saturation is sustained.

The value 40 was chosen based on simulations: at a max target TPS of ~400K,
it allows the system to absorb a burst of ~50K transactions over ~40 ms
before throttling activates.

There is no magic about N=40; the value should be tuned based on the size
and duration of spikes we want to tolerate.
@stablebits stablebits force-pushed the increase-stream_load_ema_interval_count branch from 6fc5d7c to 98486db Compare January 16, 2026 13:59
kskalski and others added 2 commits January 16, 2026 14:20
* Switch networking crates to Rust 2024 edition

* clippy(networking): update formatting for rust 2024

* clippy: fix collapsible ifs