Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[execution window] further optimizations #15053

Draft
wants to merge 20 commits into
base: brian/exec-window-exec-optimize-2
Choose a base branch
from

Conversation

bchocho
Copy link
Contributor

@bchocho bchocho commented Oct 23, 2024

  • Add window and block retrieval to block_tree, block_store and persistent_liveness_storage. Validator should be able to retrieve blocks and recover with blocks up to window start of the next block.
  • Execute the block window. Doesn't have any optimizations.
  • Filter committed transactions. Up until round-1 in prepare phase, and round-1 in execution phase. This requires using blocking txn provider to quickly provide shuffled txns in the execution phase.
  • incremental update of duplicate counts

Description

How Has This Been Tested?

Key Areas to Review

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Performance improvement
  • Refactoring
  • Dependency update
  • Documentation update
  • Tests

Which Components or Systems Does This Change Impact?

  • Validator Node
  • Full Node (API, Indexer, etc.)
  • Move/Aptos Virtual Machine
  • Aptos Framework
  • Aptos CLI/SDK
  • Developer Infrastructure
  • Move Compiler
  • Other (specify)

Checklist

  • I have read and followed the CONTRIBUTING doc
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I identified and added all stakeholders and component owners affected by this change as reviewers
  • I tested both happy and unhappy path of the functionality
  • I have made corresponding changes to the documentation

Copy link

trunk-io bot commented Oct 23, 2024

⏱️ 2h 3m total CI duration on this PR
Slowest 15 Jobs Cumulative Duration Recent Runs
test-target-determinator 28m 🟩🟩🟩🟩🟩 (+1 more)
rust-images / rust-all 14m 🟩
rust-move-tests 10m 🟩
rust-move-tests 10m 🟩
rust-move-tests 10m 🟩
rust-move-tests 10m 🟩
rust-move-tests 10m 🟩
rust-move-tests 9m 🟩
rust-cargo-deny 9m 🟩🟩🟩🟩🟩
check-dynamic-deps 5m 🟩🟩🟩🟩🟩
general-lints 2m 🟩🟩🟩🟩🟩
semgrep/ci 2m 🟩🟩🟩🟩🟩
file_change_determinator 1m 🟩🟩🟩🟩🟩 (+1 more)
file_change_determinator 57s 🟩🟩🟩🟩🟩
rust-images / rust-all 52s 🟥

settingsfeedbackdocs ⋅ learn more about trunk.io

@bchocho bchocho added the CICD:run-forge-e2e-perf Run the e2e perf forge only label Oct 23, 2024
@bchocho bchocho changed the base branch from main to brian/exec-window-exec-optimize-2 October 23, 2024 06:01

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

@bchocho bchocho added the CICD:build-failpoints-images Build failpoints docker image label Oct 23, 2024

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

@bchocho bchocho force-pushed the brian/exec-window-exec-optimize-2 branch from e395dca to a3feb00 Compare October 30, 2024 18:20
par iter for filter

add a max txns

slighly bump up pipeline pending latency
@bchocho bchocho force-pushed the brian/exec-window-exec-optimize-3 branch from 1dcf9bb to 220dbf5 Compare October 30, 2024 22:01

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

perryjrandall and others added 2 commits October 31, 2024 11:21
Add junit support for forge

This will allow us to onboard forge onto trunk flaky test detection

Also make suite choice more clean

Test Plan: running on PR

This comment has been minimized.

This comment has been minimized.

@bchocho bchocho added the CICD:build-performance-images build performance docker image variants label Oct 31, 2024

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

@bchocho bchocho changed the title brian/exec window exec optimize 3 [execution window] further optimizations Nov 12, 2024

This comment has been minimized.

This comment has been minimized.

@bchocho bchocho force-pushed the brian/exec-window-exec-optimize-3 branch from 2c544f5 to fe3a32b Compare November 13, 2024 17:41

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

Copy link
Contributor

❌ Forge suite realistic_env_max_load failure on 5ff7dce2926f05aa77dd7fe248219d3f2efc6453

two traffics test: inner traffic : committed: 14222.42 txn/s, latency: 2798.29 ms, (p50: 2700 ms, p70: 2700, p90: 3000 ms, p99: 12100 ms), latency samples: 5407680
two traffics test : committed: 99.96 txn/s, latency: 2249.44 ms, (p50: 2200 ms, p70: 2300, p90: 2400 ms, p99: 5900 ms), latency samples: 1720
Latency breakdown for phase 0: ["MempoolToBlockCreation: max: 1.211, avg: 0.753", "ConsensusProposalToOrdered: max: 0.323, avg: 0.290", "ConsensusOrderedToCommit: max: 0.390, avg: 0.377", "ConsensusProposalToCommit: max: 0.676, avg: 0.667"]
Test Failed: check for success

Caused by:
    error!() count in validator logs was 1, and must be 0

Stack backtrace:
   0: anyhow::error::<impl anyhow::Error>::msg
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/anyhow-1.0.89/src/error.rs:85:36
   1: aptos_forge::success_criteria::SuccessCriteriaChecker::check_no_errors::{{closure}}
             at ./testsuite/forge/src/success_criteria.rs:575:13
   2: aptos_forge::success_criteria::SuccessCriteriaChecker::check_for_success::{{closure}}
             at ./testsuite/forge/src/success_criteria.rs:340:50
   3: aptos_forge::interface::network::NetworkContext::check_for_success::{{closure}}
             at ./testsuite/forge/src/interface/network.rs:112:10
   4: <dyn aptos_testcases::NetworkLoadTest as aptos_forge::interface::network::NetworkTest>::run::{{closure}}
             at ./testsuite/testcases/src/lib.rs:326:14
   5: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/future/future.rs:123:9
   6: <aptos_testcases::two_traffics_test::TwoTrafficsTest as aptos_forge::interface::network::NetworkTest>::run::{{closure}}
             at ./testsuite/testcases/src/two_traffics_test.rs:77:47
   7: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/future/future.rs:123:9
   8: <aptos_testcases::CompositeNetworkTest as aptos_forge::interface::network::NetworkTest>::run::{{closure}}
             at ./testsuite/testcases/src/lib.rs:632:37
   9: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/future/future.rs:123:9
  10: tokio::runtime::park::CachedParkThread::block_on::{{closure}}
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/park.rs:281:63
  11: tokio::runtime::coop::with_budget
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/coop.rs:107:5
  12: tokio::runtime::coop::budget
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/coop.rs:73:5
  13: tokio::runtime::park::CachedParkThread::block_on
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/park.rs:281:31
  14: tokio::runtime::context::blocking::BlockingRegionGuard::block_on
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/context/blocking.rs:66:9
  15: tokio::runtime::handle::Handle::block_on_inner::{{closure}}
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/handle.rs:324:22
  16: tokio::runtime::context::runtime::enter_runtime
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/context/runtime.rs:65:16
  17: tokio::runtime::handle::Handle::block_on_inner
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/handle.rs:323:9
  18: tokio::runtime::handle::Handle::block_on
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/handle.rs:302:18
  19: aptos_forge::runner::Forge<F>::run
             at ./testsuite/forge/src/runner.rs:332:50
  20: forge::run_forge_with_changelog
             at ./testsuite/forge-cli/src/main.rs:426:24
  21: forge::main
             at ./testsuite/forge-cli/src/main.rs:329:21
  22: core::ops::function::FnOnce::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5
  23: std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys_common/backtrace.rs:155:18
  24: std::rt::lang_start::{{closure}}
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:166:18
  25: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:284:13
  26: std::panicking::try::do_call
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:552:40
  27: std::panicking::try
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19
  28: std::panic::catch_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14
  29: std::rt::lang_start_internal::{{closure}}
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:148:48
  30: std::panicking::try::do_call
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:552:40
  31: std::panicking::try
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19
  32: std::panic::catch_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14
  33: std::rt::lang_start_internal
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:148:20
  34: main
  35: __libc_start_main
  36: _start
Trailing Log Lines:
  31: std::panicking::try
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19
  32: std::panic::catch_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14
  33: std::rt::lang_start_internal
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:148:20
  34: main
  35: __libc_start_main
  36: _start

=== BEGIN JUNIT ===
<?xml version="1.0" encoding="UTF-8"?>
<testsuites name="forge" tests="1" failures="1" errors="0" uuid="714c7570-9ae5-42ab-be34-d178bb1bdd96">
    <testsuite name="local" tests="1" disabled="0" errors="0" failures="1">
        <testcase name="CompositeNetworkTest(CpuChaosWrapper(network:multi-region-network-emulation(two traffics test))) with ">
            <failure message="check for success

Caused by:
    error!() count in validator logs was 1, and must be 0

Stack backtrace:
   0: anyhow::error::&lt;impl anyhow::Error&gt;::msg
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/anyhow-1.0.89/src/error.rs:85:36
   1: aptos_forge::success_criteria::SuccessCriteriaChecker::check_no_errors::{{closure}}
             at ./testsuite/forge/src/success_criteria.rs:575:13
   2: aptos_forge::success_criteria::SuccessCriteriaChecker::check_for_success::{{closure}}
             at ./testsuite/forge/src/success_criteria.rs:340:50
   3: aptos_forge::interface::network::NetworkContext::check_for_success::{{closure}}
             at ./testsuite/forge/src/interface/network.rs:112:10
   4: &lt;dyn aptos_testcases::NetworkLoadTest as aptos_forge::interface::network::NetworkTest&gt;::run::{{closure}}
             at ./testsuite/testcases/src/lib.rs:326:14
   5: &lt;core::pin::Pin&lt;P&gt; as core::future::future::Future&gt;::poll
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/future/future.rs:123:9
   6: &lt;aptos_testcases::two_traffics_test::TwoTrafficsTest as aptos_forge::interface::network::NetworkTest&gt;::run::{{closure}}
             at ./testsuite/testcases/src/two_traffics_test.rs:77:47
   7: &lt;core::pin::Pin&lt;P&gt; as core::future::future::Future&gt;::poll
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/future/future.rs:123:9
   8: &lt;aptos_testcases::CompositeNetworkTest as aptos_forge::interface::network::NetworkTest&gt;::run::{{closure}}
             at ./testsuite/testcases/src/lib.rs:632:37
   9: &lt;core::pin::Pin&lt;P&gt; as core::future::future::Future&gt;::poll
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/future/future.rs:123:9
  10: tokio::runtime::park::CachedParkThread::block_on::{{closure}}
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/park.rs:281:63
  11: tokio::runtime::coop::with_budget
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/coop.rs:107:5
  12: tokio::runtime::coop::budget
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/coop.rs:73:5
  13: tokio::runtime::park::CachedParkThread::block_on
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/park.rs:281:31
  14: tokio::runtime::context::blocking::BlockingRegionGuard::block_on
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/context/blocking.rs:66:9
  15: tokio::runtime::handle::Handle::block_on_inner::{{closure}}
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/handle.rs:324:22
  16: tokio::runtime::context::runtime::enter_runtime
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/context/runtime.rs:65:16
  17: tokio::runtime::handle::Handle::block_on_inner
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/handle.rs:323:9
  18: tokio::runtime::handle::Handle::block_on
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/handle.rs:302:18
  19: aptos_forge::runner::Forge&lt;F&gt;::run
             at ./testsuite/forge/src/runner.rs:332:50
  20: forge::run_forge_with_changelog
             at ./testsuite/forge-cli/src/main.rs:426:24
  21: forge::main
             at ./testsuite/forge-cli/src/main.rs:329:21
  22: core::ops::function::FnOnce::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5
  23: std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys_common/backtrace.rs:155:18
  24: std::rt::lang_start::{{closure}}
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:166:18
  25: core::ops::function::impls::&lt;impl core::ops::function::FnOnce&lt;A&gt; for &amp;F&gt;::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:284:13
  26: std::panicking::try::do_call
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:552:40
  27: std::panicking::try
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19
  28: std::panic::catch_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14
  29: std::rt::lang_start_internal::{{closure}}
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:148:48
  30: std::panicking::try::do_call
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:552:40
  31: std::panicking::try
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19
  32: std::panic::catch_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14
  33: std::rt::lang_start_internal
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:148:20
  34: main
  35: __libc_start_main
  36: _start"/>
        </testcase>
    </testsuite>
</testsuites>
=== END JUNIT ===

Swarm logs can be found here: See fgi output for more information.
{"level":"INFO","source":{"package":"aptos_forge","file":"testsuite/forge/src/backend/k8s/cluster_helper.rs:381"},"thread_name":"main","hostname":"forge-e2e-pr-15053-1731553846-5ff7dce2926f05aa77dd7fe248219d3f2","timestamp":"2024-11-14T03:23:46.403274Z","message":"Deleting namespace forge-e2e-pr-15053: Some(NamespaceStatus { conditions: None, phase: Some(\"Terminating\") })"}
{"level":"INFO","source":{"package":"aptos_forge","file":"testsuite/forge/src/backend/k8s/cluster_helper.rs:398"},"thread_name":"main","hostname":"forge-e2e-pr-15053-1731553846-5ff7dce2926f05aa77dd7fe248219d3f2","timestamp":"2024-11-14T03:23:46.403294Z","message":"aptos-node resources for Forge removed in namespace: forge-e2e-pr-15053"}

failures:
    CompositeNetworkTest

test result: FAILED. 0 passed; 1 failed; 0 filtered out

Failed to run tests:
Tests Failed
Error: Tests Failed

Stack backtrace:
   0: anyhow::error::<impl anyhow::Error>::msg
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/anyhow-1.0.89/src/error.rs:85:36
   1: aptos_forge::runner::Forge<F>::run
             at ./testsuite/forge/src/runner.rs:358:13
   2: forge::run_forge_with_changelog
             at ./testsuite/forge-cli/src/main.rs:426:24
   3: forge::main
             at ./testsuite/forge-cli/src/main.rs:329:21
   4: core::ops::function::FnOnce::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5
   5: std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys_common/backtrace.rs:155:18
   6: std::rt::lang_start::{{closure}}
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:166:18
   7: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:284:13
   8: std::panicking::try::do_call
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:552:40
   9: std::panicking::try
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19
  10: std::panic::catch_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14
  11: std::rt::lang_start_internal::{{closure}}
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:148:48
  12: std::panicking::try::do_call
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:552:40
  13: std::panicking::try
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19
  14: std::panic::catch_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14
  15: std::rt::lang_start_internal
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:148:20
  16: main
  17: __libc_start_main
  18: _start
Debugging output:
NAME                                    READY   STATUS      RESTARTS   AGE
aptos-node-0-fullnode-eforge187-0       1/1     Running     0          12m
aptos-node-0-validator-0                1/1     Running     0          12m
aptos-node-1-fullnode-eforge187-0       1/1     Running     0          12m
aptos-node-1-validator-0                1/1     Running     0          12m
aptos-node-2-fullnode-eforge187-0       1/1     Running     0          12m
aptos-node-2-validator-0                1/1     Running     0          12m
aptos-node-3-fullnode-eforge187-0       1/1     Running     0          12m
aptos-node-3-validator-0                1/1     Running     0          12m
aptos-node-4-fullnode-eforge187-0       1/1     Running     0          12m
aptos-node-4-validator-0                1/1     Running     0          12m
aptos-node-5-validator-0                1/1     Running     0          12m
aptos-node-6-validator-0                1/1     Running     0          12m
forge-testnet-deployer-nm628            0/1     Completed   0          12m
genesis-aptos-genesis-eforge187-v6742   0/1     Completed   0          12m

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CICD:run-forge-e2e-perf Run the e2e perf forge only
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants