[Accton][wedge800cact] Fix VerifyCallbacksOnMacEntryChange warmboot timeout#1303

Open
BrandonCheng0121 wants to merge 1 commit into facebook:main from BrandonCheng0121:fix_L2EntryChange_timout

@BrandonCheng0121

Contributor

After warmboot, the flood-prevention port may remain DOWN (preserved from its coldboot state) or may be in a transient state due to a link flap. Calling bringDownPort directly on a port that is already DOWN causes LinkStateToggler to wait indefinitely, because the SDK does not generate a new DOWN notification when the state has not changed.
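
To illustrate the failure mode described above, here is a minimal, self-contained sketch of an edge-triggered link notification. `EdgeTriggeredPort` and `setAdminState` are hypothetical names invented for this illustration; they are not the SDK's or LinkStateToggler's actual API. The point is only that a request for the state the port is already in produces no callback, so anything waiting on that callback never wakes up.

```cpp
#include <functional>
#include <utility>

// Hypothetical model of a port whose driver reports only *changes* in
// link state. This is an illustration, not the real SDK interface.
class EdgeTriggeredPort {
 public:
  EdgeTriggeredPort(bool up, std::function<void(bool)> onChange)
      : up_(up), onChange_(std::move(onChange)) {}

  // Requests a new state. Returns true if a notification was emitted.
  bool setAdminState(bool up) {
    if (up == up_) {
      return false; // no transition, so no callback fires
    }
    up_ = up;
    onChange_(up_);
    return true;
  }

  bool isUp() const {
    return up_;
  }

 private:
  bool up_;
  std::function<void(bool)> onChange_;
};
```

In this model, bringing down a port that is already DOWN returns without emitting an event, which corresponds to the case where the caller would block waiting for a DOWN notification.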

Add a bringDownPortIfUp() helper that polls for the port to come UP (10s timeout, 100ms interval) before calling bringDownPort. If the port stays DOWN, it is already in the desired state, so the helper skips the bringDownPort call and the test proceeds.
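
The polling logic described above can be sketched roughly as follows. This is a standalone approximation, not the code in this PR: the real helper lives in the test fixture and uses FBOSS facilities, whereas here the port query and the bring-down action are passed in as plain callables (`isPortUp` and `bringDownPort` are assumed stand-ins), and the boolean return value is an addition for illustration.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Polls until the port reports UP, then brings it down.
// Returns true if bringDownPort was invoked, false if the port stayed
// DOWN for the whole timeout (already in the desired state).
// isPortUp and bringDownPort are hypothetical stand-ins for the
// fixture's real port-state query and bring-down call.
bool bringDownPortIfUp(
    const std::function<bool()>& isPortUp,
    const std::function<void()>& bringDownPort,
    std::chrono::milliseconds timeout = std::chrono::seconds(10),
    std::chrono::milliseconds interval = std::chrono::milliseconds(100)) {
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (true) {
    if (isPortUp()) {
      // A real UP -> DOWN transition will follow, so a DOWN
      // notification is expected and waiting on it is safe.
      bringDownPort();
      return true;
    }
    if (std::chrono::steady_clock::now() >= deadline) {
      // Port never came UP: skip the call instead of waiting for a
      // notification that would not arrive.
      return false;
    }
    std::this_thread::sleep_for(interval);
  }
}
```

The default arguments mirror the 10s timeout and 100ms interval stated above; a caller can treat a `false` return as "nothing to do" and continue with the test.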

Pre-submission checklist

  • I've run the linters locally and fixed lint errors related to the files I modified in this PR. You can install the linters by running pip install -r requirements-dev.txt && pre-commit install
  • pre-commit run --files fboss/agent/test/agent_hw_tests/AgentMacLearningTests.cpp

Summary

Before this fix, the warm-boot test typically hit a TIMEOUT within 40 consecutive runs.
With this fix, the test consistently passes for 100 consecutive runs.

Test Plan

Test command:
for i in {1..100}; do echo "=== run $i attempts ==="; time ./bin/run_test.py sai_agent --agent-run-mode mono --filter=AgentMacSwLearningModeTest.VerifyCallbacksOnMacEntryChange --skip-known-bad-tests "leaba/25.11.4210/25.11.4210/graphene202x" --enable-production-features g202x --config /opt/fboss/share/hw_test_configs/wedge800cact.agent.materialized_JSON --fruid-path /home/Go_FBOSS_Test/W800CA-Fix/./fboss-configs/fboss/oss/scripts/run_configs/fruid.json --mgmt-if eth0 --platform_mapping_override_path /home/Go_FBOSS_Test/W800CA-Fix/./fboss-configs/fboss/lib/platform_mapping_v2/generated_platform_mappings/wedge800cact_platform_mapping-2026-0418-v0.7-honglim_20260409-del_pie.json ; done

The test consistently passes for 100 consecutive runs.

@BrandonCheng0121 BrandonCheng0121 requested a review from a team as a code owner June 16, 2026 07:27
@meta-cla meta-cla Bot added the CLA Signed label Jun 16, 2026
@BrandonCheng0121 BrandonCheng0121 marked this pull request as draft June 16, 2026 07:32
@BrandonCheng0121 BrandonCheng0121 marked this pull request as ready for review June 16, 2026 08:00
@BrandonCheng0121

Contributor Author

@Tianyu-Meta
Please take a look at this PR, thanks.

@BrandonCheng0121

BrandonCheng0121 commented Jun 17, 2026

Contributor Author

@Tianyu-Meta

I suspect this is a race condition: if bringDownPort is triggered before warm-boot initialization has completely finished, it leads to unexpected behavior.
I see two potential solutions:

  1. Wait for the warm-boot port state initialization to complete before executing bringDownPort on the target port. (This is the approach taken in this PR.)

  2. Alternatively, provide a dedicated initialConfig for this test case. However, according to the FBOSS coding standards (fboss/skills/fboss-code-standards/references/testing-patterns.md), this solution does not seem to be allowed.
