fixed race condition in tests assuming TEST_EVENT_OBSERVER_SKIP_RETRY… #5669

Open
wants to merge 2 commits into
base: develop

rdeioris
Contributor

@rdeioris rdeioris commented Jan 8, 2025

Description

This patch fixes a race condition in the event_dispatcher tests, specifically:

event_dispatcher::test::test_process_pending_payloads
event_dispatcher::test::test_send_payload_timeout
event_dispatcher::test::test_send_payload_with_db

The TEST_EVENT_OBSERVER_SKIP_RETRY global mutex enables or disables the retry system for events (which stores them in the db so that sending them to the observer can be retried later). The race shows up when these tests are executed in parallel.

This global is only used in the tests and is assumed to be off. The problem affects tests that run while another test has set TEST_EVENT_OBSERVER_SKIP_RETRY to true and that do not lock it before reading.

The patch simply enforces locking (where missing) before sending the payload in those tests.
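As a rough illustration of the idea (not the actual patch), the sketch below shows two simulated tests that each hold the lock on a global flag for the whole send, so neither can observe a value set by the other. The names `SKIP_RETRY`, `send_payload`, `test_without_retry`, `test_with_retry` and `run_both_in_parallel` are hypothetical stand-ins, and the send helper takes the flag value as an argument, which is an assumption made here because a `std::sync::Mutex` must not be locked again on a thread that already holds its guard.

```rust
use std::sync::Mutex;
use std::thread;

// Hypothetical stand-in for the test-only global flag (default: retry enabled).
static SKIP_RETRY: Mutex<bool> = Mutex::new(false);

// Hypothetical stand-in for the send path. It receives the flag value from
// the caller instead of locking SKIP_RETRY itself, because the caller
// already holds the guard.
fn send_payload(skip_retry: bool) -> &'static str {
    if skip_retry {
        "dropped"
    } else {
        "stored for retry"
    }
}

// A test that needs retries disabled: holds the guard across the send.
fn test_without_retry() -> &'static str {
    let mut flag = SKIP_RETRY.lock().unwrap();
    *flag = true;
    let outcome = send_payload(*flag);
    *flag = false; // restore the default before the guard is dropped
    outcome
}

// A test that relies on the default: it also locks before sending, so it
// cannot run in the middle of test_without_retry.
fn test_with_retry() -> &'static str {
    let mut flag = SKIP_RETRY.lock().unwrap();
    *flag = false;
    send_payload(*flag)
}

// Simulates the default parallel test runner with two threads.
fn run_both_in_parallel() -> (&'static str, &'static str) {
    let a = thread::spawn(test_without_retry);
    let b = thread::spawn(test_with_retry);
    (a.join().unwrap(), b.join().unwrap())
}
```

Whichever thread takes the lock first, each simulated test sees only the value it set itself. The cost is that tests touching the flag are serialized for the duration of the send.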

Applicable issues

Additional info (benefits, drawbacks, caveats)

This was originally part (in a more complex form) of #5570.

Checklist

  • Test coverage for new or modified code paths
  • Changelog is updated
  • Required documentation changes (e.g., docs/rpc/openapi.yaml and rpc-endpoints.md for v2 endpoints, event-dispatcher.md for new events)
  • New clarity functions have corresponding PR in clarity-benchmarking repo
  • New integration test(s) added to bitcoin-tests.yml

@rdeioris rdeioris requested a review from a team as a code owner January 8, 2025 12:53
@aldur aldur requested review from obycode and jbencin January 10, 2025 15:44
@obycode
Contributor

obycode commented Jan 10, 2025

Ah, interesting. I hadn't considered this. Doesn't this change still have the potential for flakiness though? Do we need to run these tests with --test-threads=1 instead?

Contributor

@jbencin jbencin left a comment

So these tests are running in different threads in the same memory space?

If that's the case, I agree with Brice: doing this could just cause flakiness in the test that sets TEST_EVENT_OBSERVER_SKIP_RETRY to true

Seems like you'd need a structure like this:

static TEST_EVENT_OBSERVER_SKIP_RETRY: std::sync::Mutex<HashMap<TestId, bool>> = std::sync::Mutex::new(HashMap::new());

To keep track of which test set the variable, so none of them interfere with each other
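A minimal compilable sketch of that per-test map could look like the following. It assumes `TestId` can be approximated by the current thread's `ThreadId` (the comment above does not define it), and it creates the map lazily through `OnceLock`, since `HashMap::new()` cannot be called in a `static` initializer. The helper names `flags`, `set_skip_retry` and `skip_retry` are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};
use std::thread::{self, ThreadId};

// Assumption: a test is identified by the thread it runs on.
type TestId = ThreadId;

// Lazily initialized global map from test id to its own flag value.
fn flags() -> &'static Mutex<HashMap<TestId, bool>> {
    static FLAGS: OnceLock<Mutex<HashMap<TestId, bool>>> = OnceLock::new();
    FLAGS.get_or_init(|| Mutex::new(HashMap::new()))
}

// Sets the flag for the calling test only.
fn set_skip_retry(value: bool) {
    flags().lock().unwrap().insert(thread::current().id(), value);
}

// Reads the flag for the calling test; tests that never set it see `false`.
fn skip_retry() -> bool {
    flags()
        .lock()
        .unwrap()
        .get(&thread::current().id())
        .copied()
        .unwrap_or(false)
}
```

One caveat of keying by thread: code that reads the flag from a thread other than the one that set it falls back to the default.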

@rdeioris
Contributor Author

> So these tests are running in different threads in the same memory space?
>
> If that's the case, I agree with Brice: doing this could just cause flakiness in the test that sets TEST_EVENT_OBSERVER_SKIP_RETRY to true
>
> Seems like you'd need a structure like this:
>
> static TEST_EVENT_OBSERVER_SKIP_RETRY: std::sync::Mutex<HashMap<TestId, bool>> = std::sync::Mutex::new(HashMap::new());
>
> To keep track of which test set the variable, so none of them interfere with each other

The point of the patch is to simplify the previous attempt in #5570, where I used thread locals. Using a hashmap for this seems overkill IMHO. As this pattern (a global lazy static for hijacking test-specific behaviours) is pretty common in the codebase, maybe we should agree on a "blessed" approach for it (and the elephant in the room is probably that we should avoid it altogether, to reduce the amount of test-only code that diverges from the base codepath).
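For comparison, a thread-local version of such a flag can be sketched as below. This is only an illustration of the general technique, not the code from #5570; the names `SKIP_RETRY`, `set_skip_retry` and `skip_retry` are hypothetical.

```rust
use std::cell::Cell;

thread_local! {
    // Hypothetical test-only flag: every thread gets its own copy,
    // initialized to `false` (retry enabled).
    static SKIP_RETRY: Cell<bool> = Cell::new(false);
}

// Changes the flag for the calling thread only.
fn set_skip_retry(value: bool) {
    SKIP_RETRY.with(|flag| flag.set(value));
}

// Reads the calling thread's copy of the flag.
fn skip_retry() -> bool {
    SKIP_RETRY.with(|flag| flag.get())
}
```

No locking is needed because no value is shared, but a thread spawned by the code under test starts from the default rather than inheriting the value set by the test.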

@rdeioris
Contributor Author

> Ah, interesting. I hadn't considered this. Doesn't this change still have the potential for flakiness though? Do we need to run these tests with --test-threads=1 instead?

Actually, there are very few parts of the code where this specific logic applies, and they are mostly test-specific code. Before the patch I used to run them single-threaded to make them pass, but I think it is worth supporting the default Rust test behaviour.

3 participants