Skip to content

Conversation

@aymericDD
Copy link
Contributor

What does this PR do?

  • Adds new functionality
  • Alters existing functionality
  • Fixes a bug
  • Improves documentation or testing

Please briefly describe your changes as well as the motivation behind them:

  • Removes the dry run early exit from the bpftool executor in ebpf/executor.go:44
  • The bpftool feature probe command is read-only and safe to execute in dry run mode
  • This fix allows proper eBPF capability validation even when dry run is enabled, preventing false failures when testing disk failure disruptions at node level
  • Adds a new unit test case in injector/disk_failure_test.go:273 to verify that eBPF validation still occurs correctly in dry run mode

Code Quality Checklist

  • The documentation is up to date.
  • My code is sufficiently commented and passes continuous integration checks.
  • I have signed my commit (see Contributing Docs).

Testing

  • I leveraged continuous integration testing
    • by depending on existing unit tests or end-to-end tests.
    • by adding new unit tests or end-to-end tests.
  • I manually tested the following steps:
    • N/A - The fix is covered by the new unit test that verifies ValidateRequiredSystemConfig and GetMapTypes are properly called in dry run mode
    • locally.
    • as a canary deployment to a cluster.

Removes dry run early exit from bpftool executor since
'bpftool feature probe' is read-only and safe to execute.
This allows proper eBPF capability validation even in dry
run mode, preventing false failures when testing disk
failure disruptions at node level.

Jira: CHAOSPLT-1326
@aymericDD aymericDD requested a review from a team as a code owner December 4, 2025 09:29
@datadog-datadog-prod-us1
Copy link

datadog-datadog-prod-us1 bot commented Dec 4, 2025

⚠️ Tests

⚠️ Warnings

❄️ 1 New flaky test detected

[It] Disruption Controller Cloud disruption is a host disruption disguised should create a cloud disruption but apply a host disruption with the list of cloud managed service ip ranges from Controller Suite (Datadog)
runtime error: index out of range [0] with length 0

[PANICKED] Test Panicked
In [It] at: /home/circleci/go/src/github.com/DataDog/chaos-controller/vendor/github.com/onsi/gomega/internal/async_assertion.go:333 @ 12/05/25 11:01:39.222

runtime error: index out of range [0] with length 0

Full Stack Trace
  github.com/onsi/gomega/internal.(*AsyncAssertion).buildActualPoller.func3.1()
  	/home/circleci/go/src/github.com/DataDog/chaos-controller/vendor/github.com/onsi/gomega/internal/async_assertion.go:333 +0x325
...

ℹ️ Info

🧪 All tests passed

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: bac1fd4 | Docs | Datadog PR Page | Was this helpful? Give us feedback!

e.log.Debugf("running bpftool command: %v", cmd.String())

// early exit if dry-run mode is enabled
if e.dryRun {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the effect of the dryRun when it injects? Why did we previously exit when it was in dryRun?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the effect of the dryRun when it injects?

FWIK The dry-run will only execute read-only commands and will not apply the injection for real.

Why did we previously exit when it was in dryRun?

It was a long time ago, so I don't remember exactly the why... However, pbftool is only used to verify if the kernel can support eBPF programs so it makes sense to execute the command in dry-run mode to ensure the node is ready to support the disruption.
Before the fix, when the disruption is in dry run mode, the run command returns an empty string and the unmarshal command returns an error.

@aymericDD aymericDD requested a review from clairecng December 4, 2025 17:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants