[CI] Fix inaccurate Codecov reports due to `gotestsum` retries #21954

amenasria · 2024-01-09T13:34:01Z

What does this PR do?

This PR changes the command gotestsum uses to rerun failed unit tests. It relies on the --raw-command flag (see doc), following the workaround proposed in gotestyourself/gotestsum#274.

Currently, when unit tests are rerun, the coverage file coverage.out is overwritten. The new file is produced by running only the flaky subset of unit tests, so the reported coverage is far lower than the real coverage.

The --raw-command flag runs go test as before, except that each rerun targets its own coverage.out.rerun file instead of overwriting coverage.out.
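As a rough illustration of why per-rerun files matter, the sketch below merges several Go coverage profiles instead of letting the last run overwrite the first. It is a simplified stand-in, not the code in tasks/go_test.py and not how gocovmerge is implemented: `merge_profiles` is a hypothetical helper, and it assumes every profile uses the same mode and identical block boundaries.

```python
from collections import OrderedDict


def merge_profiles(profiles):
    """Merge the text of several Go coverage profiles into one.

    Hypothetical helper for illustration only. Assumes all profiles share
    the same "mode:" header and describe blocks with identical boundaries.
    """
    mode = None
    blocks = OrderedDict()  # "file:start,end numStmts" -> hit count

    for text in profiles:
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue
            if line.startswith("mode:"):
                current = line.split(":", 1)[1].strip()
                if mode is None:
                    mode = current
                elif mode != current:
                    raise ValueError("cannot merge profiles with different modes")
                continue
            # Block lines look like: "pkg/file.go:10.2,12.3 2 1"
            key, count = line.rsplit(" ", 1)
            count = int(count)
            if mode == "set":
                # In "set" mode a block is either covered (1) or not (0).
                blocks[key] = max(blocks.get(key, 0), count)
            else:
                # In "count" and "atomic" modes the hit counts add up.
                blocks[key] = blocks.get(key, 0) + count

    lines = ["mode: " + (mode or "set")]
    lines += ["%s %d" % (key, count) for key, count in blocks.items()]
    return "\n".join(lines) + "\n"
```

If the first run covers a block and a rerun of only the flaky tests does not, the merged profile still reports the block as covered, whereas overwriting coverage.out keeps only the rerun's view.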

Motivation

Inaccurate reports in Codecov.

Additional Notes

Reproduce the error

Add a failing test in pkg/collector/check/stats/stats_test.go:

import (
	...
	"time"
	...
)
...
func TestCodecovFail(t *testing.T) {
	time.Sleep(2 * time.Second)
	t.Log("Failing after 2 seconds sleep !")
	assert.True(t, false)
}

Run the tests for the ./pkg/collector/check/stats module without reruns:

inv test --rerun-fails=0 --targets=./pkg/collector/check/stats --coverage

Your coverage file coverage.out should show mostly populated coverage (scroll to the right to see the coverage percentages):

➜  go tool cover -func=coverage.out
github.com/DataDog/datadog-agent/pkg/collector/check/stats/stats.go:64:         NewSenderStats                          0.0%
github.com/DataDog/datadog-agent/pkg/collector/check/stats/stats.go:71:         Copy                                    0.0%
github.com/DataDog/datadog-agent/pkg/collector/check/stats/stats.go:122:        NewStats                                100.0%
github.com/DataDog/datadog-agent/pkg/collector/check/stats/stats.go:144:        Add                                     0.0%
github.com/DataDog/datadog-agent/pkg/collector/check/stats/stats.go:234:        translateEventTypes                     88.9%
github.com/DataDog/datadog-agent/pkg/collector/check/stats/stats.go:251:        TranslateEventPlatformEventTypes        90.0%
total:                                                                          (statements)                            25.3%

Now if you run the same tests with reruns:

inv test --rerun-fails=2 --targets=./pkg/collector/check/stats --coverage

Your coverage file coverage.out now shows empty coverage:

➜  go tool cover -func=coverage.out                                                   
github.com/DataDog/datadog-agent/pkg/collector/check/stats/stats.go:64:         NewSenderStats                          0.0%
github.com/DataDog/datadog-agent/pkg/collector/check/stats/stats.go:71:         Copy                                    0.0%
github.com/DataDog/datadog-agent/pkg/collector/check/stats/stats.go:122:        NewStats                                0.0%
github.com/DataDog/datadog-agent/pkg/collector/check/stats/stats.go:144:        Add                                     0.0%
github.com/DataDog/datadog-agent/pkg/collector/check/stats/stats.go:234:        translateEventTypes                     0.0%
github.com/DataDog/datadog-agent/pkg/collector/check/stats/stats.go:251:        TranslateEventPlatformEventTypes        0.0%
total:                                                                          (statements)                            0.0%

Possible Drawbacks / Trade-offs

Describe how to test/QA your changes

Check out the branch.
Add a failing test in pkg/collector/check/stats/stats_test.go, as in the "Reproduce the error" section.
The coverage.out file should now be the same with or without reruns:

inv test --rerun-fails=2 --targets=./pkg/collector/check/stats --coverage

inv test --rerun-fails=0 --targets=./pkg/collector/check/stats --coverage

➜  go tool cover -func=coverage.out                                         
github.com/DataDog/datadog-agent/pkg/collector/check/stats/stats.go:70:         NewSenderStats                          0.0%
github.com/DataDog/datadog-agent/pkg/collector/check/stats/stats.go:77:         Copy                                    0.0%
github.com/DataDog/datadog-agent/pkg/collector/check/stats/stats.go:133:        NewStats                                100.0%
github.com/DataDog/datadog-agent/pkg/collector/check/stats/stats.go:156:        Add                                     0.0%
github.com/DataDog/datadog-agent/pkg/collector/check/stats/stats.go:251:        translateEventTypes                     88.9%
github.com/DataDog/datadog-agent/pkg/collector/check/stats/stats.go:268:        TranslateEventPlatformEventTypes        90.0%
github.com/DataDog/datadog-agent/pkg/collector/check/stats/util.go:10:          calculateCheckDelay                     100.0%
total:                                                                          (statements)                            30.6%
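To compare the two runs without eyeballing the tables, a small helper can pull the total out of the `go tool cover -func` output. This is only a sketch: `total_coverage` is a hypothetical name, and it assumes the output contains a `total:` line ending in a percentage, as in the listings above.

```python
import re


def total_coverage(cover_func_output):
    """Return the percentage from the "total:" line of `go tool cover -func` output.

    Hypothetical helper for illustration; raises ValueError if no total is found.
    """
    for line in cover_func_output.splitlines():
        if line.startswith("total:"):
            match = re.search(r"(\d+(?:\.\d+)?)%\s*$", line)
            if match:
                return float(match.group(1))
    raise ValueError("no total line found")
```

Capturing the output once with --rerun-fails=0 and once with --rerun-fails=2, then checking that both calls return the same value, gives a quick pass/fail signal for the QA steps.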

Reviewer's Checklist


pr-commenter · 2024-01-26T12:20:53Z

Bloop Bleep... Dogbot Here

Regression Detector Results

Run ID: bf786cf9-aca6-4ada-ab2c-24beeafc4650
Baseline: 85bcb55
Comparison: ceb1471
Total CPUs: 7

Performance changes are noted in the perf column of each table:

✅ = significantly better comparison variant performance
❌ = significantly worse comparison variant performance
➖ = no significant change in performance

No significant changes in experiment optimization goals

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.

Experiments ignored for regressions

Regressions in experiments with settings containing erratic: true are ignored.

perf	experiment	goal	Δ mean %	Δ mean % CI
➖	file_to_blackhole	% cpu utilization	+1.16	[-5.46, +7.79]
➖	file_tree	memory utilization	+0.40	[+0.35, +0.46]
➖	idle	memory utilization	-0.21	[-0.24, -0.18]

Fine details of change detection per experiment

perf	experiment	goal	Δ mean %	Δ mean % CI
➖	file_to_blackhole	% cpu utilization	+1.16	[-5.46, +7.79]
➖	file_tree	memory utilization	+0.40	[+0.35, +0.46]
➖	tcp_syslog_to_blackhole	ingress throughput	+0.14	[+0.09, +0.19]
➖	trace_agent_msgpack	ingress throughput	+0.07	[+0.05, +0.08]
➖	trace_agent_json	ingress throughput	+0.01	[-0.02, +0.04]
➖	uds_dogstatsd_to_api	ingress throughput	+0.00	[-0.03, +0.03]
➖	tcp_dd_logs_filter_exclude	ingress throughput	+0.00	[-0.06, +0.06]
➖	idle	memory utilization	-0.21	[-0.24, -0.18]
➖	process_agent_standard_check_with_stats	memory utilization	-0.22	[-0.26, -0.18]
➖	process_agent_real_time_mode	memory utilization	-0.23	[-0.26, -0.20]
➖	process_agent_standard_check	memory utilization	-0.41	[-0.46, -0.36]
➖	uds_dogstatsd_to_api_cpu	% cpu utilization	-1.47	[-2.88, -0.05]
➖	otel_to_otel_logs	ingress throughput	-1.47	[-2.20, -0.75]

Explanation

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
Its configuration does not mark it "erratic".
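The three criteria can be read as a single boolean check. The sketch below only restates the rule described above, and is not the Regression Detector's actual code; `is_regression` and its parameter names are hypothetical, and the 5.00% tolerance is taken from this report.

```python
def is_regression(delta_mean_pct, ci_low, ci_high, erratic=False,
                  tolerance_pct=5.0):
    """Apply the three stated criteria to one experiment (illustrative only)."""
    # 1. The estimated change is big enough to merit a closer look.
    big_enough = abs(delta_mean_pct) >= tolerance_pct
    # 2. The confidence interval does not contain zero.
    excludes_zero = ci_low > 0 or ci_high < 0
    # 3. The experiment is not marked "erratic".
    return big_enough and excludes_zero and not erratic
```

For uds_dogstatsd_to_api_cpu (-1.47, CI [-2.88, -0.05]) the interval excludes zero, but the change is below the 5.00% tolerance, so the function returns False.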

amenasria · 2024-01-26T14:58:46Z

/merge

dd-devflow · 2024-01-26T14:58:51Z

🚂 MergeQueue

Pull request added to the queue.

This build is next! (estimated merge in less than 47m)

Use /merge -c to cancel this operation!

dd-devflow · 2024-01-26T15:43:32Z

❌ MergeQueue

Tests failed on this commit 3c28d24

You should fix those tests and then re-add your pull request to the queue!

Details

checks are failing:

windows-unit-tests

If you need support, contact us on Slack #ci-interfaces with those details!

amenasria · 2024-01-26T15:50:33Z

/merge

dd-devflow · 2024-01-26T15:50:39Z

🚂 MergeQueue

Pull request added to the queue.

There are 3 builds ahead! (estimated merge in less than 1h)

Use /merge -c to cancel this operation!

Alexandre Menasria added 5 commits January 8, 2024 14:15

[CI/Tooling] Refactor gotestsum flags

5586353

Missing --

43ca303

Add gocovmerge to the test go dependencies

b7e8ea4

Cleaner junit_file_flag definition

78dd58c

Codecov workaround

dd9cef6

amenasria added labels Jan 9, 2024: changelog/no-changelog; [deprecated] team/agent-platform; [deprecated] qa/skip-qa - use other qa/ labels ([DEPRECATED] Please use qa/done or qa/no-code-change to skip creating a QA card)

amenasria added this to the 7.52.0 milestone Jan 9, 2024

amenasria requested a review from a team as a code owner January 9, 2024 13:34

Remove files using pure Python to avoid OS conflicts

c62b4cd

chouetz approved these changes Jan 9, 2024

tasks/go_test.py (outdated)

tasks/go_test.py (outdated)

amenasria changed the base branch from amenasria/refactor-gotestsum-flags to main January 18, 2024 14:06

Alexandre Menasria and others added 2 commits January 18, 2024 17:58

Merge branch 'main' into amenasria/fix-codecov-inaccurate

2ea0394

Update tasks/go_test.py

415fb63

Co-authored-by: Nicolas Schweitzer <[email protected]>

KevinFairise2 reviewed Jan 19, 2024

tasks/go_test.py (outdated)

Alexandre Menasria added 3 commits January 19, 2024 14:24

Use the coverage script only if coverage is set to true

5170c86

Remove coverage flag for the e2e go test invoke task

2d564cb

Suppor windows

22b16ed

amenasria requested a review from a team as a code owner January 19, 2024 13:55

Alexandre Menasria added 2 commits January 22, 2024 10:56

Clean deletion

a848e1f

Use the --coverage flag

5a73174

amenasria requested a review from a team as a code owner January 22, 2024 10:08

julien-lebot approved these changes Jan 22, 2024

Alexandre Menasria added 4 commits January 22, 2024 12:20

Add failsafe and run gocovmerge only when tests were run

f390272

Merge branch 'main' into amenasria/fix-codecov-inaccurate

f68fadc

Failsafe file deletion

6763d71

Fix typo

1310af6

amenasria added the qa/no-code-change label (No code change in Agent code requiring validation) Jan 22, 2024

Else missing

19ebc45

Alexandre Menasria added 7 commits January 25, 2024 22:58

Forgot one coverprofile

6af6027

Proper bat file

d3a51d7

Made dumb mistakes but this is on a whole other level

7c01b5b

Fix broken path

5ab5930

CodecovWorkaround class (code so clean you could eat on it)

5b30c7c

[skip cancel] Revert 27fb1c5

afa4faa

inv tidy-all

7c842e9

Add gocovmerge to internal/tools

ceb1471

amenasria removed the request for review from a team January 26, 2024 13:27

KevinFairise2 approved these changes Jan 26, 2024

dd-devflow bot added the mergequeue-status: queued, mergequeue-status: in_progress, and mergequeue-status: rejected labels and removed the mergequeue-status: queued and mergequeue-status: in_progress labels Jan 26, 2024

dd-devflow bot added the mergequeue-status: queued and mergequeue-status: in_progress labels and removed the mergequeue-status: rejected and mergequeue-status: queued labels Jan 26, 2024

dd-mergequeue bot merged commit f600d81 into main Jan 26, 2024

dd-mergequeue bot deleted the amenasria/fix-codecov-inaccurate branch January 26, 2024 19:58

dd-devflow bot added the mergequeue-status: done label and removed the mergequeue-status: in_progress label Jan 26, 2024

amenasria mentioned this pull request Feb 26, 2024

[CI] Fix unit tests duplication generated by the Codecov workaround #23141

Merged
