[no-relnote] Add E2E for containerd #1313
base: main
Pull Request Overview
This PR adds comprehensive E2E testing for the containerd runtime configuration alongside the existing Docker testing infrastructure. The changes introduce a nested container testing framework that runs tests inside containers to validate NVIDIA Container Toolkit behavior in containerized environments.
- Adds new E2E tests for containerd drop-in configuration functionality
- Introduces nvidia-cdi-refresh systemd unit testing
- Implements nested container runner infrastructure for isolated testing
Reviewed Changes
Copilot reviewed 9 out of 32 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
tests/go.mod | Adds new dependencies for UUID generation and test utilities |
tests/e2e/runner.go | Implements nested container runner with Docker installation and CTK setup |
tests/e2e/nvidia-ctk_containerd_test.go | New comprehensive containerd E2E test suite |
tests/e2e/nvidia-ctk_docker_test.go | Refactors to use shared runner infrastructure and fixes macOS compatibility |
tests/e2e/nvidia-cdi-refresh_test.go | New systemd unit tests for CDI refresh functionality |
tests/e2e/nvidia-container-cli_test.go | Refactors to use nested container runner |
tests/e2e/installer.go | Adds containerd installation template and additional flags support |
tests/e2e/e2e_test.go | Centralizes test runner initialization in BeforeSuite |
tests/e2e/Makefile | Documents new test categories |
Pull Request Test Coverage Report for Build 18005738357 (Coveralls)
5c85056 to 3dc480c
I'll mark this PR as ready for review once #1235 is merged.
3dc480c to 563a192
Rebased
029af03 to 1899001
51ad031 to c65e468
Rebased
AfterAll(func(ctx context.Context) {
// Cleanup: remove the container and the temporary script on the host.
// Use || true to ensure cleanup doesn't fail the test
runner.Run(fmt.Sprintf("docker rm -f %s 2>/dev/null || true", containerName)) //nolint:errcheck
Instead of the `nolint`, let's just drop the return values.
Suggested change, replacing
runner.Run(fmt.Sprintf("docker rm -f %s 2>/dev/null || true", containerName)) //nolint:errcheck
with
_, _, _ = runner.Run(fmt.Sprintf("docker rm -f %s 2>/dev/null || true", containerName))
Does it make sense to at least WARN if the cleanup fails? The `|| true` doesn't ensure that the test doesn't fail; the fact that we don't check the return value does that.
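A minimal sketch of the WARN-on-failure idea, assuming the runner's `Run` returns `(stdout, stderr string, err error)` as the three-value assignment above suggests. `commandRunner`, `fakeRunner`, and `cleanupContainer` are hypothetical names for illustration. The sketch drops `2>/dev/null || true` so the failure and its stderr actually reach the warning; inside the Ginkgo suite the message would more likely go to `GinkgoWriter` than to the standard `log` package.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// commandRunner is a hypothetical stand-in for the e2e runner; only the
// (stdout, stderr, err) shape of Run is taken from the excerpt.
type commandRunner interface {
	Run(script string) (string, string, error)
}

// fakeRunner fails every command so the warning path can be exercised.
type fakeRunner struct{}

func (fakeRunner) Run(script string) (string, string, error) {
	return "", "no such container", errors.New("exit status 1")
}

// cleanupContainer removes the container but never fails the test:
// an error is reported as a warning instead of being silently discarded.
// It returns the warning text (empty on success) so callers can inspect it.
func cleanupContainer(r commandRunner, containerName string) string {
	_, stderr, err := r.Run(fmt.Sprintf("docker rm -f %s", containerName))
	if err == nil {
		return ""
	}
	msg := fmt.Sprintf("WARNING: failed to remove container %q: %v (stderr: %s)", containerName, err, stderr)
	log.Print(msg)
	return msg
}

func main() {
	warning := cleanupContainer(fakeRunner{}, "nested-ctk")
	fmt.Println(warning != "")
}
```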
# Remove any imports line from the config (reset to original state)
if [ -f /etc/containerd/config.toml ]; then
grep -v "^imports = " /etc/containerd/config.toml > /tmp/config.toml.tmp && mv /tmp/config.toml.tmp /etc/containerd/config.toml || true
fi
Why not just make a copy of the original config and restore that after / before each test?
# Restart containerd to pick up the clean config
systemctl restart containerd
sleep 2
Is there a way to check containerd health?
output, _, err := nestedContainerRunner.Run(`cat /etc/containerd/conf.d/99-nvidia.toml`)
Expect(err).ToNot(HaveOccurred())
Expect(output).To(ContainSubstring(`nvidia`))
Expect(output).To(ContainSubstring(`nvidia-cdi`))
Expect(output).To(ContainSubstring(`nvidia-legacy`))
As I mentioned in person, we are no longer triggering the configuration of containerd with the current installation mechanism.
output, _, err = nestedContainerRunner.Run(`containerd config dump`)
Expect(err).ToNot(HaveOccurred())
// Verify imports section is in the merged config
Expect(output).To(ContainSubstring(`imports = ['/etc/containerd/conf.d/*.toml']`))
I actually think that `config dump` prints the ACTUAL paths of all files processed.
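If `containerd config dump` does list the concrete files it processed rather than the glob, the assertion could parse the `imports` array and look for the drop-in file itself. A minimal sketch under that assumption; `parseImports` and `importsFile` are hypothetical helpers, and they only handle a single-line `imports = [...]` entry with single- or double-quoted strings.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	importsLine = regexp.MustCompile(`(?m)^\s*imports\s*=\s*\[(.*)\]\s*$`)
	quotedPath  = regexp.MustCompile(`['"]([^'"]+)['"]`)
)

// parseImports extracts the paths listed in a single-line imports array.
func parseImports(dump string) []string {
	m := importsLine.FindStringSubmatch(dump)
	if m == nil {
		return nil
	}
	var paths []string
	for _, q := range quotedPath.FindAllStringSubmatch(m[1], -1) {
		paths = append(paths, q[1])
	}
	return paths
}

// importsFile reports whether path appears verbatim in the imports list.
func importsFile(dump, path string) bool {
	for _, p := range parseImports(dump) {
		if p == path {
			return true
		}
	}
	return false
}

func main() {
	dump := "version = 3\nimports = ['/etc/containerd/conf.d/99-nvidia.toml']\n"
	fmt.Println(importsFile(dump, "/etc/containerd/conf.d/99-nvidia.toml"))
}
```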
ContainSubstring(`default_runtime_name = "nvidia"`),
ContainSubstring(`default_runtime_name = 'nvidia'`),
These should definitely be VERSION specific checks.
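One way to make the check version-specific instead of accepting either form, sketched under the assumption (inferred from the two alternatives in the excerpt) that config version 2 dumps strings with double quotes and version 3 with single quotes. `configVersion` and `expectedDefaultRuntime` are hypothetical helpers; the test would then assert `ContainSubstring` on the single expected line.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

var versionLine = regexp.MustCompile(`(?m)^version\s*=\s*(\d+)\s*$`)

// configVersion reads the top-level `version = N` from a config dump.
func configVersion(dump string) (int, error) {
	m := versionLine.FindStringSubmatch(dump)
	if m == nil {
		return 0, fmt.Errorf("no top-level version in config dump")
	}
	return strconv.Atoi(m[1])
}

// expectedDefaultRuntime returns the one line expected for this config version.
func expectedDefaultRuntime(version int, runtime string) (string, error) {
	switch version {
	case 2:
		return fmt.Sprintf(`default_runtime_name = "%s"`, runtime), nil
	case 3:
		return fmt.Sprintf(`default_runtime_name = '%s'`, runtime), nil
	default:
		return "", fmt.Errorf("unsupported containerd config version %d", version)
	}
}

func main() {
	v, err := configVersion("version = 3\nroot = '/var/lib/containerd'\n")
	if err != nil {
		panic(err)
	}
	want, _ := expectedDefaultRuntime(v, "nvidia")
	fmt.Println(want) // default_runtime_name = 'nvidia'
}
```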
ContainSubstring(`default_runtime_name = "nvidia"`),
ContainSubstring(`default_runtime_name = 'nvidia'`),
))
Expect(output).To(ContainSubstring(`enable_cdi = true`))
Where do we toggle this behaviour? It is disabled by default.
ContainSubstring(`[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]`),
ContainSubstring(`[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.nvidia]`),
Once again, this should be version-specific.
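The same idea applied to the runtime table header: the two alternatives in the excerpt correspond to the CRI plugin ID used with config version 2 (`io.containerd.grpc.v1.cri`) and with version 3 (`io.containerd.cri.v1.runtime`). A minimal sketch; `runtimeTableHeader` is a hypothetical helper, and the mapping is taken from the excerpt rather than from containerd's documentation.

```go
package main

import "fmt"

// runtimeTableHeader returns the TOML table header under which a named
// runtime appears in `containerd config dump` for the given config version.
func runtimeTableHeader(version int, runtime string) (string, error) {
	switch version {
	case 2:
		return fmt.Sprintf(`[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.%s]`, runtime), nil
	case 3:
		return fmt.Sprintf(`[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.%s]`, runtime), nil
	default:
		return "", fmt.Errorf("unsupported containerd config version %d", version)
	}
}

func main() {
	for _, v := range []int{2, 3} {
		h, _ := runtimeTableHeader(v, "nvidia")
		fmt.Println(h)
	}
}
```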
})

When("containerd already has a custom default runtime configured", func() {
It("should preserve the existing default runtime when --set-as-default=false is specified", func(ctx context.Context) {
`--set-as-default=false` is not specified. It is the default.
`)
Expect(err).ToNot(HaveOccurred())

// Configure containerd with drop-in config (explicitly not setting as default)
How are we configuring with the drop-in config in this case?
})

When("containerd has multiple custom runtimes and plugins configured", func() {
It("should add NVIDIA runtime alongside existing runtimes like kata", func(ctx context.Context) {
As a matter of interest, how is this different to an arbitrary "custom" runtime?
Expect(err).ToNot(HaveOccurred())

// Verify kata runtime was added
output, _, err := nestedContainerRunner.Run(`systemctl restart containerd && sleep 2 && containerd config dump | grep -A5 kata`)
Why the `-A5`? Also, please split the different steps.
Expect(err).ToNot(HaveOccurred())
Expect(output).To(ContainSubstring(`kata`))

// Configure containerd with drop-in config and set NVIDIA as default
Where are we setting these options?
_, _, err := nestedContainerRunner.Run(`
rm -f /etc/containerd/config.toml
rm -rf /etc/containerd/conf.d
mkdir -p /etc/containerd/conf.d

cat > /etc/containerd/config.toml <<'EOF'
version = 3
How do we ensure that this is only run on containerd versions that support it?
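A minimal sketch of gating on the installed version: config `version = 3` is only understood by containerd 2.x and later, so the test could parse `containerd --version` and `Skip(...)` otherwise. The output format assumed here (a `vMAJOR.MINOR.PATCH` or `MAJOR.MINOR.PATCH` token somewhere on the line) should be checked against the packages the suite installs; `parseContainerdVersion` and `supportsConfigV3` are hypothetical helpers, and `<commit>` in the sample strings is a placeholder.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// semverToken matches the first MAJOR.MINOR.PATCH token, with an optional "v".
var semverToken = regexp.MustCompile(`\bv?(\d+)\.(\d+)\.(\d+)`)

// parseContainerdVersion extracts major and minor from `containerd --version`.
func parseContainerdVersion(out string) (major, minor int, err error) {
	m := semverToken.FindStringSubmatch(out)
	if m == nil {
		return 0, 0, fmt.Errorf("no version found in %q", out)
	}
	major, _ = strconv.Atoi(m[1])
	minor, _ = strconv.Atoi(m[2])
	return major, minor, nil
}

// supportsConfigV3 reports whether this containerd understands config version 3.
func supportsConfigV3(out string) (bool, error) {
	major, _, err := parseContainerdVersion(out)
	if err != nil {
		return false, err
	}
	return major >= 2, nil
}

func main() {
	for _, out := range []string{
		"containerd github.com/containerd/containerd/v2 v2.0.0 <commit>",
		"containerd containerd.io 1.7.27 <commit>",
	} {
		ok, err := supportsConfigV3(out)
		fmt.Println(ok, err)
	}
}
```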
# Create a custom config that will be imported
# Use the correct plugin path for containerd v2/v3
cat > /etc/containerd/custom.d/10-custom.toml <<'EOF'
I thought the correct path was `/etc/containerd/conf.d`? https://github.com/containerd/containerd/pull/12323/files
Expect(err).ToNot(HaveOccurred())

// Verify containerd can load the custom import before installer
_, _, err = nestedContainerRunner.Run(`systemctl restart containerd && sleep 2 && containerd config dump | grep -i myregistry`)
Why are we randomly checking for `myregistry`? Is this sufficient?
// Run a container with NVIDIA runtime
// Note: We use native snapshotter because overlay doesn't work in nested containers
output, stderr, err = nestedContainerRunner.Run(`
ctr run --rm --snapshotter native docker.io/library/busybox:latest test-nvidia echo "Hello from NVIDIA runtime"
How does this use the nvidia runtime?
Also, if no devices are requested, then the nvidia-runtime is a no-op.
// Pull a test image
output, stderr, err := nestedContainerRunner.Run(`
ctr image pull docker.io/library/busybox:latest
Does ctr automatically use the containerd config?
Signed-off-by: Carlos Eduardo Arango Gutierrez <[email protected]>
c65e468 to ef6d766