Skip to content

Conversation

elezar
Copy link
Member

@elezar elezar commented Aug 19, 2025

By default container extracts the current config by running the containerd config dump command and if that fails we read the existing config file.

This change allows ordering of these sources to be defined as we do for the nvidia-ctk runtime configure command.

See #1222 where setting RUNTIME_CONFIG_SOURCES=file or RUNTIME_CONFIG_SOURCES=file,command should allow the current config to be preferred over the containerd config dump command.

This further allows an explicit path to be specified for the file config source so that there is a separation between the config file being modified by the tooling and the config used as the source for drop-in files, for example.

@elezar elezar added the bug Issue/PR to expose/discuss/fix a bug label Aug 19, 2025
@coveralls
Copy link

coveralls commented Aug 19, 2025

Pull Request Test Coverage Report for Build 18193238472

Details

  • 16 of 28 (57.14%) changed or added relevant lines in 2 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage increased (+0.03%) to 37.21%

Changes Missing Coverage Covered Lines Changed/Added Lines %
cmd/nvidia-ctk-installer/container/runtime/containerd/containerd.go 16 22 72.73%
cmd/nvidia-ctk-installer/container/runtime/runtime.go 0 6 0.0%
Files with Coverage Reduction New Missed Lines %
cmd/nvidia-ctk-installer/container/runtime/runtime.go 1 0.0%
Totals Coverage Status
Change from base Build 18191171126: 0.03%
Covered Lines: 4972
Relevant Lines: 13362

💛 - Coveralls

@elezar elezar added must-backport The changes in PR need to be backported to at least one stable release branch. and removed bug Issue/PR to expose/discuss/fix a bug labels Aug 19, 2025
@elezar elezar requested review from klueska and cdesiniotis August 19, 2025 18:46
switch source {
case "file":
loaders = append(loaders, toml.FromFile(o.Config))
case "command":
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: Since command here implies containerd config dump command we may want to make this more expressive. For now I have used the values that we support in the nvidia-ctk runtime configure command:

&cli.StringFlag{
Name: "config-source",
Usage: "the source to retrieve the container runtime configuration; one of [command, file]\"",
Destination: &config.configSource,
Value: defaultConfigSource,
},

Comment on lines 106 to 124
&cli.StringSliceFlag{
Name: "config-source",
Usage: "specify the config sources",
Destination: &opts.ConfigSources,
Sources: cli.EnvVars("RUNTIME_CONFIG_SOURCES"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flag to be called config-source-order (or similar)

specify the order of config sources

As the user can do --config-source command,file or --config-source file,command

same for the ENV VAR RUNTIME_CONFIG_SOURCES_ORDER

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't agree that the flag name should be changed. The flag specifies the sources and the ordering is implied by the order in which the sources are specified. Valid values for this argument are currently:

  • ""
  • "file"
  • "command"
  • "command,file" (Default)
  • "file,command"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then let's expand the Usage to be more explicit, "specify the config sources [file, command]" as nvidia-ctk runtime configure has a flag with the same name but this one is to point to a path. We should provide guidance to the user via Usage with at least the valid options or the default behav of this flag if not set.

@ArangoGutierrez
Copy link
Collaborator

Can we mark this PR as ready for review?

@elezar
Copy link
Member Author

elezar commented Sep 25, 2025

Can we mark this PR as ready for review?

I think we can move this to the next minor (if we decide to implement it at all). See the comment here #1222 (comment) for a workaround in the existing codebase.

@elezar elezar removed the must-backport The changes in PR need to be backported to at least one stable release branch. label Sep 26, 2025
@elezar elezar added this to the next-minor milestone Sep 26, 2025
@elezar elezar force-pushed the allow-config-source-to-be-overridden branch 3 times, most recently from 3de75ef to 8afbb32 Compare October 2, 2025 12:39
@elezar elezar marked this pull request as ready for review October 2, 2025 16:46
@cdesiniotis
Copy link
Contributor

I am testing this with the operator and microk8s, by setting the following with the toolkit container:

env:
    - name: CONTAINERD_CONFIG
      value: /var/snap/microk8s/current/args/containerd-template.toml
    - name: CONTAINERD_SOCKET
      value: /var/snap/microk8s/common/run/containerd.sock
    - name: RUNTIME_CONFIG_SOURCE
      value: file=/var/snap/microk8s/current/args/containerd.toml

This does not work as is since the path provided to RUNTIME_CONFIG_SOURCE is the path to the file on the host and NOT in the container.

@elezar
Copy link
Member Author

elezar commented Oct 2, 2025

I am testing this with the operator and microk8s, by setting the following with the toolkit container:

env:
    - name: CONTAINERD_CONFIG
      value: /var/snap/microk8s/current/args/containerd-template.toml
    - name: CONTAINERD_SOCKET
      value: /var/snap/microk8s/common/run/containerd.sock
    - name: RUNTIME_CONFIG_SOURCE
      value: file=/var/snap/microk8s/current/args/containerd.toml

This does not work as is since the path provided to RUNTIME_CONFIG_SOURCE is the path to the file on the host and NOT in the container.

Yes, I missed this when trying to think this through. The user could provide the container path here, but that's definitely not ideal.

@elezar
Copy link
Member Author

elezar commented Oct 6, 2025

@cdesiniotis what if we also add the following to the GPU Operator:

	if runtimeConfigSources := getContainerEnv(container, "RUNTIME_CONFIG_SOURCE"); runtimeConfigSources != "" {
		var sources []string
		for _, runtimeConfigSource := range strings.Split(runtimeConfigSources, ",") {
			parts := strings.SplitN(runtimeConfigSource, "=", 2)
			if len(parts) == 1 || parts[0] != "file" {
				sources = append(sources, runtimeConfigSource)
				continue
			}
			sourceConfigFile := parts[1]
			containerConfigFile := sourceConfigFile
			if sourceConfigFile != "" {
				sourceConfigFileName := path.Base(sourceConfigFile)
				sourceConfigDir := path.Dir(sourceConfigFile)

				// TODO: If the source config file is not in the same folder as
				// the top-level config file, then we need to use a different
				// container path here.
				containerConfigDir := DefaultRuntimeConfigTargetDir
				containerConfigFile = containerConfigDir + sourceConfigFileName

				// If we've already mounted sourceConfigDir -> containerConfigDir
				// then we don't create a new volume mount.
				if hostToContainerDirs[sourceConfigDir] != "" {
					volMountConfigName := fmt.Sprintf("%s-source-config", runtime)
					volMountConfig := corev1.VolumeMount{Name: volMountConfigName, MountPath: containerConfigDir}
					container.VolumeMounts = append(container.VolumeMounts, volMountConfig)

					configVol := corev1.Volume{Name: volMountConfigName, VolumeSource: corev1.VolumeSource{HostPath: &corev1.HostPathVolumeSource{Path: sourceConfigDir, Type: newHostPathType(corev1.HostPathDirectoryOrCreate)}}}
					obj.Spec.Template.Spec.Volumes = append(obj.Spec.Template.Spec.Volumes, configVol)
				}
			}
			sources = append(sources, "file="+containerConfigFile)
		}
		setContainerEnv(container, "RUNTIME_CONFIG_SOURCE", strings.Join(sources, ","))
	}

(We can definitely factor out some logic here to make things more readable across the multiple instances where we are transforming host paths to container paths).

This would mean that a user would specify file=/path/on/the/host/config.toml as the source and the path will be updated in the same way as other paths are treated.

@cdesiniotis
Copy link
Contributor

A simpler solution would be to prepend /host to the file path -- as the toolkit container already has read-only access to the host's rootfs at /host. This would avoid us potentially having to add an additional volume. I have raised NVIDIA/gpu-operator#1758 with this proposed change.

@cdesiniotis
Copy link
Contributor

I've tested this PR paired with NVIDIA/gpu-operator#1758 and it provides a working solution for microk8s.

By default containerd extracts the current config by running
the `containerd config dump` command and if that fails
we read the existing config file.

This change allows ordering of these sources to be defined as
we do for the nvidia-ctk runtime configure command.

Signed-off-by: Evan Lezar <[email protected]>
@elezar elezar force-pushed the allow-config-source-to-be-overridden branch from 2c54980 to b426551 Compare October 7, 2025 07:43
@elezar elezar force-pushed the allow-config-source-to-be-overridden branch from b426551 to 0d59175 Compare October 7, 2025 08:05
This change allows a "file" config source to also specify the source
path of the file to process. This means that when configuring containerd
with the option --config-source=file=/foo/bar.toml the file /foo/bar.toml
will be used as the source config instead of the output config file.

Signed-off-by: Evan Lezar <[email protected]>
@elezar elezar force-pushed the allow-config-source-to-be-overridden branch from 0d59175 to c789a8d Compare October 7, 2025 08:08
@elezar elezar changed the title Allow config sources to be specified for containerd Allow config sources to be specified for containerd and cri-o Oct 7, 2025
@elezar elezar modified the milestones: next-minor, v1.18.0 Oct 7, 2025
@elezar elezar merged commit 4997729 into NVIDIA:main Oct 7, 2025
13 checks passed
@elezar elezar deleted the allow-config-source-to-be-overridden branch October 7, 2025 09:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants