Skip to content

Conversation

@elezar
Copy link
Member

@elezar elezar commented Mar 5, 2025

This change extracts the filemode from the host device nodes for CDI device specs. This allows the spec to be used in cases where a low-level runtime such as runc does not query the existing device node in the same way.

dn, err := devices.DeviceFromPath(path, "rwm")
if err != nil {
// return nil, fmt.Errorf("failed to get device information for %q: %w", path, err)
return nil
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To what will nil translate to in the spec? => 0 ?

Copy link
Member Author

@elezar elezar Mar 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nil would be the current behaviour. In the case of runc this is filled in by the runtime and read from the host. In the case of kata, this would be 0 as you've observed.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The interface is the CDI spec, so how does runc know this value is nil? It is not encoded in the CDI spec.

Copy link
Member Author

@elezar elezar Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, maybe runc is not the correct reference here. For regular devices, a higher-level runtime such as containerd also extracts the FileMode (see https://github.com/containerd/containerd/blob/9deb0b8f748ee038dfd48c51d7aee6b55e34e9c1/pkg/oci/utils_unix.go#L142). This is not done for CDI devices since the code from https://github.com/cncf-tags/container-device-interface/blob/012c8be38d9444f08b7e6eeedbcddd733adf8632/pkg/cdi/container-edits_unix.go#L60 is used for this instead.

This means that for OCI devices created from a CDI spec, this field is nil if unspecified, whereas for "regular" devices this field is populated from the device node on the host.

@klihub do you know why we elected to not add information other than the Major and Minor numbers to the devices in CDI?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elezar Do you mean what is the reason for not filling in any other missing information wrt. the CDI Spec DeviceNode than type, major, and minor ? TBH, I don't recall any specifics, but I suspect it's only because that's the minimum working implementation.

@elezar elezar force-pushed the add-additional-device-node-information branch from eb70273 to 4dae5ae Compare March 6, 2025 08:43
@elezar elezar marked this pull request as draft March 19, 2025 14:51
@elezar elezar force-pushed the add-additional-device-node-information branch from 4dae5ae to 0a16c93 Compare June 19, 2025 12:59
@coveralls
Copy link

coveralls commented Jun 19, 2025

Pull Request Test Coverage Report for Build 15758891592

Details

  • 11 of 18 (61.11%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.003%) to 33.641%

Changes Missing Coverage Covered Lines Changed/Added Lines %
internal/edits/device.go 11 18 61.11%
Totals Coverage Status
Change from base Build 15744214901: -0.003%
Covered Lines: 4369
Relevant Lines: 12987

💛 - Coveralls

@github-actions
Copy link

github-actions bot commented Nov 5, 2025

This PR is stale because it has been open 90 days with no activity. This PR will be closed in 30 days unless new comments are made or the stale label is removed. To skip these checks, apply the "lifecycle/frozen" label.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 5, 2025
@elezar elezar added this to the next-minor milestone Nov 14, 2025
@elezar elezar added lifecycle/frozen and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 14, 2025
@elezar elezar force-pushed the add-additional-device-node-information branch from 85adc3f to 100ce7e Compare November 19, 2025 17:07
@elezar elezar marked this pull request as ready for review November 19, 2025 17:07
Copy link

@zvonkok zvonkok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, works as expected

This change extracts the required information from the host device
node if available.

Signed-off-by: Evan Lezar <[email protected]>
@elezar elezar force-pushed the add-additional-device-node-information branch from 100ce7e to f7f0c3e Compare November 20, 2025 21:47
@elezar elezar modified the milestones: next-minor, v1.19.0 Nov 20, 2025
@elezar elezar merged commit bac9b8d into NVIDIA:main Nov 20, 2025
10 checks passed
@elezar elezar deleted the add-additional-device-node-information branch November 20, 2025 21:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants