Skip to content

Linux Network Devices #4538

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Linux Network Devices #4538

wants to merge 3 commits into from

Conversation

aojea
Copy link

@aojea aojea commented Nov 21, 2024

Implementation of opencontainers/runtime-spec#1271

It implements the new proposal to the OCI spec to be able to specify Network Devices that get attached detached from the containers (updated to match the merged proposal opencontainers/runtime-spec#1271)

@aojea aojea force-pushed the netdevices branch 2 times, most recently from 07d3b0b to 3833056 Compare December 2, 2024 15:40
Copy link

@kad kad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. We are also interested in this use case for our accelerator devices.

@aojea aojea force-pushed the netdevices branch 2 times, most recently from 67f12e0 to d114afe Compare December 12, 2024 07:15
@aojea aojea force-pushed the netdevices branch 2 times, most recently from ec90a02 to f4f5d02 Compare December 20, 2024 12:11
@aojea aojea force-pushed the netdevices branch 2 times, most recently from 735f9d5 to ce1f612 Compare January 14, 2025 09:47
@aojea aojea force-pushed the netdevices branch 4 times, most recently from 6262c5e to c530772 Compare February 6, 2025 09:55
@aojea aojea force-pushed the netdevices branch 5 times, most recently from 4380e86 to f53d263 Compare February 10, 2025 21:27
@rata
Copy link
Member

rata commented Feb 25, 2025

@aojea friendly reminder that this should be ready soon, we are cutting 1.3.0-rc.1 soon (maybe this week). The spec part is still not sure when it will be merged?

@aojea
Copy link
Author

aojea commented Apr 29, 2025

The rootless jobs fail when I move the network namespace logic to the setup block, it is not able to run the test despite is skipped ... I tried different things without success, so I encapsulated the create_netns logic in a function and moves it inside the test block, so is skipped correctly

@aojea aojea force-pushed the netdevices branch 2 times, most recently from 954978a to e87aa5e Compare May 9, 2025 07:23
@aojea
Copy link
Author

aojea commented May 9, 2025

kindly ping @kolyshkin @rata , please let me know what is missing

@rata
Copy link
Member

rata commented May 13, 2025

Sorry, so far you are not missing anything, I'm quite overloaded with lot of stuff now and I didn't manage to get back to this. It was very close last time I checked, I guess it's ready or almost. I can try to have a look next week.

Sorry again for the delay, it's quite hard with lot of deadlines in these weeks for me to find the time. But For sure I'll have the time for this is included in 1.4, don't worry about that :)

@aojea aojea force-pushed the netdevices branch 2 times, most recently from cc520e0 to 5e36d0b Compare May 16, 2025 09:54
@aojea
Copy link
Author

aojea commented May 16, 2025

updated the code to handle the change on the vishvananda library and recover the previous behavior of the library, do not think we need to add retries as they were not needed before

ref vishvananda/netlink#1018

Comment on lines +115 to +124
link, err := netlink.LinkByName(name)
// recover same behavior on vishvananda/[email protected] and do not fail when the kernel returns NLM_F_DUMP_INTR.
if err != nil && !errors.Is(err, netlink.ErrDumpInterrupted) {
return fmt.Errorf("link not found for interface %s on runtime namespace: %w", name, err)
}

// Set the interface link state to DOWN before modifying attributes like namespace or name.
// This prevents potential conflicts or disruptions on the host network during the transition,
// particularly if other host components depend on this specific interface or its properties.
err = netlink.LinkSetDown(link)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code says that if netlink.LinkByName returns netlink.ErrDumpInterrupted, than the link value is valid, am I reading it right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the same as with previous behavior, before the 1.2.1 it never returned any error. So you always operate with the data returned by the kernel at the time. Now, despite that you know that something has changed, you keep operating with the returned data and in my experience it always has been valid ... the change on returning an error caused a lot of impact in the ecosystem and flakiness

@aojea aojea force-pushed the netdevices branch 3 times, most recently from a9dc973 to 991b0b1 Compare May 21, 2025 12:07
@lifubang
Copy link
Member

LGTM overall.
Just a question, it seems that there is no mechanism to attach network devices after c/r operation.
If we checkpointed a container with attached network devices, these network devices will disappeared after we restored from the dumped check point files.

// to the one created by runc to run the container process.
nsPath := p.config.Config.Namespaces.PathOf(configs.NEWNET)
if nsPath == "" {
nsPath = fmt.Sprintf("/proc/%d/ns/net", p.pid())
Copy link
Member

@lifubang lifubang May 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another question:
First, I think you have not already implemented the function of detached described in your PR description.
So I think we should throw an error here instead of setting a default value for net ns path when it is empty. If not, there is no opportunity to move the attached network devices back to their original net ns when deleting the container.
I think if we want to use network devices attaching function in a container, the caller should provide some network devices, as well as a net ns path.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And when we deleting a container, we should move the attached network devices back to their original net ns. WDYT?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was indeed one of the most discussed topics of the proposal opencontainers/runtime-spec#1271 , the final consensus was not to handle the dettachment of interfaces because of two reasons, the network namespace on runtimes like crio or containerd is managed by those instead of runc, and if the process crashes or dies there is no possibility for the runtime to participate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, if this is the proposed design, there is another code simplify here:

I think at this time, the container init process has joined the provided net namespace, so we can merge this two situations, we don't need to get the net ns path from the config, we can use the process's net ns path directly like this:

nsPath := fmt.Sprintf("/proc/%d/ns/net", p.pid())

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that being explicit here makes the code more readable, since this simplification will not be necessary obvious from the code. Although, I recognize I'm not an expert here, so I may be wrong, but I always prefer the code to be verbose since there is no performance win in removing these lines of code

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I admit it's more readable here. But it will introduce a race condition here, if the net ns path is deleted and created with the same name again by others just at the time after the container init process joined but before we open the path. In fact, if we simplify the code, we will not need to care about whether the net ns path deleted or not.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added the simplification and a comment explaining the rationale behind

@aojea
Copy link
Author

aojea commented May 25, 2025

Just a question, it seems that there is no mechanism to attach network devices after c/r operation.

I'm not familiar with those operations but I'm happy to investigate more if you provide some guidance

@lifubang
Copy link
Member

I'm not familiar with those operations but I'm happy to investigate more if you provide some guidance

diff --git a/tests/integration/checkpoint.bats b/tests/integration/checkpoint.bats
index 3db34061..7608e4ff 100644
--- a/tests/integration/checkpoint.bats
+++ b/tests/integration/checkpoint.bats
@@ -2,14 +2,34 @@
 
 load helpers
 
+function create_netns() {
+       # Create a temporary name for the test network namespace.
+       tmp=$(mktemp -u)
+       ns_name=$(basename "$tmp")
+
+       # Create the network namespace.
+       ip netns add "$ns_name"
+       ns_path=$(ip netns add "$ns_name" 2>&1 | sed -e 's/.*"\(.*\)".*/\1/')
+}
+
+function delete_netns() {
+       # Delete the namespace only if the ns_name variable is set.
+       [ -v ns_name ] && ip netns del "$ns_name"
+}
+
 function setup() {
        # XXX: currently criu require root containers.
        requires criu root
 
        setup_busybox
+
+       # Create a dummy interface to move to the container.
+       ip link add dummy0 type dummy
 }
 
 function teardown() {
+       ip link del dev dummy0
+       delete_netns
        teardown_bundle
 }
 
@@ -100,10 +120,16 @@ function runc_restore_with_pipes() {
 }
 
 function simple_cr() {
+       # Tell runc which network namespace to use.
+       # create_netns
+       # update_config '(.. | select(.type? == "network")) .path |= "'"$ns_path"'"'
+       update_config ' .linux.netDevices |= {"dummy0": {} }'
        runc run -d --console-socket "$CONSOLE_SOCKET" test_busybox
        [ "$status" -eq 0 ]
 
        testcontainer test_busybox running
+       runc exec test_busybox ip address show dev dummy0
+       [ "$status" -eq 0 ]
 
        for _ in $(seq 2); do
                # checkpoint the running container
@@ -119,6 +145,8 @@ function simple_cr() {
 
                # busybox should be back up and running
                testcontainer test_busybox running
+               runc exec test_busybox ip address show dev dummy0
+               [ "$status" -eq 0 ]
        done
 }

aojea added 3 commits May 29, 2025 08:59
Signed-off-by: Antonio Ojea <[email protected]>
Implement support for passing Linux Network Devices to the container
network namespace.

The network device is passed during the creation of the container,
before the process is started.

It implements the logic defined in the OCI runtime specification.

Signed-off-by: Antonio Ojea <[email protected]>
@aojea
Copy link
Author

aojea commented May 29, 2025

@lifubang at the cost of duplicating code but to improve test errors troubleshooting I duplicated the test cases so we can have simple_cr and simple_cr_with_netdevice, if there is a problem with the netdevice logic then we can spot it very easy since will only affect ones and no the others

@aojea
Copy link
Author

aojea commented May 29, 2025

failed job with

ssh: Could not resolve hostname localhost: Name or service not known

@lifubang
Copy link
Member

lifubang commented May 29, 2025

  • ci / test (actuated-arm64-6cpu-8gb, 1.24.x, rootless) (pull_request)

@alexellis May I ask your help here, is there some special changes cause we can't use localhost in actuated-arm64-6cpu-8gb? Could you have a suggestion to help us to resolve this issue? 🙏

It seems that the failure occurred from 4 days ago. Please see: https://github.com/opencontainers/runc/actions/runs/15232466902/job/42841958687

The other solution is to change ssh rootless@localhost to ssh [email protected].

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants