-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Linux Network Devices #4538
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Linux Network Devices #4538
Conversation
bdb31c1
to
0b771ca
Compare
07d3b0b
to
3833056
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. We are also interested in this use case for our accelerator devices.
67f12e0
to
d114afe
Compare
ec90a02
to
f4f5d02
Compare
735f9d5
to
ce1f612
Compare
6262c5e
to
c530772
Compare
4380e86
to
f53d263
Compare
@aojea friendly reminder that this should be ready soon, we are cutting 1.3.0-rc.1 soon (maybe this week). The spec part is still not sure when it will be merged? |
The rootless jobs fail when I move the network namespace logic to the |
954978a
to
e87aa5e
Compare
kindly ping @kolyshkin @rata , please let me know what is missing |
Sorry, so far you are not missing anything, I'm quite overloaded with lot of stuff now and I didn't manage to get back to this. It was very close last time I checked, I guess it's ready or almost. I can try to have a look next week. Sorry again for the delay, it's quite hard with lot of deadlines in these weeks for me to find the time. But For sure I'll have the time for this is included in 1.4, don't worry about that :) |
cc520e0
to
5e36d0b
Compare
updated the code to handle the change on the vishvananda library and recover the previous behavior of the library, do not think we need to add retries as they were not needed before |
link, err := netlink.LinkByName(name) | ||
// recover same behavior on vishvananda/[email protected] and do not fail when the kernel returns NLM_F_DUMP_INTR. | ||
if err != nil && !errors.Is(err, netlink.ErrDumpInterrupted) { | ||
return fmt.Errorf("link not found for interface %s on runtime namespace: %w", name, err) | ||
} | ||
|
||
// Set the interface link state to DOWN before modifying attributes like namespace or name. | ||
// This prevents potential conflicts or disruptions on the host network during the transition, | ||
// particularly if other host components depend on this specific interface or its properties. | ||
err = netlink.LinkSetDown(link) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code says that if netlink.LinkByName
returns netlink.ErrDumpInterrupted
, than the link
value is valid, am I reading it right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the same as with previous behavior, before the 1.2.1 it never returned any error. So you always operate with the data returned by the kernel at the time. Now, despite that you know that something has changed, you keep operating with the returned data and in my experience it always has been valid ... the change on returning an error caused a lot of impact in the ecosystem and flakiness
a9dc973
to
991b0b1
Compare
LGTM overall. |
libcontainer/process_linux.go
Outdated
// to the one created by runc to run the container process. | ||
nsPath := p.config.Config.Namespaces.PathOf(configs.NEWNET) | ||
if nsPath == "" { | ||
nsPath = fmt.Sprintf("/proc/%d/ns/net", p.pid()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another question:
First, I think you have not already implemented the function of detached
described in your PR description.
So I think we should throw an error here instead of setting a default value for net ns path when it is empty. If not, there is no opportunity to move the attached network devices back to their original net ns when deleting the container.
I think if we want to use network devices attaching function in a container, the caller should provide some network devices, as well as a net ns path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And when we deleting a container, we should move the attached network devices back to their original net ns. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was indeed one of the most discussed topics of the proposal opencontainers/runtime-spec#1271 , the final consensus was not to handle the dettachment of interfaces because of two reasons, the network namespace on runtimes like crio or containerd is managed by those instead of runc, and if the process crashes or dies there is no possibility for the runtime to participate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, if this is the proposed design, there is another code simplify here:
I think at this time, the container init process has joined the provided net namespace, so we can merge this two situations, we don't need to get the net ns path from the config, we can use the process's net ns path directly like this:
nsPath := fmt.Sprintf("/proc/%d/ns/net", p.pid())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that being explicit here makes the code more readable, since this simplification will not be necessary obvious from the code. Although, I recognize I'm not an expert here, so I may be wrong, but I always prefer the code to be verbose since there is no performance win in removing these lines of code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I admit it's more readable here. But it will introduce a race condition here, if the net ns path is deleted and created with the same name again by others just at the time after the container init process joined but before we open the path. In fact, if we simplify the code, we will not need to care about whether the net ns path deleted or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added the simplification and a comment explaining the rationale behind
I'm not familiar with those operations but I'm happy to investigate more if you provide some guidance |
diff --git a/tests/integration/checkpoint.bats b/tests/integration/checkpoint.bats
index 3db34061..7608e4ff 100644
--- a/tests/integration/checkpoint.bats
+++ b/tests/integration/checkpoint.bats
@@ -2,14 +2,34 @@
load helpers
+function create_netns() {
+ # Create a temporary name for the test network namespace.
+ tmp=$(mktemp -u)
+ ns_name=$(basename "$tmp")
+
+ # Create the network namespace.
+ ip netns add "$ns_name"
+ ns_path=$(ip netns add "$ns_name" 2>&1 | sed -e 's/.*"\(.*\)".*/\1/')
+}
+
+function delete_netns() {
+ # Delete the namespace only if the ns_name variable is set.
+ [ -v ns_name ] && ip netns del "$ns_name"
+}
+
function setup() {
# XXX: currently criu require root containers.
requires criu root
setup_busybox
+
+ # Create a dummy interface to move to the container.
+ ip link add dummy0 type dummy
}
function teardown() {
+ ip link del dev dummy0
+ delete_netns
teardown_bundle
}
@@ -100,10 +120,16 @@ function runc_restore_with_pipes() {
}
function simple_cr() {
+ # Tell runc which network namespace to use.
+ # create_netns
+ # update_config '(.. | select(.type? == "network")) .path |= "'"$ns_path"'"'
+ update_config ' .linux.netDevices |= {"dummy0": {} }'
runc run -d --console-socket "$CONSOLE_SOCKET" test_busybox
[ "$status" -eq 0 ]
testcontainer test_busybox running
+ runc exec test_busybox ip address show dev dummy0
+ [ "$status" -eq 0 ]
for _ in $(seq 2); do
# checkpoint the running container
@@ -119,6 +145,8 @@ function simple_cr() {
# busybox should be back up and running
testcontainer test_busybox running
+ runc exec test_busybox ip address show dev dummy0
+ [ "$status" -eq 0 ]
done
} |
Signed-off-by: Antonio Ojea <[email protected]>
Signed-off-by: Antonio Ojea <[email protected]>
Implement support for passing Linux Network Devices to the container network namespace. The network device is passed during the creation of the container, before the process is started. It implements the logic defined in the OCI runtime specification. Signed-off-by: Antonio Ojea <[email protected]>
@lifubang at the cost of duplicating code but to improve test errors troubleshooting I duplicated the test cases so we can have simple_cr and simple_cr_with_netdevice, if there is a problem with the netdevice logic then we can spot it very easy since will only affect ones and no the others |
failed job with
|
@alexellis May I ask your help here, is there some special changes cause we can't use It seems that the failure occurred from 4 days ago. Please see: https://github.com/opencontainers/runc/actions/runs/15232466902/job/42841958687 The other solution is to change |
Implementation of opencontainers/runtime-spec#1271
It implements the new proposal to the OCI spec to be able to specify Network Devices that get attached
detachedfrom the containers (updated to match the merged proposal opencontainers/runtime-spec#1271)