[Bugfix] Check if port is used in publish flag [duplicate of #2190] #4097

Merged
merged 1 commit into from
Jun 6, 2025

swagatbora90
Contributor

This PR continues the work from #2190 (originally by @vsiravar), which checks for used ports while allocating the host port in -p/--publish. The original PR was already approved and was just waiting for a rebase from the author. Most of the comments have already been addressed.

Addressing port cleanup on stop:

The original PR had some open discussion around handling port cleanup on container stop:

I concur with it, but hold reservations about merging this PR. I believe we should first have nerdctl stop manage the release of ports.
The rest LGTM.

cc vsiravar, could you please fix the stop ( it can be a different PR) so we can merge this accordingly? Thanks

This is now addressed by PR #2839, since network cleanup is part of both container stop and kill.

Verified the fix:

 % sudo ./_output/nerdctl run -p 8080:80 --name=test -d nginx
57a24f8da7127aeacee8b888a90e0e4eda866b35513844620de4ad1d28200547

 % sudo ./_output/nerdctl run -p 8080:80  -d nginx
FATA[0000] failed to load networking flags: bind for :8080 failed: port is already allocated 

 % sudo nerdctl stop test
test

% sudo ./_output/nerdctl run -p 8080:80  -d nginx
4ad5ca030355b1eb537c3caa91dfd5a9944dd36e8dea1756a4334a986f96c682

Fixes: #2179

cc @fahedouch @vsiravar

@swagatbora90 swagatbora90 changed the title Check if port is used in publish flag [Bugfix] Check if port is used in publish flag [duplicate of #2190] Apr 9, 2025
@swagatbora90 swagatbora90 reopened this Apr 9, 2025
@swagatbora90 swagatbora90 force-pushed the fix-port-check-vsiravar branch from aaeb6ce to af654a8 Compare April 10, 2025 17:16
@swagatbora90 swagatbora90 force-pushed the fix-port-check-vsiravar branch 3 times, most recently from 4e10732 to 65feab1 Compare April 18, 2025 22:24
@swagatbora90 swagatbora90 force-pushed the fix-port-check-vsiravar branch from 65feab1 to 002c0d4 Compare April 25, 2025 21:37
@swagatbora90
Contributor Author

The current implementation breaks in rootless mode, which is causing the test failures. My guess is that this is mainly because RootlessKit uses slirp4netns to create a separate network namespace, so the port allocation information may no longer be visible in /proc/net/*.

We can either implement some mechanism to track the rootless state, or maybe add a generic port store to track all port usage.

@fahedouch
Member

The current implementation breaks in rootless mode which is resulting in the test failures. I am guessing this is mainly due to how Rootlesskit uses slirp4netns to create a separate net ns and possibly the port allocation information is no longer visible in /proc/net/*.

We can either implement some mechanism to track the rootless state or may be add a generic port store to track all port usage.

Rootless > 2.0 starts with a detached network namespace by default. You need to nsenter into the child network namespace, where the container's network is managed, and the ports should be checked there. You have several ways to nsenter, either by executing a child process or using the library https://pkg.go.dev/github.com/containernetworking/plugins/pkg/ns#WithNetNSPath (I prefer the containernetworking library)

@@ -42,24 +42,56 @@ func filter(ss []procnet.NetworkDetail, filterFunc func(detail procnet.NetworkDe
}

func portAllocate(protocol string, ip string, count uint64) (uint64, uint64, error) {
netprocData, err := procnet.ReadStatsFileData(protocol)
usedPort, err := getUsedPorts(ip, protocol)
Member

Suggested change
- usedPort, err := getUsedPorts(ip, protocol)
+ usedPorts, err := getUsedPorts(ip, protocol)

if err != nil {
return 0, 0, err
}
netprocItems := procnet.Parse(netprocData)

start := uint64(allocateStart)
Member

To keep things simpler, please declare allocateStart and allocateEnd as uint64.
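
For illustration, a minimal sketch of what typed bounds could look like, together with a scan for a free contiguous range. The helper name findFreeRange, the map-based set of used ports, and the concrete range values are assumptions made for this example, not the code under review.

```go
package main

import (
	"errors"
	"fmt"
)

// Assumed bounds for illustration. Declaring them as uint64 removes
// conversions such as uint64(allocateStart) at every use site.
const (
	allocateStart uint64 = 49153
	allocateEnd   uint64 = 60999
)

// findFreeRange is a hypothetical helper. It returns the first run of
// count consecutive ports in [allocateStart, allocateEnd] that does not
// contain any port marked in usedPorts.
func findFreeRange(usedPorts map[uint64]bool, count uint64) (uint64, uint64, error) {
	if count == 0 || count > allocateEnd-allocateStart+1 {
		return 0, 0, errors.New("invalid port count")
	}
	start := allocateStart
	for start+count-1 <= allocateEnd {
		conflict := false
		for p := start; p < start+count; p++ {
			if usedPorts[p] {
				// Restart the search just past the conflicting port.
				start = p + 1
				conflict = true
				break
			}
		}
		if !conflict {
			return start, start + count - 1, nil
		}
	}
	return 0, 0, errors.New("no free port range available")
}

func main() {
	used := map[uint64]bool{49153: true, 49155: true}
	first, last, err := findFreeRange(used, 2)
	fmt.Println(first, last, err)
}
```

With the bounds typed once at the declaration, the loop arithmetic stays in a single integer type throughout.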

@fahedouch
Member

ping @swagatbora90

@swagatbora90 swagatbora90 force-pushed the fix-port-check-vsiravar branch 3 times, most recently from 15c910f to 1f3a23f Compare May 29, 2025 20:29
@swagatbora90
Contributor Author

swagatbora90 commented May 29, 2025

Rootless > 2.0 starts with a detached network namespace by default. You need to nsenter into the child network namespace, where the container's network is managed, and the ports should be checked there. You have several ways to nsenter, either by executing a child process or using the library https://pkg.go.dev/github.com/containernetworking/plugins/pkg/ns#WithNetNSPath (I prefer the containernetworking library)

Hi @fahedouch, sorry for the late response; I was on an extended leave. I have updated the code to use WithDetachedNetNSIfAny when checking for used ports, so that we support RootlessKit in detached-netns mode.

However, I found that the root cause of the rootless test failure was something different. When running in rootless mode, once a container is created with a port mapping, we see two entries in /proc/net/tcp. For example, a port mapping of 8080 -> 80 appears as follows:

sl  local_address                         remote_address                        st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode

0: 00000000000000000000000000000000:1F90 00000000000000000000000000000000:0000 0A 00000000:00000000 00:00000000 00000000  1000        0 240717 1 0000000000000000 100 0 0 10 0

3: 0000000000000000FFFF00000100007F:1F90 0000000000000000FFFF00000100007F:ECE2 01 00000000:00000000 02:00000414 00000000  1000        0 233083 2 0000000000000000 20 4 31 10 -1

• Line 0: 00000000000000000000000000000000:1F90 - A socket listening on port 8080 (1F90 hex) on all interfaces, state 0A (LISTEN), owned by UID 1000
• Line 3: 0000000000000000FFFF00000100007F:1F90 - An established connection (state 01) on port 8080 on the loopback interface (127.0.0.1), also owned by UID 1000
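
To make the decoding of these fields concrete, here is a minimal sketch of how an ADDRESS:PORT field from such a line can be turned into an IP and a port number. The helper name decodeLocalAddress is hypothetical, and the byte swapping assumes a little-endian host, where each 32-bit word of the address is printed with its bytes reversed.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"net"
	"strconv"
	"strings"
)

// decodeLocalAddress is a hypothetical helper that decodes an
// "ADDRESS:PORT" field as printed in /proc/net/tcp or /proc/net/tcp6.
// Assumption: little-endian host, so every 4-byte word is byte-swapped.
func decodeLocalAddress(field string) (net.IP, uint64, error) {
	parts := strings.Split(field, ":")
	if len(parts) != 2 {
		return nil, 0, fmt.Errorf("malformed address field %q", field)
	}
	raw, err := hex.DecodeString(parts[0])
	if err != nil {
		return nil, 0, err
	}
	if len(raw) != net.IPv4len && len(raw) != net.IPv6len {
		return nil, 0, fmt.Errorf("unexpected address length %d", len(raw))
	}
	// Reverse the bytes inside each 4-byte word.
	for i := 0; i < len(raw); i += 4 {
		raw[i], raw[i+3] = raw[i+3], raw[i]
		raw[i+1], raw[i+2] = raw[i+2], raw[i+1]
	}
	// The port is plain hexadecimal, for example 1F90 for 8080.
	port, err := strconv.ParseUint(parts[1], 16, 16)
	if err != nil {
		return nil, 0, err
	}
	return net.IP(raw), port, nil
}

func main() {
	ip, port, err := decodeLocalAddress("0000000000000000FFFF00000100007F:1F90")
	fmt.Println(ip, port, err)
}
```

For the second entry above, this yields the IPv4-mapped loopback address and port 8080, matching the description in the list.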

The second connection is the network proxy that forwards incoming requests to the container. In rootful mode, the runtime can access the host network stack directly, so it only needs a single socket that forwards traffic to the container.

The interesting bit here is that when we stop/remove the rootless container, the first socket is removed immediately, but the second connection moves to the TIME_WAIT state, with the uid changing from 1000 to 0. I am not sure why, but it looks like standard TCP behavior. The entry is eventually removed, but it takes 10-20 seconds to clear.

2: 0000000000000000FFFF00000100007F:1F90 0000000000000000FFFF00000100007F:D25A 06 00000000:00000000 03:00001731 00000000     0        0 0 3 0000000000000000

This causes a new container start/run to fail, because we continue to see :8080 (1F90) in the TCP stats even though the state clearly marks the socket as waiting to be closed.

To resolve the issue, I am now checking both the ports and their state, and specifically ignoring ports in the TIME_WAIT and CLOSE_WAIT states, since these ports are no longer in use and are available to be allocated.
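
A minimal sketch of this kind of state-aware filtering, for illustration only. The helper name usedPorts and the map-based result are assumptions for this example and are not the code in this PR; the state codes 06 (TIME_WAIT) and 08 (CLOSE_WAIT) are the values printed in the st column of /proc/net/tcp.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// TCP state codes as printed in the "st" column of /proc/net/tcp.
const (
	tcpTimeWait  = 0x06
	tcpCloseWait = 0x08
)

// usedPorts is a hypothetical helper. It collects the local ports found
// in /proc/net/tcp-style lines and skips the header line, malformed
// lines, and sockets in TIME_WAIT or CLOSE_WAIT.
func usedPorts(lines []string) map[uint64]bool {
	used := make(map[uint64]bool)
	for _, line := range lines {
		// Columns: sl, local_address, remote_address, st, ...
		fields := strings.Fields(line)
		if len(fields) < 4 {
			continue
		}
		idx := strings.LastIndex(fields[1], ":")
		if idx < 0 {
			continue // header line or malformed entry
		}
		port, err := strconv.ParseUint(fields[1][idx+1:], 16, 16)
		if err != nil {
			continue
		}
		state, err := strconv.ParseUint(fields[3], 16, 8)
		if err != nil {
			continue
		}
		if state == tcpTimeWait || state == tcpCloseWait {
			continue // lingering socket, the port can be reused
		}
		used[port] = true
	}
	return used
}

func main() {
	running := []string{
		"sl local_address remote_address st",
		"0: 00000000000000000000000000000000:1F90 00000000000000000000000000000000:0000 0A",
	}
	stopped := []string{
		"2: 0000000000000000FFFF00000100007F:1F90 0000000000000000FFFF00000100007F:D25A 06",
	}
	fmt.Println(usedPorts(running)) // the LISTEN socket keeps 8080 reserved
	fmt.Println(usedPorts(stopped)) // only a TIME_WAIT entry remains
}
```

Applied to the entries quoted above, the LISTEN socket still marks 8080 as used, while the leftover TIME_WAIT entry no longer blocks a new container from publishing the same port.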

@swagatbora90 swagatbora90 force-pushed the fix-port-check-vsiravar branch 3 times, most recently from f389807 to c1a49d2 Compare June 2, 2025 16:57
@swagatbora90
Contributor Author

The nerdctl gomodjail tests are consistently failing with errors such as "could not get iptables version: fork/exec /usr/sbin/iptables: bad file descriptor":

time=2025-06-02T17:15:46.283Z level=WARN msg=***Blocked*** pid=76099 exe=/home/rootless/.cache/gomodjail/1f381f194c295770/nerdctl syscall=fcntl entry=/go/pkg/mod/github.com/coreos/go-iptables@v0.8.0/iptables/iptables.go:659:github.com/coreos/go-iptables/iptables.getIptablesVersionString module=github.com/coreos/go-iptables
time="2025-06-02T17:15:46Z" level=fatal msg="failed to load networking flags: could not get iptables version: fork/exec /usr/sbin/iptables: bad file descriptor"

Seems unrelated to the current changes. Is this a known issue @apostasie @fahedouch ?

@fahedouch
Member

fahedouch commented Jun 3, 2025

My knowledge of slirp4netns is very limited, but I think it is the culprit. @AkihiroSuda, can you confirm that it is the one opening the socket on the loopback address 127.0.0.1 and not removing it immediately? It might be an issue to fix in slirp, or something that needs to be handled in the postStop ocihook (wait for the socket to be removed before removing the container).

@AkihiroSuda
Member

fork/exec /usr/sbin/iptables: bad file descriptor

This line should have a //gomodjail:unconfined directive:

nerdctl/go.mod

Line 31 in b8c4b3d

github.com/coreos/go-iptables v0.8.0

@AkihiroSuda
Member

My knowledge of slirp4netns is very limited, but I think it is the culprit. @AkihiroSuda, can you confirm that it is the one opening the socket on the loopback address 127.0.0.1 and not removing it immediately? It might be an issue to fix in slirp or something need to handled in postStop ocihook (wait for the socket to be removed before removing the container)

The default port driver is still RootlessKit's builtin, not slirp4netns

@fahedouch
Member

The builtin driver is fast, but be aware that the source IP is not propagated and always set to 127.0.0.1.

ok ^^

handled in postStop ocihook (wait for the socket to be removed before removing the container)

WDYT about this? I think it is the responsibility of the container/network driver to ensure that the sockets are cleaned up before releasing the container.

@swagatbora90
Contributor Author

fork/exec /usr/sbin/iptables: bad file descriptor

This line should have //gomodjail:unconfined directive

nerdctl/go.mod

Line 31 in b8c4b3d

github.com/coreos/go-iptables v0.8.0

Opened a separate PR #4303 to add the directive

@swagatbora90
Contributor Author

WDYT about this. I think it is the responsibility of the container/network driver to ensure that the sockets are clean before releasing the container.

I was thinking the same, but when I looked into the current implementation I found that nerdctl uses the RootlessKit port manager to handle port additions and deletions. The proxy connection is created and deleted in RootlessKit, so nerdctl does not have much control over it.

@swagatbora90 swagatbora90 force-pushed the fix-port-check-vsiravar branch from c1a49d2 to d49a66c Compare June 4, 2025 19:54
@swagatbora90
Contributor Author

swagatbora90 commented Jun 4, 2025

There is still some flakiness in some tests. Related: #4143

=== RUN   TestComposeUp
    testing_linux.go:69: projectName="nerdctl-compose-test2030087482"
    testing_linux.go:71: assertion failed: res.ExitCode is not exitCode: 
        Command:  /usr/local/bin/nerdctl --namespace=nerdctl-test compose -f /tmp/nerdctl-compose-test2030087482/docker-compose.yaml up -d
        ExitCode: 1
        Error:    exit status 1
        Stdout:   
        Stderr:   time="2025-06-04T20:14:35Z" level=info msg="Creating network nerdctl-compose-test2030087482_default"
        time="2025-06-04T20:14:35Z" level=info msg="Creating volume nerdctl-compose-test2030087482_wordpress"
        time="2025-06-04T20:14:35Z" level=info msg="Creating volume nerdctl-compose-test2030087482_db"
        time="2025-06-04T20:14:35Z" level=info msg="Ensuring image ghcr.io/stargz-containers/wordpress:5.7-org"
        time="2025-06-04T20:14:35Z" level=info msg="Ensuring image ghcr.io/stargz-containers/mariadb:10.5-org"
        time="2025-06-04T20:14:35Z" level=info msg="Creating container nerdctl-compose-test2030087482-wordpress-1"
        time="2025-06-04T20:14:35Z" level=info msg="Running [/usr/local/bin/nerdctl --namespace=nerdctl-test run --cidfile=/tmp/compose-4290938950/cid -l=com.docker.compose.project=nerdctl-compose-test2030087482 -l=com.docker.compose.service=wordpress -d --name=nerdctl-compose-test2030087482-wordpress-1 --pull=never -e=WORDPRESS_DB_NAME=exampledb -e=WORDPRESS_DB_HOST=db -e=WORDPRESS_DB_USER=exampleuser -e=WORDPRESS_DB_PASSWORD=examplepass --net=nerdctl-compose-test2030087482_default --hostname=wordpress -p=8080:80/tcp --restart=always -v=nerdctl-compose-test2030087482_wordpress:/var/www/html ghcr.io/stargz-containers/wordpress:5.7-org]"
        time="2025-06-04T20:14:35Z" level=warning msg="volume \"nerdctl-compose-test2030087482_wordpress\" already exists and will be returned as-is"
        time="2025-06-04T20:14:35Z" level=fatal msg="OCI runtime start failed: cannot start a container that has stopped: unknown"
        time="2025-06-04T20:14:35Z" level=fatal msg="error while creating container nerdctl-compose-test2030087482-wordpress-1: error while creating container nerdctl-compose-test2030087482-wordpress-1: exit status 1"

It appears that this test failed and that its cleanup did not complete. As a result, many other tests that rely on host port 8080 started failing with "port is already allocated" errors. This means that by relying on a single host port in all our tests (here, port 8080), there is a higher chance of unrelated tests failing. We can randomize the port used in the tests so that it is not always 8080, but I would prefer to do that in a separate PR.
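
One common way to avoid a fixed host port in tests is to let the kernel pick an unused one by binding to port 0. The sketch below is only an illustration of that idea; the helper name freeTCPPort is hypothetical and is not part of the nerdctl test utilities.

```go
package main

import (
	"fmt"
	"net"
)

// freeTCPPort is a hypothetical test helper. It asks the kernel for an
// unused TCP port by binding to port 0 on loopback, then releases it.
func freeTCPPort() (int, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer l.Close()
	return l.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	port, err := freeTCPPort()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The port can then be used in a publish flag instead of 8080.
	fmt.Printf("-p %d:80\n", port)
}
```

Note that the port is released before the caller uses it, so another process could still claim it in between. This lowers the chance of collisions between tests but does not eliminate it.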

@@ -59,6 +59,7 @@ func ParseFlagP(s string) ([]cni.PortMapping, error) {
case 2:
proto = strings.ToLower(splitBySlash[1])
switch proto {
// sctp is not a supported protocol
Member

The comment suggests SCTP isn't supported, but it is present in the case on line 63.

Contributor Author

removed

Signed-off-by: Swagat Bora <[email protected]>

Co-authored-by: Vishwas Siravara <[email protected]>
@swagatbora90 swagatbora90 force-pushed the fix-port-check-vsiravar branch from d49a66c to 04f836d Compare June 6, 2025 00:02
Member

@fahedouch fahedouch left a comment

Thanks

@fahedouch fahedouch added this to the v2.1.3 milestone Jun 6, 2025
@@ -27,6 +27,7 @@ import (
type NetworkDetail struct {
LocalIP net.IP
LocalPort uint64
State int
}

func Parse(data []string) (results []NetworkDetail) {
Member

Can we have unit tests?

Contributor Author

I can add some unit tests as a follow-up.

tcName := fmt.Sprintf("%+v", tc)
t.Run(tcName, func(t *testing.T) {
if strings.Contains(tc.containerPort, "sctp") && rootlessutil.IsRootless() {
t.Skip("sctp is not supported in rootless mode")
Member

Do we already support sctp for rootful?

Member

@AkihiroSuda AkihiroSuda left a comment

Thanks

@AkihiroSuda AkihiroSuda merged commit d937248 into containerd:main Jun 6, 2025
57 of 58 checks passed
Development

Successfully merging this pull request may close these issues.

nerdctl run -p <host-port>:<container port> <image> does not check if user defined host port is in use.
6 participants