Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Servers created with public_net disabled are delayed by 4 minutes on ubuntu 24.04 #1074

Open
johnpccd opened this issue Jan 10, 2025 · 5 comments

Comments

@johnpccd
Copy link

johnpccd commented Jan 10, 2025

What happened?

Servers created with public_net disabled are delayed by 4 minutes at startup because "Failed to wait for network: Unexpected error while running command."

Trying to create an instance in an air-gapped environment (no public ip) using the code below:

The startup time of this instance is 4 minutes longer than if i enable ipv4_enabled.

Here is the output of /var/log/cloud-init-output.log

root@redis:~# cat  /var/log/cloud-init-output.log
Cloud-init v. 24.4-0ubuntu1~22.04.1 running 'init-local' at Fri, 10 Jan 2025 19:54:07 +0000. Up 9.56 seconds.
2025-01-10 19:54:07,926 - url_helper.py[WARNING]: Exception(s) [UrlError("HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /hetzner/v1/metadata/instance-id (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f9f68fdbf70>: Failed to establish a new connection: [Errno 101] Network is unreachable'))")] during request to http://169.254.169.254/hetzner/v1/metadata/instance-id, raising last exception
2025-01-10 19:54:07,927 - url_helper.py[ERROR]: Timed out, no response from urls: ['http://169.254.169.254/hetzner/v1/metadata/instance-id']
Cloud-init v. 24.4-0ubuntu1~22.04.1 running 'init' at Fri, 10 Jan 2025 19:56:09 +0000. Up 131.52 seconds.
2025-01-10 19:58:09,953 - ubuntu.py[ERROR]: Failed to wait for network: Unexpected error while running command.
Command: ['systemctl', 'start', 'systemd-networkd-wait-online.service']
Exit code: 1
Reason: -
Stdout:
Stderr: Job for systemd-networkd-wait-online.service failed because the control process exited with error code.
        See "systemctl status systemd-networkd-wait-online.service" and "journalctl -xeu systemd-networkd-wait-online.service" for details.

What did you expect to happen?

I expect a flag or someway to prevent waiting for network if there's no ip address

Please provide a minimal working example

resource "hcloud_server" "redis" {
  name        = "redis"
  server_type = var.server_type
  image       = var.image
  location    = var.location
  ssh_keys    = [var.ssh_key_id]

  network {
    network_id = var.network_id
    ip         = var.redis_ip
	alias_ips = []
  }

  public_net {
    ipv4_enabled = false
    ipv6_enabled = false
  }

}
@johnpccd johnpccd added the bug label Jan 10, 2025
@jooola

This comment has been minimized.

@perttuk
Copy link

perttuk commented Feb 4, 2025

I'm experiencing the same thing. Is possible to disable metadata datasource using cloud-init as a workaround? I've tried few things without success.

@nelyodev
Copy link

same here with

public_net {
  ipv4_enabled = false
  ipv6_enabled = false
}

not even dns is working

@jooola
Copy link
Member

jooola commented Feb 13, 2025

I was able to reproduce this bug using ubuntu 24.04, but didn't had any problems with debian-12.

terraform {
  required_providers {
    hcloud = {
      source  = "hetznercloud/hcloud"
      version = "1.49.1"
    }
  }
}

data "hcloud_ssh_keys" "all" {}

resource "hcloud_network" "main" {
  name     = "main"
  ip_range = "10.0.0.0/16"
}

resource "hcloud_network_subnet" "main" {
  network_id   = hcloud_network.main.id
  type         = "cloud"
  network_zone = "eu-central"
  ip_range     = "10.0.1.0/24"
}

resource "hcloud_server" "bastion" {
  name        = "bastion"
  server_type = "cx22"
  image       = "debian-12"
  ssh_keys    = data.hcloud_ssh_keys.all.ssh_keys.*.name

  network {
    network_id = hcloud_network.main.id
    ip         = "10.0.1.1"
    alias_ips  = []
  }
}

resource "hcloud_server" "host" {
  depends_on = [hcloud_network.main, hcloud_network_subnet.main]

  name        = "host"
  server_type = "cx22"
  image       = "debian-12"
  ssh_keys    = data.hcloud_ssh_keys.all.ssh_keys.*.name

  public_net {
    ipv4_enabled = false
    ipv6_enabled = false
  }

  network {
    network_id = hcloud_network.main.id
    ip         = "10.0.1.2"
    alias_ips  = []
  }
}

On the host, while debian logs:

Feb 13 11:10:55 Debian-129-bookworm-64-minimal cloud-init[549]: Cloud-init v. 22.4.2 running 'init-local' at Thu, 13 Feb 2025 11:10:55 +0000. Up 8.38 seconds.
Feb 13 11:10:55 Debian-129-bookworm-64-minimal dhclient[552]: Internet Systems Consortium DHCP Client 4.4.3-P1
Feb 13 11:10:55 Debian-129-bookworm-64-minimal dhclient[552]: Copyright 2004-2022 Internet Systems Consortium.
Feb 13 11:10:55 Debian-129-bookworm-64-minimal dhclient[552]: All rights reserved.
Feb 13 11:10:55 Debian-129-bookworm-64-minimal dhclient[552]: For info, please visit https://www.isc.org/software/dhcp/
Feb 13 11:10:55 Debian-129-bookworm-64-minimal dhclient[552]: 
Feb 13 11:10:56 Debian-129-bookworm-64-minimal dhclient[552]: Listening on LPF/enp7s0/86:00:00:d7:bc:78
Feb 13 11:10:56 Debian-129-bookworm-64-minimal dhclient[552]: Sending on   LPF/enp7s0/86:00:00:d7:bc:78
Feb 13 11:10:56 Debian-129-bookworm-64-minimal dhclient[552]: Sending on   Socket/fallback
Feb 13 11:10:56 Debian-129-bookworm-64-minimal dhclient[552]: DHCPDISCOVER on enp7s0 to 255.255.255.255 port 67 interval 6
Feb 13 11:10:56 Debian-129-bookworm-64-minimal dhclient[552]: DHCPOFFER of 10.0.1.2 from 10.0.0.1
Feb 13 11:10:56 Debian-129-bookworm-64-minimal dhclient[552]: DHCPREQUEST for 10.0.1.2 on enp7s0 to 255.255.255.255 port 67
Feb 13 11:10:56 Debian-129-bookworm-64-minimal dhclient[552]: DHCPACK of 10.0.1.2 from 10.0.0.1
Feb 13 11:10:56 Debian-129-bookworm-64-minimal dhclient[552]: bound to 10.0.1.2 -- renewal in 39372 seconds.
Feb 13 11:10:56 host systemd[1]: Finished cloud-init-local.service - Initial cloud-init job (pre-networking).
Feb 13 11:10:56 host systemd[1]: Reached target network-pre.target - Preparation for Network.
Feb 13 11:10:56 host systemd[1]: Starting networking.service - Raise network interfaces...
Feb 13 11:10:56 host systemd[1]: Finished networking.service - Raise network interfaces.
Feb 13 11:10:56 host systemd[1]: Reached target network.target - Network.
Feb 13 11:10:56 host systemd[1]: Starting cloud-init.service - Initial cloud-init job (metadata service crawler)...
Feb 13 11:10:57 host cloud-init[623]: Cloud-init v. 22.4.2 running 'init' at Thu, 13 Feb 2025 11:10:57 +0000. Up 9.56 seconds.
Feb 13 11:10:57 host cloud-init[623]: ci-info: +++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++++
Feb 13 11:10:57 host cloud-init[623]: ci-info: +--------+-------+-----------+-----------+-------+-------------------+
Feb 13 11:10:57 host cloud-init[623]: ci-info: | Device |   Up  |  Address  |    Mask   | Scope |     Hw-Address    |
Feb 13 11:10:57 host cloud-init[623]: ci-info: +--------+-------+-----------+-----------+-------+-------------------+
Feb 13 11:10:57 host cloud-init[623]: ci-info: | enp7s0 | False |     .     |     .     |   .   | 86:00:00:d7:bc:78 |
Feb 13 11:10:57 host cloud-init[623]: ci-info: |   lo   |  True | 127.0.0.1 | 255.0.0.0 |  host |         .         |
Feb 13 11:10:57 host cloud-init[623]: ci-info: |   lo   |  True |  ::1/128  |     .     |  host |         .         |
Feb 13 11:10:57 host cloud-init[623]: ci-info: +--------+-------+-----------+-----------+-------+-------------------+
Feb 13 11:10:57 host cloud-init[623]: ci-info: +++++++++++++++++++Route IPv6 info+++++++++++++++++++
Feb 13 11:10:57 host cloud-init[623]: ci-info: +-------+-------------+---------+-----------+-------+
Feb 13 11:10:57 host cloud-init[623]: ci-info: | Route | Destination | Gateway | Interface | Flags |
Feb 13 11:10:57 host cloud-init[623]: ci-info: +-------+-------------+---------+-----------+-------+
Feb 13 11:10:57 host cloud-init[623]: ci-info: +-------+-------------+---------+-----------+-------+

Ubuntu logs:

Feb 13 10:56:27 Ubuntu-2404-noble-64-minimal cloud-init[692]: Cloud-init v. 24.4-0ubuntu1~24.04.2 running 'init-local' at Thu, 13 Feb 2025 10:56:27 +0000. Up 6.26 seconds.
Feb 13 10:56:27 Ubuntu-2404-noble-64-minimal cloud-init[692]: 2025-02-13 10:56:27,379 - url_helper.py[WARNING]: Exception(s) [UrlError("HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /hetzner/v1/metadata/instance-id (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x74bca7902510>: Failed to establish a new connection: [Errno 101] Network is unreachable'))")] during request to http://169.254.169.254/hetzner/v1/metadata/instance-id, raising last exception
Feb 13 10:56:27 Ubuntu-2404-noble-64-minimal cloud-init[692]: 2025-02-13 10:56:27,380 - url_helper.py[ERROR]: Timed out, no response from urls: ['http://169.254.169.254/hetzner/v1/metadata/instance-id']
Feb 13 10:56:27 Ubuntu-2404-noble-64-minimal dhcpcd[696]: dhcpcd-10.0.6 starting
Feb 13 10:56:27 Ubuntu-2404-noble-64-minimal kernel: 8021q: 802.1Q VLAN Support v1.8
Feb 13 10:56:27 Ubuntu-2404-noble-64-minimal kernel: 8021q: adding VLAN 0 to HW filter on device enp7s0
Feb 13 10:56:27 Ubuntu-2404-noble-64-minimal dhcpcd[699]: DUID 00:01:00:01:2f:40:8c:5b:86:00:00:d7:bd:36
Feb 13 10:56:27 Ubuntu-2404-noble-64-minimal dhcpcd[699]: enp7s0: IAID 00:d7:bd:36
Feb 13 10:56:27 Ubuntu-2404-noble-64-minimal dhcpcd[699]: enp7s0: soliciting a DHCP lease
Feb 13 10:56:28 Ubuntu-2404-noble-64-minimal dhcpcd[699]: enp7s0: offered 10.0.1.2 from 10.0.0.1
Feb 13 10:56:28 Ubuntu-2404-noble-64-minimal dhcpcd[699]: enp7s0: leased 10.0.1.2 for 86400 seconds
Feb 13 10:56:28 Ubuntu-2404-noble-64-minimal dhcpcd[699]: enp7s0: adding host route to 10.0.0.1
Feb 13 10:56:28 Ubuntu-2404-noble-64-minimal dhcpcd[699]: enp7s0: adding route to 10.0.0.0/16 via 10.0.0.1
Feb 13 10:56:28 Ubuntu-2404-noble-64-minimal dhcpcd[699]: enp7s0: adding host route to 169.254.169.254 via 10.0.0.1
Feb 13 10:56:28 Ubuntu-2404-noble-64-minimal dhcpcd[699]: control command: dhcpcd --dumplease --ipv4only enp7s0
Feb 13 10:56:28 host systemd-resolved[605]: System hostname changed to 'host'.
Feb 13 10:56:28 host systemd[1]: netplan-ovs-cleanup.service - OpenVSwitch configuration for cleanup was skipped because of an unmet condition check (ConditionFileIsExecutable=/usr/bin/ovs-vsctl).
Feb 13 10:56:28 host systemd[1]: Reloading requested from client PID 727 ('systemctl') (unit cloud-init-local.service)...
Feb 13 10:56:28 host systemd[1]: Reloading...
Feb 13 10:56:28 host systemd[1]: Configuration file /run/systemd/system/netplan-ovs-cleanup.service is marked world-inaccessible. This has no effect as configuration data is accessible via APIs without restrictions. Proceeding anyway.
Feb 13 10:56:28 host systemd[1]: Reloading finished in 298 ms.
Feb 13 10:56:28 host systemd[1]: Finished cloud-init-local.service - Cloud-init: Local Stage (pre-network).
Feb 13 10:56:28 host systemd[1]: Reached target network-pre.target - Preparation for Network.
Feb 13 10:56:28 host systemd[1]: Starting systemd-networkd.service - Network Configuration...
Feb 13 10:56:29 host systemd-networkd[763]: lo: Link UP
Feb 13 10:56:29 host systemd-networkd[763]: lo: Gained carrier
Feb 13 10:56:29 host systemd-networkd[763]: Enumeration completed
Feb 13 10:56:29 host systemd-networkd[763]: enp7s0: Link UP
Feb 13 10:56:29 host systemd-networkd[763]: enp7s0: Gained carrier
Feb 13 10:56:29 host systemd[1]: Started systemd-networkd.service - Network Configuration.
Feb 13 10:56:29 host systemd[1]: Reached target network.target - Network.
Feb 13 10:56:29 host systemd-timesyncd[616]: Network configuration changed, trying to establish connection.
Feb 13 10:56:29 host systemd-networkd[763]: enp7s0: Link DOWN
Feb 13 10:56:29 host systemd-networkd[763]: enp7s0: Lost carrier
Feb 13 10:56:29 host systemd[1]: Starting systemd-networkd-wait-online.service - Wait for Network to be Configured...
Feb 13 10:56:56 host systemd[1]: systemd-fsckd.service: Deactivated successfully.
Feb 13 10:58:29 host systemd-networkd-wait-online[765]: Timeout occurred while waiting for network connectivity.
Feb 13 10:58:29 host systemd[1]: systemd-networkd-wait-online.service: Main process exited, code=exited, status=1/FAILURE
Feb 13 10:58:29 host systemd[1]: systemd-networkd-wait-online.service: Failed with result 'exit-code'.
Feb 13 10:58:29 host systemd[1]: Failed to start systemd-networkd-wait-online.service - Wait for Network to be Configured.
Feb 13 10:58:29 host systemd[1]: Starting cloud-init.service - Cloud-init: Network Stage...
Feb 13 10:58:29 host cloud-init[772]: Cloud-init v. 24.4-0ubuntu1~24.04.2 running 'init' at Thu, 13 Feb 2025 10:58:29 +0000. Up 128.65 seconds.
Feb 13 10:58:29 host systemd[1]: Starting systemd-networkd-wait-online.service - Wait for Network to be Configured...
Feb 13 11:00:29 host systemd-networkd-wait-online[775]: Timeout occurred while waiting for network connectivity.
Feb 13 11:00:29 host systemd[1]: systemd-networkd-wait-online.service: Main process exited, code=exited, status=1/FAILURE
Feb 13 11:00:29 host systemd[1]: systemd-networkd-wait-online.service: Failed with result 'exit-code'.
Feb 13 11:00:29 host systemd[1]: Failed to start systemd-networkd-wait-online.service - Wait for Network to be Configured.
Feb 13 11:00:29 host cloud-init[772]: 2025-02-13 11:00:29,999 - ubuntu.py[ERROR]: Failed to wait for network: Unexpected error while running command.
Feb 13 11:00:29 host cloud-init[772]: Command: ['systemctl', 'start', 'systemd-networkd-wait-online.service']
Feb 13 11:00:29 host cloud-init[772]: Exit code: 1
Feb 13 11:00:29 host cloud-init[772]: Reason: -
Feb 13 11:00:29 host cloud-init[772]: Stdout:
Feb 13 11:00:29 host cloud-init[772]: Stderr: Job for systemd-networkd-wait-online.service failed because the control process exited with error code.
Feb 13 11:00:29 host cloud-init[772]:         See "systemctl status systemd-networkd-wait-online.service" and "journalctl -xeu systemd-networkd-wait-online.service" for details.
Feb 13 11:00:30 host cloud-init[772]: ci-info: +++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++++
Feb 13 11:00:30 host cloud-init[772]: ci-info: +--------+-------+-----------+-----------+-------+-------------------+
Feb 13 11:00:30 host cloud-init[772]: ci-info: | Device |   Up  |  Address  |    Mask   | Scope |     Hw-Address    |
Feb 13 11:00:30 host cloud-init[772]: ci-info: +--------+-------+-----------+-----------+-------+-------------------+
Feb 13 11:00:30 host cloud-init[772]: ci-info: | enp7s0 | False |     .     |     .     |   .   | 86:00:00:d7:bd:36 |
Feb 13 11:00:30 host cloud-init[772]: ci-info: |   lo   |  True | 127.0.0.1 | 255.0.0.0 |  host |         .         |
Feb 13 11:00:30 host cloud-init[772]: ci-info: |   lo   |  True |  ::1/128  |     .     |  host |         .         |
Feb 13 11:00:30 host cloud-init[772]: ci-info: +--------+-------+-----------+-----------+-------+-------------------+
Feb 13 11:00:30 host cloud-init[772]: ci-info: +++++++++++++++++++Route IPv6 info+++++++++++++++++++
Feb 13 11:00:30 host cloud-init[772]: ci-info: +-------+-------------+---------+-----------+-------+
Feb 13 11:00:30 host cloud-init[772]: ci-info: | Route | Destination | Gateway | Interface | Flags |
Feb 13 11:00:30 host cloud-init[772]: ci-info: +-------+-------------+---------+-----------+-------+
Feb 13 11:00:30 host cloud-init[772]: ci-info: +-------+-------------+---------+-----------+-------+

We can see the 2 x 2 minutes timeouts.

While the server are created as fast as expected, cloud init does wait 4 additional minutes.

I'll report this to the relevant team, and report back here.

@jooola
Copy link
Member

jooola commented Feb 13, 2025

We believe that the source of the problem is a race between hc-utils, and cloud-init.

We have an idea how to implement a fix, but it is not an easy one and will take some time to be fixed.

I will try to come up with a ready to use workaround.

EDIT: There is not easy workaround (directly from the user data) as far as I am aware. One may create a snapshot of the server after changing the network configuration on the disk (e.g setting https://cloudinit.readthedocs.io/en/latest/reference/network-config-format-v2.html#optional-bool or maybe disabling the hc-net-scan.service service) and use that image instead.

@jooola jooola changed the title [Bug]: Servers created with public_net disabled are delayed by 4 minutes at startup because "Failed to wait for network: Unexpected error while running command." [Bug]: Servers created with public_net disabled are delayed by 4 minutes on ubuntu 24.04 Feb 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants