2025-03-03 ZeroTier - health checking - alternative proposal #38

Open

Paraphraser
Contributor

This PR follows on from the extensive discussion associated with zyclonite#37.

Never before have I even *contemplated* submitting a PR covering the
same ground as an existing open PR. However, on this occasion I thought
it might be useful to have a concrete proposal to compare and contrast
with zyclonite#37.

I sincerely hope that laying this on the (virtual) table and then
minimising further interaction *might* help us converge on a solution.

<hr>

Changes:

* `docker-compose.yml` and `docker-compose-router.yml`:

	- replaces deprecated `version` statement with `---`.

	- adds example environment variables.

* `Dockerfile`

	- corrects case of "as" to "AS" (silences build warning).

	- adds and configures `healthcheck.sh` (as per zyclonite#37).

	- includes `tzdata` package (moved from `Dockerfile.router`) so
	  messages have local timestamps.

* `Dockerfile.router`

	- removes `tzdata` (moved to `Dockerfile`).

* `entrypoint-router.sh`:

	- code for first launch auto join of listed networks expanded to
	  include additional help material.

* `entrypoint.sh`:

	- "first launch" auto join of listed networks (code copied from
	  `entrypoint-router.sh`, as modified per above).

	- "self repair" of permissions in persistent store (code copied from
	  `entrypoint-router.sh`).

	- adds launch-time message to make it clear that the client is
	  launching (complements messages in `entrypoint-router.sh`).

	- abstracts some common strings to environment variables
	  (opportunistic change).

* `README.md`:

	- updates examples.

	- describes new environment variables (including move of
	  `ZEROTIER_ONE_NETWORK_IDS` from `README-router.md`).

	- documents health-checking.

* `README-router.md`

	- updates examples.
	- explains relationship of router and client.

Added:

* `healthcheck.sh`, based on original proposal in zyclonite#37 and subsequent
  suggestions for modification by me.

I gave serious consideration to the code for synchronising networks in
the entry point scripts. The idea is quite attractive. It is safe to
automate joins in a "clean slate" situation. However, a *leave* followed
by a *join* is not guaranteed to be idempotent. That's because the
*leave* destroys the network-specific configuration options
(`allowManaged`, `allowGlobal`, `allowDefault`, `allowDNS`).

On balance I think it's better left to users to send explicit *leave*
commands via the CLI and take responsibility for restoring lost
configuration options on any subsequent *join*.
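For anyone who does need a scripted *leave*/*join* cycle, the lost options can in principle be captured first and re-applied afterwards. The following is a minimal sketch, not part of this PR. The function names and the `ZT` override variable are hypothetical, and it assumes that `zerotier-cli get <nwid> <option>` prints a value that `zerotier-cli set <nwid> <option>=<value>` accepts back.

```shell
#!/bin/sh
# Sketch only: preserve per-network options across a leave/join cycle.
# ZT names the CLI to call; override it to experiment without a daemon.
ZT="${ZT:-zerotier-cli}"
ZT_OPTIONS="allowManaged allowGlobal allowDefault allowDNS"

# Print one "option=value" line per option for network $1.
save_network_options() {
    for opt in $ZT_OPTIONS; do
        printf '%s=%s\n' "$opt" "$($ZT get "$1" "$opt")"
    done
}

# Read "option=value" lines on stdin and re-apply them to network $1.
restore_network_options() {
    while IFS= read -r setting; do
        [ -n "$setting" ] && $ZT set "$1" "$setting" >/dev/null
    done
    return 0
}

# Leave and rejoin network $1, carrying the saved options across.
rejoin_preserving_options() {
    saved="$(save_network_options "$1")"
    $ZT leave "$1" >/dev/null
    $ZT join "$1" >/dev/null
    printf '%s\n' "$saved" | restore_network_options "$1"
}
```

Calling `rejoin_preserving_options 9999888877776666` would run the whole cycle. Whether options can be set immediately after a join, before the network is authorised, is untested here.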

I will post the results of testing this PR separately.

Signed-off-by: Phill Kelley <[email protected]>
@Paraphraser
Contributor Author

Testing

This post includes a series of tests I have run against containers built using the changes proposed in this PR.

reference service definitions

client

  zerotier:
    container_name: zerotier
    image: "zyclonite/zerotier:local"
    restart: unless-stopped
    network_mode: host
    environment:
      - TZ=${TZ:-Etc/UTC}
      # - ZEROTIER_ONE_NETWORK_IDS=9999888877776666
      # - ZEROTIER_ONE_CHK_SPECIFIC_NETWORKS=9999888877776666
      # - ZEROTIER_ONE_CHK_MIN_ROUTES_FOR_HEALTH=1
    volumes:
      - ./volumes/zerotier-one:/var/lib/zerotier-one
    devices:
      - "/dev/net/tun:/dev/net/tun"
    cap_add:
      - NET_ADMIN
      - SYS_ADMIN

router

  zerotier-router:
    container_name: zerotier
    image: "zyclonite/zerotier-router:local"
    restart: unless-stopped
    network_mode: host
    environment:
      - TZ=${TZ:-Etc/UTC}
      - ZEROTIER_ONE_LOCAL_PHYS=ens18
      - ZEROTIER_ONE_USE_IPTABLES_NFT=true
      - ZEROTIER_ONE_GATEWAY_MODE=both
      # - ZEROTIER_ONE_NETWORK_IDS=9999888877776666
      # - ZEROTIER_ONE_CHK_SPECIFIC_NETWORKS=9999888877776666
      # - ZEROTIER_ONE_CHK_MIN_ROUTES_FOR_HEALTH=1
    volumes:
      - ./volumes/zerotier-one:/var/lib/zerotier-one
    devices:
      - "/dev/net/tun:/dev/net/tun"
    cap_add:
      - NET_ADMIN
      - SYS_ADMIN
      - NET_RAW

client testing

clean slate - null configuration

configuration

environment variable                      defined
ZEROTIER_ONE_NETWORK_IDS                  no
ZEROTIER_ONE_LOCAL_PHYS                   no
ZEROTIER_ONE_USE_IPTABLES_NFT             no
ZEROTIER_ONE_GATEWAY_MODE                 no
ZEROTIER_ONE_CHK_MIN_ROUTES_FOR_HEALTH    no [1]
ZEROTIER_ONE_CHK_SPECIFIC_NETWORKS        no

  [1] Defaults to "1". Takes effect because specific network check is undefined.

setup

$ ls ./volumes/zerotier-one
ls: cannot access './volumes/zerotier-one': No such file or directory

$ docker-compose up -d zerotier
[+] Running 1/1
 ✔ Container zerotier  Started                                                                                         0.2s 

$ docker logs zerotier
Sun Mar  2 23:34:37 AEDT 2025 - assuming container first run.
 ZEROTIER_ONE_NETWORK_IDS not set. You will need to join
 networks using zerotier-cli, and then approve this
 host in ZeroTier Central.
Sun Mar  2 23:34:37 AEDT 2025 - launching ZeroTier-One in client mode
Starting Control Plane...
Starting V6 Control Plane...

$ docker exec zerotier env | grep '^ZEROTIER_ONE_'

Interpretation:

  1. Show persistent store does not exist.
  2. Container launched.
  3. Log confirms client and reminds user to join a network and approve this host.
  4. No variables defined.
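
The first-run branch seen in the log could look roughly like the sketch below. It is not the PR's `entrypoint.sh`. The function name `first_run_join` is hypothetical, and two assumptions are built in: that a missing `networks.d` directory in the persistent store signals a first run, and that the ZeroTier daemon joins any network which has an `<id>.conf` file in that directory when it starts.

```shell
#!/bin/sh
# Sketch only: pre-populate networks on first launch.
# $1 is the persistent store, e.g. /var/lib/zerotier-one.
first_run_join() {
    config_dir="$1"
    networks_dir="$config_dir/networks.d"

    # Assumption: an existing networks.d means this is not the first run.
    [ -d "$networks_dir" ] && return 0

    echo "$(date) - assuming container first run."

    if [ -z "${ZEROTIER_ONE_NETWORK_IDS:-}" ]; then
        echo " ZEROTIER_ONE_NETWORK_IDS not set. You will need to join"
        echo " networks using zerotier-cli, and then approve this"
        echo " host in ZeroTier Central."
        return 0
    fi

    # Assumption: an empty <id>.conf makes the daemon join <id> at startup.
    mkdir -p "$networks_dir"
    for id in $ZEROTIER_ONE_NETWORK_IDS; do
        : > "$networks_dir/$id.conf"
    done
}
```

Because the directory test comes first, changing `ZEROTIER_ONE_NETWORK_IDS` on a later launch has no effect in this sketch, which matches the "clean slate only" intent described in the PR.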

test

$ docker ps
CONTAINER ID   IMAGE                      COMMAND              CREATED         STATUS                     PORTS     NAMES
6f51e4e8a18c   zyclonite/zerotier:local   "entrypoint.sh -U"   2 minutes ago   Up 2 minutes (unhealthy)             zerotier

$ docker exec zerotier zerotier-cli listnetworks
200 listnetworks <nwid> <name> <mac> <status> <type> <dev> <ZT assigned ips>

$ docker exec zerotier zerotier-cli join 9999888877776666
200 join OK

$ docker exec zerotier zerotier-cli listnetworks
200 listnetworks <nwid> <name> <mac> <status> <type> <dev> <ZT assigned ips>
200 listnetworks 9999888877776666  22:ea:af:dc:d7:94 REQUESTING_CONFIGURATION PRIVATE ztr2qsmswx -

$ docker exec zerotier zerotier-cli get 9999888877776666 status
REQUESTING_CONFIGURATION

$ ip r | grep 'dev zt' | grep -cv via
0

$ echo 'Go to ZeroTier Central and approve this client'
Go to ZeroTier Central and approve this client

$ docker ps
CONTAINER ID   IMAGE                      COMMAND              CREATED         STATUS                   PORTS     NAMES
6f51e4e8a18c   zyclonite/zerotier:local   "entrypoint.sh -U"   3 minutes ago   Up 3 minutes (healthy)             zerotier

$ docker exec zerotier zerotier-cli listnetworks
200 listnetworks <nwid> <name> <mac> <status> <type> <dev> <ZT assigned ips>
200 listnetworks 9999888877776666 My_ZeroTier 22:ea:af:dc:d7:94 OK PRIVATE ztr2qsmswx 10.244.63.140/16

$ docker exec zerotier zerotier-cli get 9999888877776666 status
OK

$ ip r | grep 'dev zt' | grep -cv via
1

$ docker-compose down zerotier
[+] Running 1/1
 ✔ Container zerotier  Removed                                                                                         2.6s 

Interpretation:

  1. Show container unhealthy (waiting for a join). Expected result.
  2. Show container has not joined any networks.
  3. Join a network.
  4. Show listnetworks status is REQUESTING_CONFIGURATION.
  5. Show status of joined network is REQUESTING_CONFIGURATION.
  6. Show no routes associated with ZeroTier interface (because interface is not constructed until client is approved).
  7. Approve client (via browser).
  8. Show container goes healthy. Expected result.
  9. Show listnetworks status is OK.
  10. Show status of joined network is OK.
  11. Show direct route to ZeroTier interface has been created.
  12. End of test.
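
The tests above read health from the STATUS column of `docker ps`, which only changes once the health check has run again (and, for "unhealthy", once the configured number of consecutive failures has been reached). A script repeating these tests could poll instead of waiting by hand. This is a minimal sketch: `wait_for_health` and the `DOCKER` override variable are hypothetical names, while `docker inspect --format '{{.State.Health.Status}}'` is the standard way to read the status.

```shell
#!/bin/sh
# Sketch only: poll a container until it reports the wanted health status.
# DOCKER names the CLI to call; override it to experiment without Docker.
DOCKER="${DOCKER:-docker}"

# $1: container name, $2: wanted status (healthy | unhealthy | starting),
# $3: timeout in seconds (default 120). Returns 0 on match, 1 on timeout.
wait_for_health() {
    container="$1"
    wanted="$2"
    timeout="${3:-120}"
    waited=0
    while [ "$waited" -le "$timeout" ]; do
        status="$($DOCKER inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null)"
        [ "$status" = "$wanted" ] && return 0
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}
```

For example, `wait_for_health zerotier healthy 180` after approving the client in ZeroTier Central.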

minimum route checking

configuration

environment variable                      defined
ZEROTIER_ONE_NETWORK_IDS                  yes
ZEROTIER_ONE_CHK_MIN_ROUTES_FOR_HEALTH    no [1]
ZEROTIER_ONE_CHK_SPECIFIC_NETWORKS        no

  [1] Defaults to "1". Takes effect because specific network check is undefined.

setup

$ docker-compose up -d zerotier
[+] Running 1/1
 ✔ Container zerotier  Started                                                                                                                             0.2s 

$ docker ps
CONTAINER ID   IMAGE                      COMMAND              CREATED         STATUS                   PORTS     NAMES
656c6c75c8f3   zyclonite/zerotier:local   "entrypoint.sh -U"   5 seconds ago   Up 5 seconds (healthy)             zerotier

$ docker logs zerotier
Wed Feb 26 14:20:51 AEDT 2025 - launching ZeroTier-One in client mode
Starting Control Plane...
Starting V6 Control Plane...

$ docker exec zerotier env | grep '^ZEROTIER_ONE_'
ZEROTIER_ONE_NETWORK_IDS=9999888877776666

Interpretation:

  1. Container launched.
  2. Container healthy. Expected result.
  3. Log confirms client.
  4. Only one variable defined.

test

$ docker exec zerotier zerotier-cli listnetworks
200 listnetworks <nwid> <name> <mac> <status> <type> <dev> <ZT assigned ips>
200 listnetworks 9999888877776666 My_ZeroTier 22:43:c0:8e:8d:1c OK PRIVATE ztr2qsmswx 10.244.211.241/16

$ docker exec zerotier zerotier-cli leave 9999888877776666
200 leave OK

$ docker ps
CONTAINER ID   IMAGE                      COMMAND              CREATED         STATUS                     PORTS     NAMES
656c6c75c8f3   zyclonite/zerotier:local   "entrypoint.sh -U"   2 minutes ago   Up 2 minutes (unhealthy)             zerotier

$ docker exec zerotier zerotier-cli join 9999888877776666
200 join OK

$ docker ps
CONTAINER ID   IMAGE                      COMMAND              CREATED         STATUS                   PORTS     NAMES
656c6c75c8f3   zyclonite/zerotier:local   "entrypoint.sh -U"   3 minutes ago   Up 3 minutes (healthy)             zerotier

$ sudo nmcli conn down ztr2qsmswx
Connection 'ztr2qsmswx' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/86)

$ docker ps
CONTAINER ID   IMAGE                      COMMAND              CREATED         STATUS                     PORTS     NAMES
656c6c75c8f3   zyclonite/zerotier:local   "entrypoint.sh -U"   5 minutes ago   Up 5 minutes (unhealthy)             zerotier

$ docker exec zerotier zerotier-cli get 9999888877776666 status
OK

$ ip r | grep 'dev zt' | grep -cv via
0

$ docker-compose down zerotier
[+] Running 1/1
 ✔ Container zerotier  Removed                                                                                                                             1.4s 

Interpretation:

  1. Show container has joined test network.
  2. Leave that network.
  3. Show container goes unhealthy. Expected result.
  4. Rejoin the network.
  5. Show container goes healthy. Expected result.
  6. Destroy the network interface.
  7. Show container goes unhealthy. Expected result.
  8. Show reason is not because the ZeroTier network has gone away. Expected result.
  9. Show reason is because the direct route has gone away (count is zero). Expected result.
  10. End of test.
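
The route count used throughout these tests comes from `ip r | grep 'dev zt' | grep -cv via`: the first `grep` keeps routes on ZeroTier interfaces (names beginning with `zt`), and the second counts those with no `via` gateway, meaning directly connected routes. The sketch below wraps that pipeline so it can be compared against a minimum. The function names are hypothetical and this is not the PR's `healthcheck.sh`.

```shell
#!/bin/sh
# Sketch only: count direct ZeroTier routes and compare with a minimum.

# Reads `ip r` output on stdin and prints the number of direct routes.
count_direct_routes() {
    # grep -c prints 0 but exits non-zero when nothing matches, so the
    # exit status is discarded and only the printed count is used.
    grep 'dev zt' | grep -cv 'via' || true
}

# Reads `ip r` output on stdin. Succeeds when there are at least $1
# direct ZeroTier routes (default 1).
has_min_direct_routes() {
    [ "$(count_direct_routes)" -ge "${1:-1}" ]
}
```

A health check built on this might end with `ip r | has_min_direct_routes "${ZEROTIER_ONE_CHK_MIN_ROUTES_FOR_HEALTH:-1}" || exit 1`.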

specific network checking

configuration

environment variable                      defined
ZEROTIER_ONE_NETWORK_IDS                  yes
ZEROTIER_ONE_CHK_MIN_ROUTES_FOR_HEALTH    no
ZEROTIER_ONE_CHK_SPECIFIC_NETWORKS        yes [1]

  [1] Takes precedence over default of 1 for minimum route checking.
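
The precedence between the two health-check variables, as described in the configuration footnotes, can be summarised in a few lines. This is a sketch of the selection rule only. `health_check_mode` is a hypothetical name, and treating a non-empty `ZEROTIER_ONE_CHK_SPECIFIC_NETWORKS` as overriding an explicitly set minimum (not just the default) is an assumption.

```shell
#!/bin/sh
# Sketch only: report which kind of check the two variables select.
health_check_mode() {
    if [ -n "${ZEROTIER_ONE_CHK_SPECIFIC_NETWORKS:-}" ]; then
        # Specific networks win whenever the list is non-empty.
        echo "specific-networks: ${ZEROTIER_ONE_CHK_SPECIFIC_NETWORKS}"
    else
        # Otherwise fall back to the minimum route count, default 1.
        echo "min-routes: ${ZEROTIER_ONE_CHK_MIN_ROUTES_FOR_HEALTH:-1}"
    fi
}
```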

setup

$ docker-compose up -d zerotier
[+] Running 1/1
 ✔ Container zerotier  Started                                                                                                                                     0.2s 

$ docker ps
CONTAINER ID   IMAGE                      COMMAND              CREATED         STATUS                   PORTS     NAMES
6d1e7c9d155a   zyclonite/zerotier:local   "entrypoint.sh -U"   5 seconds ago   Up 5 seconds (healthy)             zerotier

$ docker logs zerotier
Wed Feb 26 14:44:12 AEDT 2025 - launching ZeroTier-One in client mode
Starting Control Plane...
Starting V6 Control Plane...

$ docker exec zerotier env | grep '^ZEROTIER_ONE_'
ZEROTIER_ONE_NETWORK_IDS=9999888877776666
ZEROTIER_ONE_CHK_SPECIFIC_NETWORKS=9999888877776666

Interpretation:

  1. Container launched.
  2. Container healthy. Expected result.
  3. Log confirms client.
  4. Two variables defined.

test

$ docker exec zerotier zerotier-cli listnetworks
200 listnetworks <nwid> <name> <mac> <status> <type> <dev> <ZT assigned ips>
200 listnetworks 9999888877776666 My_ZeroTier 22:43:c0:8e:8d:1c OK PRIVATE ztr2qsmswx 10.244.211.241/16

$ docker exec zerotier zerotier-cli leave 9999888877776666
200 leave OK

$ docker ps
CONTAINER ID   IMAGE                      COMMAND              CREATED         STATUS                     PORTS     NAMES
6d1e7c9d155a   zyclonite/zerotier:local   "entrypoint.sh -U"   2 minutes ago   Up 2 minutes (unhealthy)             zerotier

$ docker exec zerotier zerotier-cli join 9999888877776666
200 join OK

$ docker ps
CONTAINER ID   IMAGE                      COMMAND              CREATED         STATUS                   PORTS     NAMES
6d1e7c9d155a   zyclonite/zerotier:local   "entrypoint.sh -U"   3 minutes ago   Up 3 minutes (healthy)             zerotier

$ sudo nmcli conn down ztr2qsmswx
Connection 'ztr2qsmswx' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/88)

$ docker ps
CONTAINER ID   IMAGE                      COMMAND              CREATED         STATUS                     PORTS     NAMES
6d1e7c9d155a   zyclonite/zerotier:local   "entrypoint.sh -U"   5 minutes ago   Up 5 minutes (unhealthy)             zerotier

$ docker exec zerotier zerotier-cli get 9999888877776666 status
OK

$ ip r | grep 'dev zt' | grep -cv via
0

$ docker-compose down zerotier
[+] Running 1/1
 ✔ Container zerotier  Removed                                                                                                                                     1.3s 

Interpretation:

  1. Show container has joined test network.
  2. Leave that network.
  3. Show container goes unhealthy. Expected result.
  4. Rejoin the network.
  5. Show container goes healthy. Expected result.
  6. Destroy the network interface.
  7. Show container goes unhealthy. Expected result.
  8. Show reason is not because the ZeroTier network has gone away. Expected result.
  9. Show reason is because the direct route has gone away (count is zero). Expected result.
  10. End of test.
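
Steps 8 and 9 show that a status of OK is not sufficient on its own: with the interface down, the network still reports OK but the container is unhealthy. One reading consistent with that result is a per-network check that requires both an OK status and a direct route on that network's interface. The sketch below follows that reading. It is not the PR's `healthcheck.sh`, the function names are hypothetical, and the column positions are taken from the `listnetworks` header shown in the transcripts.

```shell
#!/bin/sh
# Sketch only: per-network health from listnetworks and routing output.

# Print "<status> <dev>" for network $1, reading `zerotier-cli listnetworks`
# output on stdin. Fields are counted from the end of the line because the
# <name> column can be empty. IDs are compared as strings, not numbers.
network_status_and_dev() {
    awk -v id="$1" \
        '$1 == "200" && $2 == "listnetworks" && ($3 "") == (id "") { print $(NF-3), $(NF-1) }'
}

# $1: listnetworks text, $2: `ip r` text, remaining arguments: network IDs.
# Succeeds only if every ID has status OK and a direct route on its interface.
specific_networks_healthy() {
    listing="$1"
    routes="$2"
    shift 2
    for id in "$@"; do
        pair="$(printf '%s\n' "$listing" | network_status_and_dev "$id")"
        status="${pair%% *}"
        dev="${pair##* }"
        [ "$status" = "OK" ] || return 1
        printf '%s\n' "$routes" | grep -Fw "dev $dev" | grep -qv 'via' || return 1
    done
    return 0
}
```

Usage might be `specific_networks_healthy "$(zerotier-cli listnetworks)" "$(ip r)" $ZEROTIER_ONE_CHK_SPECIFIC_NETWORKS`, with the last argument left unquoted so the space-separated IDs split into separate arguments.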

router testing

minimum route checking

configuration

environment variable                      defined
ZEROTIER_ONE_NETWORK_IDS                  yes
ZEROTIER_ONE_LOCAL_PHYS                   yes
ZEROTIER_ONE_USE_IPTABLES_NFT             yes
ZEROTIER_ONE_GATEWAY_MODE                 yes
ZEROTIER_ONE_CHK_MIN_ROUTES_FOR_HEALTH    no [1]
ZEROTIER_ONE_CHK_SPECIFIC_NETWORKS        no

  [1] Defaults to "1". Takes effect because specific network check is undefined.

setup

$ docker-compose up -d zerotier-router
[+] Running 1/1
 ✔ Container zerotier  Started                                                                                                                                     0.2s 

$ docker ps
CONTAINER ID   IMAGE                             COMMAND                  CREATED         STATUS                   PORTS     NAMES
f23bcc78f936   zyclonite/zerotier-router:local   "entrypoint-router.s…"   6 seconds ago   Up 5 seconds (healthy)             zerotier

$ docker logs zerotier
Wed Feb 26 15:02:36 AEDT 2025 - launching ZeroTier-One in routing mode
adding iptables-nft rules for bi-directional traffic (local interfaces ens18 to/from ZeroTier)
Wed Feb 26 15:02:36 AEDT 2025 - ZeroTier daemon is running as process 17
Starting Control Plane...
Starting V6 Control Plane...

$ docker exec zerotier env | grep '^ZEROTIER_ONE_'
ZEROTIER_ONE_NETWORK_IDS=9999888877776666
ZEROTIER_ONE_LOCAL_PHYS=ens18
ZEROTIER_ONE_USE_IPTABLES_NFT=true
ZEROTIER_ONE_GATEWAY_MODE=both

Interpretation:

  1. Container launched.
  2. Container healthy. Expected result.
  3. Log confirms router.
  4. Four variables defined.

test

$ docker exec zerotier zerotier-cli listnetworks
200 listnetworks <nwid> <name> <mac> <status> <type> <dev> <ZT assigned ips>
200 listnetworks 9999888877776666 My_ZeroTier 22:43:c0:8e:8d:1c OK PRIVATE ztr2qsmswx 10.244.211.241/16

$ docker exec zerotier zerotier-cli leave 9999888877776666
200 leave OK

$ docker ps
CONTAINER ID   IMAGE                             COMMAND                  CREATED         STATUS                     PORTS     NAMES
f23bcc78f936   zyclonite/zerotier-router:local   "entrypoint-router.s…"   2 minutes ago   Up 2 minutes (unhealthy)             zerotier

$ docker exec zerotier zerotier-cli join 9999888877776666
200 join OK

$ docker ps
CONTAINER ID   IMAGE                             COMMAND                  CREATED         STATUS                   PORTS     NAMES
f23bcc78f936   zyclonite/zerotier-router:local   "entrypoint-router.s…"   3 minutes ago   Up 3 minutes (healthy)             zerotier

$ sudo nmcli conn down ztr2qsmswx
Connection 'ztr2qsmswx' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/91)

$ docker ps
CONTAINER ID   IMAGE                             COMMAND                  CREATED         STATUS                     PORTS     NAMES
f23bcc78f936   zyclonite/zerotier-router:local   "entrypoint-router.s…"   5 minutes ago   Up 5 minutes (unhealthy)             zerotier

$ docker exec zerotier zerotier-cli get 9999888877776666 status
OK

$ ip r | grep 'dev zt' | grep -cv via
0

$ docker-compose down zerotier-router
[+] Running 1/1
 ✔ Container zerotier  Removed                                                                                                                                     1.4s 

Interpretation:

  1. Show container has joined test network.
  2. Leave that network.
  3. Show container goes unhealthy. Expected result.
  4. Rejoin the network.
  5. Show container goes healthy. Expected result.
  6. Destroy the network interface.
  7. Show container goes unhealthy. Expected result.
  8. Show reason is not because the ZeroTier network has gone away. Expected result.
  9. Show reason is because the direct route has gone away (count is zero). Expected result.
  10. End of test.

specific network checking

configuration

environment variable                      defined
ZEROTIER_ONE_NETWORK_IDS                  yes
ZEROTIER_ONE_LOCAL_PHYS                   yes
ZEROTIER_ONE_USE_IPTABLES_NFT             yes
ZEROTIER_ONE_GATEWAY_MODE                 yes
ZEROTIER_ONE_CHK_MIN_ROUTES_FOR_HEALTH    no
ZEROTIER_ONE_CHK_SPECIFIC_NETWORKS        yes [1]

  [1] Takes precedence over default of 1 for minimum route checking.

setup

$ docker-compose up -d zerotier-router
[+] Running 1/1
 ✔ Container zerotier  Started                                                                                                                                     0.2s 

$ docker ps
CONTAINER ID   IMAGE                             COMMAND                  CREATED         STATUS                   PORTS     NAMES
ea0f81101029   zyclonite/zerotier-router:local   "entrypoint-router.s…"   5 seconds ago   Up 5 seconds (healthy)             zerotier

$ docker logs zerotier
Wed Feb 26 15:26:08 AEDT 2025 - launching ZeroTier-One in routing mode
adding iptables-nft rules for bi-directional traffic (local interfaces ens18 to/from ZeroTier)
Wed Feb 26 15:26:08 AEDT 2025 - ZeroTier daemon is running as process 17
Starting Control Plane...
Starting V6 Control Plane...

$ docker exec zerotier env | grep '^ZEROTIER_ONE_'
ZEROTIER_ONE_LOCAL_PHYS=ens18
ZEROTIER_ONE_USE_IPTABLES_NFT=true
ZEROTIER_ONE_GATEWAY_MODE=both
ZEROTIER_ONE_CHK_SPECIFIC_NETWORKS=9999888877776666
ZEROTIER_ONE_NETWORK_IDS=9999888877776666

Interpretation:

  1. Container launched.
  2. Container healthy. Expected result.
  3. Log confirms router.
  4. Five variables defined.

test

$ docker exec zerotier zerotier-cli listnetworks
200 listnetworks <nwid> <name> <mac> <status> <type> <dev> <ZT assigned ips>
200 listnetworks 9999888877776666 My_ZeroTier 22:43:c0:8e:8d:1c OK PRIVATE ztr2qsmswx 10.244.211.241/16

$ docker exec zerotier zerotier-cli leave 9999888877776666
200 leave OK

$ docker ps
CONTAINER ID   IMAGE                             COMMAND                  CREATED         STATUS                     PORTS     NAMES
ea0f81101029   zyclonite/zerotier-router:local   "entrypoint-router.s…"   2 minutes ago   Up 2 minutes (unhealthy)             zerotier

$ docker exec zerotier zerotier-cli join 9999888877776666
200 join OK

$ docker ps
CONTAINER ID   IMAGE                             COMMAND                  CREATED         STATUS                   PORTS     NAMES
ea0f81101029   zyclonite/zerotier-router:local   "entrypoint-router.s…"   3 minutes ago   Up 3 minutes (healthy)             zerotier

$ sudo nmcli conn down ztr2qsmswx
Connection 'ztr2qsmswx' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/95)

$ docker ps
CONTAINER ID   IMAGE                             COMMAND                  CREATED         STATUS                     PORTS     NAMES
ea0f81101029   zyclonite/zerotier-router:local   "entrypoint-router.s…"   5 minutes ago   Up 5 minutes (unhealthy)             zerotier

$ docker exec zerotier zerotier-cli get 9999888877776666 status
OK

$ ip r | grep 'dev zt' | grep -cv via
0

$ docker-compose down zerotier-router
[+] Running 1/1
 ✔ Container zerotier  Removed                                                                                                                                     1.5s 

Interpretation:

  1. Show container has joined test network.
  2. Leave that network.
  3. Show container goes unhealthy. Expected result.
  4. Rejoin the network.
  5. Show container goes healthy. Expected result.
  6. Destroy the network interface.
  7. Show container goes unhealthy. Expected result.
  8. Show reason is not because the ZeroTier network has gone away. Expected result.
  9. Show reason is because the direct route has gone away (count is zero). Expected result.
  10. End of test.

@Paraphraser
Contributor Author

Paraphraser commented Mar 3, 2025

Hmmm. That's a bit odd:

[Screenshot 2025-03-03 at 14 05 47]

Seems to be moaning about the --start-interval flag but that's definitely defined in the Dockerfile documentation.

I don't get this error when I do a local build. Any ideas?

Edit 1 - maybe the second sentence in the doco explains it:

> start interval is the time between health checks during the start period. This option requires Docker Engine version 25.0 or later.

Edit 2 - it seems to be a buildah thing. I was trying with both docker buildx build and vanilla docker build. They don't show this error but buildah does.

@gb-123-git

@Paraphraser @zyclonite

My humble question, for a small use-case scenario:
What if the user wants to check ALL networks he has joined (which can change dynamically)?
How do we check that in this proposal?

@Paraphraser
Contributor Author

Testing

> What if the user wants to check ALL networks he has joined (which can change dynamically)?

Use either a specific networks test or a minimum route count test.
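
Both of those tests rely on values fixed in the service definition. If the joined set really does need to be discovered at run time, it can be read from `zerotier-cli listnetworks`. The sketch below is not part of this PR and only illustrates the idea. The function names are hypothetical, and the column positions are taken from the `listnetworks` header line shown in the transcripts.

```shell
#!/bin/sh
# Sketch only: derive the joined networks at run time.
# Both functions read `zerotier-cli listnetworks` output on stdin.

# Print the number of networks currently joined (header line excluded).
joined_network_count() {
    awk '$1 == "200" && $2 == "listnetworks" && $3 != "<nwid>" { n++ }
         END { print n + 0 }'
}

# Print the ID of every joined network whose status is not OK.
# Status is the fourth field from the end because <name> can be empty.
networks_not_ok() {
    awk '$1 == "200" && $2 == "listnetworks" && $3 != "<nwid>" && $(NF-3) != "OK" { print $3 }'
}
```

A dynamic check could then fail when `networks_not_ok` prints anything, or when the direct route count is lower than `joined_network_count`. Note that a container with no joined networks passes the first condition trivially, so some minimum would still be needed.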

reference service definitions

  zerotier:
    container_name: zerotier
    image: "zyclonite/zerotier:local"
    restart: unless-stopped
    network_mode: host
    environment:
      - TZ=${TZ:-Etc/UTC}
      # - ZEROTIER_ONE_NETWORK_IDS=9999888877776666 9999888877775555
      # - ZEROTIER_ONE_CHK_SPECIFIC_NETWORKS=9999888877776666 9999888877775555
      # - ZEROTIER_ONE_CHK_MIN_ROUTES_FOR_HEALTH=2
    volumes:
      - ./volumes/zerotier-one:/var/lib/zerotier-one
    devices:
      - "/dev/net/tun:/dev/net/tun"
    cap_add:
      - NET_ADMIN
      - SYS_ADMIN

client testing

The router uses the same health check, so it will behave the same way.

check using specific networks

configuration

environment variable                      defined
ZEROTIER_ONE_NETWORK_IDS                  no
ZEROTIER_ONE_CHK_MIN_ROUTES_FOR_HEALTH    no
ZEROTIER_ONE_CHK_SPECIFIC_NETWORKS        yes [1]

  [1] Two network IDs provided.

setup

$ docker-compose up -d zerotier
[+] Running 1/1
 ✔ Container zerotier  Started                                                                                         0.2s 

$ docker logs zerotier
Mon Mar  3 23:14:20 AEDT 2025 - launching ZeroTier-One in client mode
Starting Control Plane...
Starting V6 Control Plane...

$ docker exec zerotier env | grep '^ZEROTIER_ONE_'
ZEROTIER_ONE_CHK_SPECIFIC_NETWORKS=9999888877776666 9999888877775555

$ docker ps
CONTAINER ID   IMAGE                      COMMAND              CREATED         STATUS                   PORTS     NAMES
171d328beef2   zyclonite/zerotier:local   "entrypoint.sh -U"   5 seconds ago   Up 5 seconds (healthy)             zerotier

$ docker exec zerotier zerotier-cli listnetworks
200 listnetworks <nwid> <name> <mac> <status> <type> <dev> <ZT assigned ips>
200 listnetworks 9999888877776666 My_ZeroTier 22:43:c0:8e:8d:1c OK PRIVATE ztr2qsmswx 10.244.211.241/16
200 listnetworks 9999888877775555 Test 5e:7e:1c:52:e3:71 OK PRIVATE ztc3qzoglu 10.242.211.241/16

$ ip r | grep 'dev zt' | grep -cv via
2

Interpretation:

  1. Container launched.
  2. Log confirms client.
  3. One variable defined.
  4. Two networks joined.
  5. Two associated routes.

test

$ docker exec zerotier zerotier-cli leave 9999888877776666
200 leave OK

$ docker ps
CONTAINER ID   IMAGE                      COMMAND              CREATED         STATUS                     PORTS     NAMES
171d328beef2   zyclonite/zerotier:local   "entrypoint.sh -U"   2 minutes ago   Up 2 minutes (unhealthy)             zerotier

$ ip r | grep 'dev zt' | grep -cv via
1

$ docker exec zerotier zerotier-cli join 9999888877776666
200 join OK

$ docker ps
CONTAINER ID   IMAGE                      COMMAND              CREATED         STATUS                   PORTS     NAMES
171d328beef2   zyclonite/zerotier:local   "entrypoint.sh -U"   3 minutes ago   Up 3 minutes (healthy)             zerotier

$ ip r | grep 'dev zt' | grep -cv via
2

$ docker exec zerotier zerotier-cli leave 9999888877775555
200 leave OK

$ docker ps
CONTAINER ID   IMAGE                      COMMAND              CREATED         STATUS                     PORTS     NAMES
171d328beef2   zyclonite/zerotier:local   "entrypoint.sh -U"   5 minutes ago   Up 5 minutes (unhealthy)             zerotier

$ ip r | grep 'dev zt' | grep -cv via
1

$ docker exec zerotier zerotier-cli join 9999888877775555
200 join OK

$ docker ps
CONTAINER ID   IMAGE                      COMMAND              CREATED         STATUS                   PORTS     NAMES
171d328beef2   zyclonite/zerotier:local   "entrypoint.sh -U"   6 minutes ago   Up 6 minutes (healthy)             zerotier

$ ip r | grep 'dev zt' | grep -cv via
2

$ docker-compose down zerotier
[+] Running 1/1
 ✔ Container zerotier  Removed                                                                                         2.6s 

Interpretation:

  1. Leave first network.
  2. Show container goes unhealthy. Expected result.
  3. Show associated route count is 1. Expected result.
  4. Rejoin first network.
  5. Show container goes healthy. Expected result.
  6. Show associated route count is 2. Expected result.
  7. Leave second network.
  8. Show container goes unhealthy. Expected result.
  9. Show associated route count is 1. Expected result.
  10. Rejoin second network.
  11. Show container goes healthy. Expected result.
  12. Show associated route count is 2. Expected result.
  13. End of test.

check using minimum route count

configuration

environment variable                      defined
ZEROTIER_ONE_NETWORK_IDS                  no
ZEROTIER_ONE_CHK_MIN_ROUTES_FOR_HEALTH    yes [1]
ZEROTIER_ONE_CHK_SPECIFIC_NETWORKS        no

  [1] Count of 2.

setup

$ docker-compose up -d zerotier
[+] Running 1/1
 ✔ Container zerotier  Started                                                                                         0.2s 

$ docker logs zerotier
Mon Mar  3 23:40:23 AEDT 2025 - launching ZeroTier-One in client mode
Starting Control Plane...
Starting V6 Control Plane...

$ docker exec zerotier env | grep '^ZEROTIER_ONE_'
ZEROTIER_ONE_CHK_MIN_ROUTES_FOR_HEALTH=2

$ docker ps
CONTAINER ID   IMAGE                      COMMAND              CREATED         STATUS                   PORTS     NAMES
1e04965f4e30   zyclonite/zerotier:local   "entrypoint.sh -U"   5 seconds ago   Up 5 seconds (healthy)             zerotier

$ docker exec zerotier zerotier-cli listnetworks
200 listnetworks <nwid> <name> <mac> <status> <type> <dev> <ZT assigned ips>
200 listnetworks 9999888877776666 My_ZeroTier 22:43:c0:8e:8d:1c OK PRIVATE ztr2qsmswx 10.244.211.241/16
200 listnetworks 9999888877775555 Test 5e:7e:1c:52:e3:71 OK PRIVATE ztc3qzoglu 10.242.211.241/16

$ ip r | grep 'dev zt' | grep -cv via
2

Interpretation:

  1. Container launched.
  2. Log confirms client.
  3. One variable defined.
  4. Two networks joined.
  5. Two associated routes.

test

$ docker exec zerotier zerotier-cli leave 9999888877776666
200 leave OK

$ docker ps
CONTAINER ID   IMAGE                      COMMAND              CREATED         STATUS                     PORTS     NAMES
1e04965f4e30   zyclonite/zerotier:local   "entrypoint.sh -U"   2 minutes ago   Up 2 minutes (unhealthy)             zerotier

$ ip r | grep 'dev zt' | grep -cv via
1

$ docker exec zerotier zerotier-cli join 9999888877776666
200 join OK

$ docker ps
CONTAINER ID   IMAGE                      COMMAND              CREATED         STATUS                   PORTS     NAMES
1e04965f4e30   zyclonite/zerotier:local   "entrypoint.sh -U"   3 minutes ago   Up 3 minutes (healthy)             zerotier

$ ip r | grep 'dev zt' | grep -cv via
2

$ docker exec zerotier zerotier-cli leave 9999888877775555
200 leave OK

$ docker ps
CONTAINER ID   IMAGE                      COMMAND              CREATED         STATUS                     PORTS     NAMES
1e04965f4e30   zyclonite/zerotier:local   "entrypoint.sh -U"   5 minutes ago   Up 5 minutes (unhealthy)             zerotier

$ ip r | grep 'dev zt' | grep -cv via
1

$ docker exec zerotier zerotier-cli join 9999888877775555
200 join OK

$ docker ps
CONTAINER ID   IMAGE                      COMMAND              CREATED         STATUS                   PORTS     NAMES
1e04965f4e30   zyclonite/zerotier:local   "entrypoint.sh -U"   6 minutes ago   Up 6 minutes (healthy)             zerotier

$ ip r | grep 'dev zt' | grep -cv via
2

$ docker-compose down zerotier
[+] Running 1/1
 ✔ Container zerotier  Removed                                                                                         2.7s 

Interpretation:

  1. Leave first network.
  2. Show container goes unhealthy. Expected result.
  3. Show associated route count is 1. Expected result.
  4. Rejoin first network.
  5. Show container goes healthy. Expected result.
  6. Show associated route count is 2. Expected result.
  7. Leave second network.
  8. Show container goes unhealthy. Expected result.
  9. Show associated route count is 1. Expected result.
  10. Rejoin second network.
  11. Show container goes healthy. Expected result.
  12. Show associated route count is 2. Expected result.
  13. End of test.

@Paraphraser
Contributor Author

@zyclonite - Lukas, does this comment make any sense to you in the ZeroTier build context?

I have no experience with GitHub Actions so I don't understand anything about how these things run. Do you get to specify the version of buildah, or is this just a service you're using that's maintained by someone else (eg GitHub provides this as a standard service)?

According to buildah install.md, for Debian I need Bookworm plus a standard apt install buildah. On Bookworm 12.9 that gets me buildah version 1.28.2.

For Ubuntu the prerequisite is 20.10 and newer, also with a standard apt install. On Ubuntu 24.04.2 that gets me buildah version 1.33.7 which produces the same error about --start-interval.

Both of these versions are older than the v1.37.0 cited as where the fix went into production.

The latest and greatest appears to be 1.39.1.
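
The version comparison above can be scripted, which may be handy for checking a build host before relying on `--start-interval`. This is a small sketch: `version_lt` is a hypothetical helper, and it assumes a `sort` that supports `-V` (version sort), as GNU coreutils does.

```shell
#!/bin/sh
# Sketch only: succeed when version $1 is strictly older than version $2.
# Assumes `sort -V` is available.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Versions quoted in this thread, compared with the cited fix release.
for v in 1.28.2 1.33.7 1.39.1; do
    if version_lt "$v" 1.37.0; then
        echo "buildah $v predates 1.37.0"
    else
        echo "buildah $v is 1.37.0 or later"
    fi
done
```

On a real host the first argument could come from `buildah --version`, on the assumption that the version number is the third word of its output.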

I don't get why the Bookworm and Noble Numbat apt installs are so out-of-date.

I started to read Building from scratch but my eyes quickly glazed over.
