
[Enhancement] Add Health-Check to DockerFile #37

Open
wants to merge 12 commits into base: main
Conversation


@gb-123-git commented Feb 18, 2025

Added Health-Check to ZeroTier Docker

This is based on the discussions in Pull Request #33.

Based on the discussion, the following flow is implemented:

  1. The following variables can be defined:
    CHK_ZT_SPECIFIC_NETWORK : <Enter 1 specific network to check; CHK_ZT_MIN_ROUTES_FOR_HEALTH is ignored if this is used.>
    CHK_ZT_MIN_ROUTES_FOR_HEALTH= <Must be a number greater than 0>

  2. If neither variable is defined, the health-check passes only when ALL joined networks are connected.
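The flow above might be sketched roughly like this. This is only an illustration of the dispatch logic, not the PR's actual script; the `zt` function is a stub standing in for `zerotier-cli` so the sketch runs stand-alone, and the route-counting branch is left empty.

```shell
#!/bin/sh
# Stub standing in for zerotier-cli, only so this sketch is self-contained.
zt() {
    case "$1" in
        get) echo "OK" ;;   # pretend every queried network reports OK
        listnetworks) printf 'header\n200 listnetworks aaa\n200 listnetworks bbb\n' ;;
    esac
}

if [ -n "${CHK_ZT_SPECIFIC_NETWORK}" ]; then
    # Case 1: check one specific network; CHK_ZT_MIN_ROUTES_FOR_HEALTH is ignored.
    [ "$(zt get "${CHK_ZT_SPECIFIC_NETWORK}" status)" = "OK" ] || exit 1
elif [ -n "${CHK_ZT_MIN_ROUTES_FOR_HEALTH}" ]; then
    : # Case 2: pass when at least that many routes exist (counting not shown here).
else
    # Default: every currently-joined network must report OK.
    for network in $(zt listnetworks | awk 'NR>1 {print $3}'); do
        [ "$(zt get "${network}" status)" = "OK" ] || exit 1
    done
fi
echo "healthy"
```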

To test:

  1. Multiple networks, with ALL connected, to pass the health-check
  2. Dockerfile build
@zyclonite @Paraphraser @hoppke

@zyclonite & @Paraphraser
Please check whether the Docker build is coming out correctly.
Also please test multiple networks.

Guys, let me know your thoughts.

Added health-check script for ZeroTier Docker
Added Healthcheck to DockerFile
Changed ENV Variable Names for better understanding
@Paraphraser
Contributor

Firstly, thank you so much for choosing "MIN" rather than perpetuating my typo of "MINUMUM". So embarrassing! I considered editing all the posts in #33 but I thought that might confuse the issue.


Next, while it's a very minor thing, perhaps:

  • ZEROTIER_ONE_CHECK_SPECIFIC_NETWORK rather than CHK_ZT_SPECIFIC_NETWORK
  • ZEROTIER_ONE_MIN_ROUTES_FOR_HEALTH rather than CHK_ZT_MIN_ROUTES_FOR_HEALTH

I'm thinking about .env files, how it's best to group variables, and how prefix consistency aids that goal.


Now, on to the PR proper.

I have tested both CHK_ZT_SPECIFIC_NETWORK and CHK_ZT_MIN_ROUTES_FOR_HEALTH and they appear to work as documented.


However, omitting both variables to get the "default behaviour" does not seem to work as documented. I think the problem lies with this code snippet:

#Check if ALL Networks are connected (Default - ZeroTier)
else
    #echo "Checking All Networks"
    joined_networks=$(zerotier-cli listnetworks | awk 'NR>1 {print$3}')
    echo "joined_networks=$joined_networks"
    for network in $joined_networks; do
        [[ "$(zerotier-cli get ${network} status)" = "OK" ]] || exit 1
        #echo "$network Connected."
    done
fi

Note that I have added the echo statement as the 5th line.

Scenario. Assume two network IDs xxx and yyy. You will define:

- ZEROTIER_ONE_NETWORK_IDS=xxx yyy

Assume the container starts and joins both networks. In that situation, the extra echo statement will write:

joined_networks=xxx yyy

Test 1

Detach from one of the networks:

$ docker exec zerotier zerotier-cli leave xxx

The result is:

joined_networks=yyy

The subsequent for-loop succeeds even though xxx has gone away. The container remains healthy when it is in fact unhealthy.

Test 2

Assume nothing has changed from Test 1. Detach from the other network:

$ docker exec zerotier zerotier-cli leave yyy

The result is:

joined_networks=

The for-loop doesn't execute at all so you get a normal exit even though the container hasn't joined either network. The container remains healthy when it is in fact unhealthy.

Proposal

It seems to me that this code might better express your design intention:

#Check if ALL Networks are connected (Default - ZeroTier)
else
    #echo "Checking All Networks"
    for network in ${ZEROTIER_ONE_NETWORK_IDS} ; do
        [[ "$(zerotier-cli get ${network} status)" = "OK" ]] || exit 1
        #echo "$network Connected."
    done
fi

I have tested that, successfully, with ZEROTIER_ONE_NETWORK_IDS containing two, one and zero network IDs. In the zero case, the for-loop doesn't execute so you get a normal healthy exit.

Where this breaks down is if you don't use ZEROTIER_ONE_NETWORK_IDS at all - you simply spin-up the container and do manual joins (as you do for the zerotier-client container). I considered adding something like this to the start of the health-check script:

ZEROTIER_ONE_NETWORK_IDS=${ZEROTIER_ONE_NETWORK_IDS:-$(zerotier-cli listnetworks | awk 'NR>1 {print$3}')}

In words:

  • if ZEROTIER_ONE_NETWORK_IDS isn't defined, seed it from the currently-joined networks.

That works but has the potential to be misleading. Say you have two networks and you do a manual leave of one of them. The container will go unhealthy.

Now you recreate the container. The join/leave status of each network is part of the persistent store so the container will come up and only re-join the network it was joined to when it went down. That means seeding from listnetworks will report healthy after the recreate, where it was reporting unhealthy prior to the recreate.

I can just imagine some user seeing unhealthy, deciding to recreate the container, which promptly goes healthy, and the user concludes "all's well" when, in fact, whatever caused the container to leave the network(s) is unresolved.

Perhaps a better solution is to simply change the semantics of ZEROTIER_ONE_NETWORK_IDS from optional to required, and have the container report unhealthy if that variable is null. Something like:

#Check if ALL Networks are connected (Default - ZeroTier)
else
    #echo "Checking All Networks"
    [ -z "${ZEROTIER_ONE_NETWORK_IDS}" ] && exit 1
    for network in ${ZEROTIER_ONE_NETWORK_IDS} ; do
        [[ "$(zerotier-cli get ${network} status)" = "OK" ]] || exit 1
        #echo "$network Connected."
    done
fi

It would be a good idea to also detect the situation in entrypoint-router.sh, which is the only place where an echo statement is going to make it to the container's log where a user is likely to see it. Something like this after line 11:

if [ -z "${ZEROTIER_ONE_NETWORK_IDS}" ]; then
   echo "Warning: ZEROTIER_ONE_NETWORK_IDS is not defined. Container will always report 'unhealthy'."
fi

None of this is perfect. Because you can always use zerotier-cli to join a new network, you can easily have a non-null ZEROTIER_ONE_NETWORK_IDS which only represents a subset of the networks you intend to monitor. Nothing can really force the environment variable to be "correct". It might be worth mentioning in the documentation that keeping the variable accurate is a user responsibility.


Last point. You've provided the script and the mods to the Dockerfile but you haven't updated the README-router.md file. I think that needs to be done as part of the PR.

Hope some of this helps because, as before, I think container health-checks are a very good idea.

@gb-123-git
Author

gb-123-git commented Feb 19, 2025

@Paraphraser

Just a brief reply (I will post a detailed reply later, as I am travelling):

Test 1
Detach from one of the networks:

$ docker exec zerotier zerotier-cli leave xxx
The result is:

joined_networks=yyy
The subsequent for-loop succeeds even though xxx has gone away. The container remains healthy when it is in fact unhealthy.

The above seems correct: the container should be healthy, since you have left network xxx and only yyy remains, which is connected. The container should be healthy only when all of its joined networks are connected; you cannot call it unhealthy if you have deliberately left a network.

Test 2
Assume nothing has changed from Test 1. Detach from the other network:

$ docker exec zerotier zerotier-cli leave yyy
The result is:

joined_networks=
The for-loop doesn't execute at all so you get a normal exit even though the container hasn't joined either network. The container remains healthy when it is in fact unhealthy.

You are correct. We need to make it unhealthy if there are no joined networks.

Added additional check for healthcheck failure in case of no joined networks.
Updated README.md to reflect new Environment Variables for Health-Checks
@gb-123-git gb-123-git marked this pull request as draft February 19, 2025 14:57
@gb-123-git gb-123-git marked this pull request as ready for review February 19, 2025 15:32
@gb-123-git
Author

@Paraphraser

Thank you soooo much for your suggestions ! I have incorporated ALL your suggestions in the script.

Please use CHK_ZT_SPECIFIC_NETWORKS and specify multiple networks here. This should solve all use cases.
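Checking several explicitly-listed networks might look like the sketch below. This is an illustration of the idea, not the PR's actual script; `zt` is a stub standing in for `zerotier-cli`, and the network IDs are made up.

```shell
#!/bin/sh
# Stub for zerotier-cli so the sketch runs stand-alone.
zt() { echo "OK"; }   # pretend every queried network reports OK

CHK_ZT_SPECIFIC_NETWORKS="aaa bbb"   # example IDs, space-separated
status="healthy"
for network in ${CHK_ZT_SPECIFIC_NETWORKS}; do
    # Any listed network that does not report OK makes the container unhealthy.
    [ "$(zt get "${network}" status)" = "OK" ] || status="unhealthy"
done
echo "$status"
```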

I am not using ZEROTIER_ONE_CHECK_SPECIFIC_NETWORKS because I think the ZEROTIER_ prefix is reserved for the original ZT functions/declarations, and since this is a custom script, it is better that we don't use variables the ZT container is using.

Please check and revert.

@Paraphraser
Contributor

I am not using ZEROTIER_ONE_CHECK_SPECIFIC_NETWORKS as ZEROTIER_ is I think reserved for the Original ZT Functions/Declarations and since this is a custom script, It is better we don't use variables which ZT Container is using.

Well, I think you are wrong about that. I hope you won't be too upset with me if I say it is always better to take the time to check your facts than to assume that a pattern which works in one place (eg all official environment variables defined by Zigbee2MQTT start with ZIGBEE2MQTT_, and all official Grafana variables start with GF_) necessarily generalises to all containers.

So let's ask my zerotier-router container which environment variables it knows about that start with Z:

$ docker exec zerotier env | grep "^Z"
ZEROTIER_ONE_GATEWAY_MODE=both
ZEROTIER_ONE_NETWORK_IDS=xxx yyy
ZEROTIER_ONE_LOCAL_PHYS=eth0
ZEROTIER_ONE_USE_IPTABLES_NFT=true

If you take a look at README-router.md you'll see that all of those are defined there. They are specific to this zyclonite/zerotier-docker repo and, in particular, the zerotier-router image.

How about the zerotier binary running inside the container?

$ docker exec zerotier strings /usr/sbin/zerotier-one | grep "ZEROTIER"
MATCH_SOURCE_ZEROTIER_ADDRESS
MATCH_DEST_ZEROTIER_ADDRESS
ZEROTIER_HOME
ZEROTIER-IDTOOL
ZEROTIER-CLI

you get the same answer if you substitute zerotier-cli

Those are probably keys for doing lookups in the process environment and, indeed, if I saw either obvious root keys like ZEROTIER or ZEROTIER_ (which might suggest environment variable names are being parsed), or any evidence of ZEROTIER_ONE then that might give me pause. But those aren't evident.

Now, what about ZeroTier's documentation? Try Googling:

"ZEROTIER_ONE" site:docs.zerotier.com

I get zero hits. If I remove the quotes, I get some hits but they all say "Missing ZEROTIER_ONE".

In fact, the only environment variable I have been able to find defined anywhere in the ZeroTier documentation is ZEROTIER_CENTRAL_TOKEN, and that seems to be related to something called Terraform.

So, far from potentially conflicting with anything ZeroTier might be doing, if we stick with the ZEROTIER_ONE_ prefix we will be maintaining consistency with our own prior usage.

I hope that makes sense.

@Paraphraser
Contributor

On a different topic, I see you are proposing mods for README.md rather than README-router.md.

Unfortunately, what I am about to say will sound circular but that's because there are several moving parts.

Let me begin by going back to the point I made in my previous post. These variables:

ZEROTIER_ONE_GATEWAY_MODE
ZEROTIER_ONE_NETWORK_IDS
ZEROTIER_ONE_LOCAL_PHYS
ZEROTIER_ONE_USE_IPTABLES_NFT

are specific to the zerotier-router image (the router). They are not (yet) supported by the zerotier image (the client).

To put this another way, the ability to auto-join ZeroTier Cloud networks on first install is a feature of the router container, not the client container. If you run the current client, the only way to join a network is with:

$ docker exec zerotier zerotier-cli join xxxx

So, yes, CHK_ZT_SPECIFIC_NETWORK and CHK_ZT_MIN_ROUTES_FOR_HEALTH will work for both the client and router images but the default case where neither of those variables is present depends on the container knowing which networks the user intends the container should be joined to when deciding whether it is healthy or not.

The only (current) way to provide that information to the router container is via ZEROTIER_ONE_NETWORK_IDS.

Although you could simply pass ZEROTIER_ONE_NETWORK_IDS to the client container, the client does not actually support that variable in the sense of auto-joining networks on first install and I think that could prove a bit confusing for users.

The reason we didn't add support for ZEROTIER_ONE_NETWORK_IDS to the client at the same time we added it to the router was to minimise the disruption to the people who were using the client container. If I remember correctly, the "router" of the time was mis-named. It was actually implementing a bridge. We wanted to make it behave like a proper Layer 3 router, so it was necessary to rip out its guts and start over. The client, on the other hand, was working fine and didn't need to be touched.

You could amend the PR to copy the auto-join functionality from entrypoint-router.sh to entrypoint.sh. At this point I think it is safe to say that the code has been thoroughly tested in the router. If you do that, the behaviour of ZEROTIER_ONE_NETWORK_IDS should be described in README.md and removed from README-router.md.

The existing wording in README-router.md is:

This variable is only effective on first launch. There is no default if it is omitted.

Wording for README.md might go something like this:

This variable is used in two situations. During first launch, the container will attempt to auto-join the listed network(s). Thereafter, and providing CHK_ZT_SPECIFIC_NETWORK and CHK_ZT_MIN_ROUTES_FOR_HEALTH are omitted, the container's health-check will use the listed network(s) to decide whether the container is healthy. If ZEROTIER_ONE_NETWORK_IDS is omitted you will have to join ZeroTier Cloud networks, by hand, using the zerotier-cli join command described above, in which case the results from the container's health-check script may be misleading.

Now, please think about that last sentence and, in particular, for people who use the client. Thus far, their experience is that they launch the container and then zerotier-cli join their networks. This PR is adding health-checking to both the client and the router. Users of the client will suddenly see "healthy" and be happy. But then, even when a network goes away, the container will still report "healthy". That's a recipe for issues appearing on this repo.

You can ameliorate that to some extent with the code I suggested yesterday:

  • in the health-check script:

     #Check if ALL Networks are connected (Default - ZeroTier)
     else
         #echo "Checking All Networks"
         [ -z "${ZEROTIER_ONE_NETWORK_IDS}" ] && exit 1
         for network in ${ZEROTIER_ONE_NETWORK_IDS} ; do
             [[ "$(zerotier-cli get ${network} status)" = "OK" ]] || exit 1
             #echo "$network Connected."
         done
     fi
    
  • in both entrypoint-router.sh and entrypoint.sh:

     if [ -z "${ZEROTIER_ONE_NETWORK_IDS}" ]; then
        echo "Warning: ZEROTIER_ONE_NETWORK_IDS is not defined. Container will always report 'unhealthy'."
     fi
    

Now let me focus on your revised proposed code:

else
     #echo "Checking All Networks"
     joined_networks=$(zerotier-cli listnetworks | awk 'NR>1 {print$3}')
     #If there are no Networks, exit Failure
     [[ -n ${joined_networks} ]] || exit 1
     for network in $joined_networks; do
         [[ "$(zerotier-cli get ${network} status)" = "OK" ]] || exit 1
         #echo "$network Connected."
     done
fi 

First, please study this:

$ docker exec -it zerotier /bin/sh

/ # joined_networks=fred
/ # echo $joined_networks
fred
/ # [[ -n ${joined_networks} ]] || echo exit
/ # joined_networks=
/ # echo $joined_networks

/ # [[ -n ${joined_networks} ]] || echo exit
/ # [[ -n "${joined_networks}" ]] || echo exit
exit
/ # joined_networks=fred
/ # echo $joined_networks
fred
/ # [[ -n "${joined_networks}" ]] || echo exit
/ # 

In words:

  1. Open an interactive shell with "sh" inside the container. Why "sh"? Because of both:

    • the #!/bin/sh (which I believe needs to be the first line in the script so the "sponsored" should be the second line); and
    • the CMD ["/bin/sh", "/usr/sbin/healthcheck.sh"] in the Dockerfile.

    Either way, the healthcheck script will be run by "sh" so we should test with that.

  2. Define joined_networks and prove it exists.

  3. Run your test for non-null. It does not take the exit. Correct result!

  4. Un-define joined_networks and prove it is null when evaluated.

  5. Run your test for non-null. It does not take the exit. Incorrect result!

  6. Alter your test for non-null to encapsulate the variable in double quotes. It takes the exit. Correct result.

  7. Redefine joined_networks and prove it exists.

  8. Re-run the test for non-null with quote marks. It does not take the exit. Correct result.

Rule of thumb: If and only if you can be certain that a variable will only ever contain the string representation of a valid number should you omit quotes. For example:

if [ $? -ne 0 ] ; then

The result code will always be numeric so it can be compared with a numeric and it is safe to omit the quotes from both sides, which then affords the opportunity to use numeric comparison operators like -eq and -ne.
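The quoting pitfall can be demonstrated compactly under plain sh (variable names here are illustrative): with the variable empty, the unquoted test collapses to `[ -n ]`, and a one-argument test succeeds whenever that argument is a non-empty string, which the literal `-n` always is.

```shell
#!/bin/sh
# With v empty, [ -n $v ] becomes [ -n ]; the one-argument form of test is
# true because the string "-n" itself is non-empty, so the check wrongly passes.
v=""
if [ -n $v ]; then unquoted="wrongly_true"; else unquoted="false"; fi
# Quoting preserves the empty argument, so -n is evaluated as intended.
if [ -n "$v" ]; then quoted="true"; else quoted="correctly_false"; fi
echo "$unquoted $quoted"
```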

Now, I will get back to the intent of the code (rather than its syntax).

I am still not convinced that you actually understand how this code will behave at run-time. Three scenarios.

scenario 1

The container is not joined to any networks.

It doesn't matter whether this is because the container has never joined any networks or because the networks it has joined have gone away.

joined_networks will be null and (assuming you fix the quotes) the early exit will fire. Expected result.

scenario 2

The container is joined to exactly one network but that network goes away.

joined_networks will be null and (assuming you fix the quotes) the early exit will fire. Expected result.

scenario 3

The container is joined to two or more networks and all save one of those networks goes away.

joined_networks will always be non-null. The early exit will not fire and the for-loop will execute. The loop will only evaluate the currently-joined networks. The loop will always complete normally, meaning exit code 0, meaning "healthy". If that really is your intention, you could simply reduce the code to:

else
     #echo "Checking All Networks"
     joined_networks=$(zerotier-cli listnetworks | awk 'NR>1 {print$3}')
     #If there are no Networks, exit Failure
     [[ -n "${joined_networks}" ]] || exit 1
fi 

However, if your intention is that the container should go "unhealthy" when any of its expected networks goes away then you need another source of information so that the container knows what its "expected networks" are. The zerotier-cli listnetworks command can never fulfil that role.

That's why I suggested using ZEROTIER_ONE_NETWORK_IDS as the authoritative source of that information:

else
    #echo "Checking All Networks"
    [ -z "${ZEROTIER_ONE_NETWORK_IDS}" ] && exit 1
    for network in ${ZEROTIER_ONE_NETWORK_IDS} ; do
        [[ "$(zerotier-cli get ${network} status)" = "OK" ]] || exit 1
        #echo "$network Connected."
    done
fi

If the user doesn't define ZEROTIER_ONE_NETWORK_IDS then the container will always report unhealthy. If you make the changes to the entrypoint scripts then there will be a message in the log alerting the user to the need to define that variable.

If the user defines one or more networks then the for-loop will be executed for each. If an expected network is missing then the zerotier-cli get ${network} status will not return "OK" so the exit will fire and the container will go unhealthy.

If and only if ZEROTIER_ONE_NETWORK_IDS is defined and all listed networks are joined and report "OK" will the container report "healthy".

Isn't that what you actually want?

@gb-123-git
Author

gb-123-git commented Feb 21, 2025

@Paraphraser

I have fixed the quotes. (Thanks for pointing it out).
Here are my thoughts in short:

  1. The code here is not intended for the router but for the client only. (Hence I updated readme.md.)
  2. Scenario 3 is supposed to check the joined networks: if all are OK, the container is healthy; if even one is not OK, the container is unhealthy. This only works on joined networks, which is the default method for ZT. It is NOT supposed to check whether a network specified by the user is joined or not. (You can always use CHK_ZT_SPECIFIC_NETWORKS for that.)

I am skeptical about implementing it through ZEROTIER_ONE_NETWORK_IDS, since that variable performs other functions too (eg joining the network), which is not a standard ZeroTier protocol, as you yourself mentioned.
If you want to achieve your result, you can always define both ZEROTIER_ONE_NETWORK_IDS and CHK_ZT_SPECIFIC_NETWORKS. In that case, what you want will be achieved.

Lastly, I would incorporate ZEROTIER_ONE_NETWORK_IDS only when @zyclonite agrees to change entrypoint.sh & entrypoint-router.sh so that:

  1. If the network is not joined, join it;
  2. If the network is already joined, skip it;
  3. If a network is not specified in the ENV variable but its .conf file exists, remove the .conf to sync the container with the IDs specified by the user;
  4. The variable becomes mandatory.

Right now, the implementation just checks whether the directory exists and, if not, creates a blank config file to join the network.
This may not be the best way to execute the intent.

Also the same should be executed in client as well for consistency.
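The join/leave sync described in the numbered points above might look something like this sketch. It is an assumption, not the repo's actual entrypoint code; the networks.d path is ZeroTier's usual per-network config location, and the "blank .conf" trick is the same auto-join mechanism the current entrypoint relies on.

```shell
#!/bin/sh
# Sketch: reconcile joined networks with ZEROTIER_ONE_NETWORK_IDS.
CONF_DIR="${CONF_DIR:-/var/lib/zerotier-one/networks.d}"

# Is network $1 listed in ZEROTIER_ONE_NETWORK_IDS?
wanted() {
    for id in ${ZEROTIER_ONE_NETWORK_IDS}; do
        [ "$id" = "$1" ] && return 0
    done
    return 1
}

sync_networks() {
    # Points 1 + 2: join listed networks with no .conf yet; skip joined ones.
    for id in ${ZEROTIER_ONE_NETWORK_IDS}; do
        [ -f "${CONF_DIR}/${id}.conf" ] || touch "${CONF_DIR}/${id}.conf"
    done
    # Point 3: drop .conf files for networks the user no longer lists.
    for conf in "${CONF_DIR}"/*.conf; do
        [ -e "$conf" ] || continue   # glob matched nothing
        wanted "$(basename "$conf" .conf)" || rm -f "$conf"
    done
}
```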

As of now, I am creating a separate variable CHK_ZT_SPECIFIC_NETWORKS until ZEROTIER_ONE_NETWORK_IDS is fully implemented.

@zyclonite
Owner

first, thank you both for the contribution and detailed discussion!

i would recommend keeping two different variables, not everyone wants to join the network via variables but might still want to have it health-checked. i am ok with adding the same join logic from the router to the other entrypoint file to get consistent behavior.

@Paraphraser
Contributor

By now, you're probably sick of the sound of my keystrokes. Sorry about that. You'll probably be even more tired once you get to the end of this. Sorry about that too.

I am still 100% dedicated to the idea of a healthcheck. I just want it to work.


I'd like to walk back (slightly) a comment I made earlier on the topic of environment variable names. One source I neglected to check was this repo itself. It turns out that entrypoint.sh (the client's entry point script) mentions:

  • ZT_ALLOW_MANAGEMENT_FROM
  • ZT_OVERRIDE_LOCAL_CONF
  • ZT_PRIMARY_PORT
  • ZT_ALLOW_TCP_FALLBACK_RELAY

These, however, are not documented anywhere I can see. Nevertheless, we can generalise the following convention for variable names:

  • Variables with a ZT_ prefix affect the client.
  • Variables with a ZEROTIER_ONE_ prefix affect the router.

It is also true to say that the usage of these existing environment variables is confined to the respective entry point scripts, which means ZT_ prefix variables only affect the client, not the router. This is a subtle distinction and the reason I'm mentioning it here is because I'm about to explain just how little difference there is between the client and the router.


The code here is not intended to be for the router but for client only

I think you might not fully appreciate the relationship between the client and the router.

You've proposed mods to Dockerfile to add the healthcheck. If you go back and look at Line 6 you'll see a FROM statement which bases the client container on Alpine.

So, the build process starts with an Alpine Linux image and then builds ZeroTier in situ. I assume you understand that part.

Now go and look at the FROM statement in Dockerfile.router (line 4). The router container is based on the client container.

In other words, whatever you do to enhance the client is also inherited by the router.

The health check scaffolding you might have thought you were only adding to the client also gets added to the router. You can use either the client or the router to test health checking. It makes absolutely no difference.

What is the difference, then, between the client and the router?

In one sense, none. The compiled zerotier-one binary running in the client container is identical to the binary running in the router container. The binary isn't invoked with different options which somehow magically tell it to become a router. Nothing like that at all.

The difference between the containers is found by comparing the two entry point scripts. The one for the client is pretty much what you'll find in most Docker containers: a bit of setup followed by an exec "$@", at which point the shell running the entry point script is replaced by the zerotier-one binary.

The entry point script for the router is a good deal more complicated but a lot of that is the result of how Docker starts containers. At its core, the router's entry point script is just defining a bunch of iptables filters to control how traffic is routed at Layer Three.

The way the router launches the zerotier-one binary is a bit different (nohup rather than exec) but that's because the entry point script needs to keep running alongside the zerotier-one binary so it can withdraw the iptables filters when the container is taken down.

If the container doesn't manage its iptables filters from birth to death then multiple successive launches of the router container would see duplicate filters being added to iptables.
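The exec-versus-nohup distinction can be sketched in isolation. This is a toy illustration, not the repo's actual entry point code: the trap's `touch` stands in for withdrawing the iptables filters, and `sleep` stands in for the long-running zerotier-one daemon.

```shell
#!/bin/sh
# Client-style would be:  exec zerotier-one -U
#   -> the shell is replaced, so no cleanup can ever run afterwards.
# Router-style keeps the shell alive so an EXIT trap can undo its setup.

run_router_style() {
    marker="$1"   # file created by the stand-in cleanup hook
    (
        trap 'touch "$marker"' EXIT   # stands in for withdrawing iptables filters
        sleep 1 &                     # analogous to: nohup zerotier-one -U &
        wait $!                       # shell stays alive alongside the "daemon"
    )
}
```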

ZeroTier (the container) runs in "host mode" which means the container's networking and the host's networking are the same, irrespective of whether you are running the client or the router. What that also means is the iptables filters being created by the router container are applied to the host, not the container.

zerotier-one (the process) running inside the container is the same. Nil difference. All the actual client vs router differences are external to the container. It's not the router container that is doing the routing. It's the host.

It might also help to remember what Layer 3 routing is. It is the process of a computer (or silicon) making decisions about how to forward packets between network interfaces.

Say you have a remote device like an iPad running the ZeroTier client, and a Raspberry Pi at home also running the ZeroTier client. Once you start the ZeroTier tunnel on the iPad, it can reach the Raspberry Pi.

Suppose you also want to reach another computer (eg a Mac) that isn't running the ZeroTier client but happens to be connected to the same subnet as the Pi. You can open an SSH connection to the Raspberry Pi and, from there, open another SSH connection to the Mac. And, if you're into SSH port-forwarding, you can get SSH to do that for you.

All the zerotier-router does is help automate things so you can go straight from the iPad to the Mac for all protocols. The "routing" action depends on a combination of the iptables filters and static routes.


I want to continue on the theme of how the difference between the client and router is largely cosmetic.

If they are more-or-less identical, how can you tell whether you are running the client or the router container?

The most obvious way is to execute a ps inside the container:

  • this is the router:

     $ docker exec zerotier ps
     PID   USER     TIME  COMMAND
         1 root      0:00 sh /usr/sbin/entrypoint-router.sh -U
        16 root      0:00 tail -f /tmp/zerotier-ipc-log
        17 root      0:08 zerotier-one -U
      1734 root      0:00 ps
    

    PID 1 is the entry point script which remains running for the life of the container. It spawns the tail (PID 16 to support logging) and the client (PID 17).

  • this is the client:

     $ docker exec zerotier ps
     PID   USER     TIME  COMMAND
         1 root      0:00 zerotier-one -U
        47 root      0:00 ps
    

    PID 1 is the client. The container started with its entry point script as PID 1 but "exec'd" the client over the top so the client inherited PID 1.

router's operational environment

With the router running, what does the container know?

$ docker exec zerotier zerotier-cli listnetworks
200 listnetworks <nwid> <name> <mac> <status> <type> <dev> <ZT assigned ips>
200 listnetworks 9999888877776666 My_ZeroTier 22:43:c0:8e:8d:1c OK PRIVATE ztr2qsmswx 10.244.211.241/16
200 listnetworks 9999888877775555 Test 5e:7e:1c:52:e3:71 OK PRIVATE ztc3qzoglu 10.242.211.241/16

The container knows about two joined networks.

What does the host know?

$ ip a | grep zt
90: ztc3qzoglu: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2800 qdisc fq_codel state UNKNOWN group default qlen 1000
    inet 10.242.211.241/16 brd 10.242.255.255 scope global ztc3qzoglu
91: ztr2qsmswx: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2800 qdisc fq_codel state UNKNOWN group default qlen 1000
    inet 10.244.211.241/16 brd 10.244.255.255 scope global ztr2qsmswx

$ ip r | grep zt
10.242.0.0/16 dev ztc3qzoglu proto kernel scope link src 10.242.211.241 
10.244.0.0/16 dev ztr2qsmswx proto kernel scope link src 10.244.211.241 
192.168.0.0/23 via 10.244.124.118 dev ztr2qsmswx proto static metric 5000 

First, the host knows about the same network interfaces that appear in the listnetworks command.

Second, the host knows that those interfaces represent routes to the two ZeroTier Cloud networks. This is how, when you do something like SSH from iPad to Pi, then from Pi to Mac, that the reply traffic knows how to find its way back into the tunnel and across the ZeroTier Cloud network.

The third line (192.168.0.0/23) is a static route advertised by ZeroTier Cloud. It's an artifact of my own ZeroTier setup. You won't necessarily see one (or more) of these in every user's configuration.

client's operational environment

Now let's ask the same questions of the client.

$ docker exec zerotier zerotier-cli listnetworks
200 listnetworks <nwid> <name> <mac> <status> <type> <dev> <ZT assigned ips>
200 listnetworks 9999888877776666 My_ZeroTier 22:43:c0:8e:8d:1c OK PRIVATE ztr2qsmswx 10.244.211.241/16
200 listnetworks 9999888877775555 Test 5e:7e:1c:52:e3:71 OK PRIVATE ztc3qzoglu 10.242.211.241/16

Identical answer. What about the host?

$ ip a | grep zt
92: ztc3qzoglu: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2800 qdisc fq_codel state UNKNOWN group default qlen 1000
    inet 10.242.211.241/16 brd 10.242.255.255 scope global ztc3qzoglu
93: ztr2qsmswx: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2800 qdisc fq_codel state UNKNOWN group default qlen 1000
    inet 10.244.211.241/16 brd 10.244.255.255 scope global ztr2qsmswx

$ ip r | grep zt
10.242.0.0/16 dev ztc3qzoglu proto kernel scope link src 10.242.211.241 
10.244.0.0/16 dev ztr2qsmswx proto kernel scope link src 10.244.211.241 
192.168.0.0/23 via 10.244.124.118 dev ztr2qsmswx proto static metric 5000 

Same again.

The fact that a ZeroTier network (the thing you join) is expressed as an entry in the host's routing table, is independent of whether you are running the client or the router, and is why "counting routes" is going to be more reliable than any other scheme.
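A route-counting check might look something like this sketch. It is my assumption of how it could be wired up, not code from the PR: `count_zt_routes` reads a routing-table dump on stdin and relies on ZeroTier interface names starting with "zt", and host network mode is what makes `ip r` inside the container see the host's table.

```shell
#!/bin/sh
# Count routes whose device is a ZeroTier interface (names start with "zt").
count_zt_routes() {
    grep -c ' dev zt'
}

# Healthy when the host's routing table (visible inside the container because
# of host network mode) has at least CHK_ZT_MIN_ROUTES_FOR_HEALTH such routes.
zt_route_health() {
    min="${CHK_ZT_MIN_ROUTES_FOR_HEALTH:-1}"
    [ "$(ip r | count_zt_routes)" -ge "$min" ]
}
```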


With the client still running, I'm now going to make a mess:

$ nmcli conn show | grep -e "NAME" -e "^zt"
NAME                UUID                                  TYPE      DEVICE          
ztc3qzoglu          fa62d9e7-4c07-4881-b423-23b3a0616399  tun       ztc3qzoglu      
ztr2qsmswx          907d0cf1-3923-4de4-b6ab-5920c2ccda0d  tun       ztr2qsmswx      

$ sudo nmcli conn down ztc3qzoglu
Connection 'ztc3qzoglu' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/38)

$ nmcli conn show | grep -e "NAME" -e "^zt"
NAME                UUID                                  TYPE      DEVICE          
ztr2qsmswx          907d0cf1-3923-4de4-b6ab-5920c2ccda0d  tun       ztr2qsmswx      

I've clobbered the network interface associated with one of the joined networks. There's no way any traffic can be forwarded anywhere. What does the host know?

$ ip a | grep zt
92: ztc3qzoglu: <BROADCAST,MULTICAST> mtu 2800 qdisc fq_codel state DOWN group default qlen 1000
93: ztr2qsmswx: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2800 qdisc fq_codel state UNKNOWN group default qlen 1000
    inet 10.244.211.241/16 brd 10.244.255.255 scope global ztr2qsmswx

$ ip r | grep zt
10.244.0.0/16 dev ztr2qsmswx proto kernel scope link src 10.244.211.241 
192.168.0.0/23 via 10.244.124.118 dev ztr2qsmswx proto static metric 5000 

It knows the interface is dead and, because the interface has gone away, the associated route has been withdrawn. This is the host doing this work, not the container.

What does the container know?

$ docker exec zerotier zerotier-cli listnetworks
200 listnetworks <nwid> <name> <mac> <status> <type> <dev> <ZT assigned ips>
200 listnetworks 9999888877776666 My_ZeroTier 22:43:c0:8e:8d:1c OK PRIVATE ztr2qsmswx 10.244.211.241/16
200 listnetworks 9999888877775555 Test 5e:7e:1c:52:e3:71 OK PRIVATE ztc3qzoglu 10.242.211.241/16

The container still thinks the Test network (interface ztc3qzoglu) is alive and kicking. What about the health-check?

$ DPS zerotier
NAMES      CREATED          STATUS                    SIZE
zerotier   15 minutes ago   Up 15 minutes (healthy)   0B (virtual 14.4MB)

DPS is an alias for docker ps which returns fewer columns and lets me filter the result by container name.

I've wreaked utter havoc here but the container is none-the-wiser.

Bottom line: listnetworks and get status are (presumably) reliable as far as ZeroTier Cloud's interworking is concerned but are not reliable in terms of whether the container is able to function correctly.


i would recommend keeping two different variables

Lukas makes a good point - a separate variable for the networks to be checked.

You've already done that:

  • CHK_ZT_SPECIFIC_NETWORKS

These lines in your healthcheck.sh "work", providing that the client (or router) can sense a network status of something other than "OK":

if [[ -n "${CHK_ZT_SPECIFIC_NETWORKS}" ]] ; then
    for network in $CHK_ZT_SPECIFIC_NETWORKS; do
        #If Network is OK, continue, else exit
        [[ "$(zerotier-cli get ${network} status)" = "OK" ]] || exit 1
        #echo "${CHK_ZT_SPECIFIC_NETWORKS} Connected."
    done
    exit 0

Which brings me to these lines (which I'll call the problematic lines):

else
    #echo "Checking All Networks"
    joined_networks=$(zerotier-cli listnetworks | awk 'NR>1 {print$3}')
    #If there are no Networks, exit Failure
    [[ -n "${joined_networks}" ]] || exit 1
    for network in $joined_networks; do
        [[ "$(zerotier-cli get ${network} status)" = "OK" ]] || exit 1
        #echo "$network Connected."
    done
fi

In all my experimentation, I have never been able to contrive the conditions where both of the following are true:

  1. A network is listed in zerotier-cli listnetworks; AND
  2. get ${network} status returns something other than "OK".

Can you show me an example or tell me how to create the necessary conditions?

Until you can, I don't believe the problematic lines serve any purpose. If, instead of using ZEROTIER_ONE_NETWORK_IDS (as I proposed) we follow Lukas' suggestion of two different variables, we are left with either:

  • Use CHK_ZT_SPECIFIC_NETWORKS, in which case the simple presence of that variable will mean the problematic lines will never execute because the earlier code will take precedence; or

  • Invent yet another variable, in which case:

    1. You're in the situation where "all variables omitted" needs the very default you're trying to provide; and
    2. The code simply replicates CHK_ZT_SPECIFIC_NETWORKS anyway.

proposed health check script

With that in mind, I would like to propose a different approach:

#!/bin/sh
#This health-check script is sponsored by PMGA TECH LLP

#Exit Codes
# 0= Success
# 1= Failure


#Environment Variables
# CHK_ZT_SPECIFIC_NETWORKS=         <Enter Networks to check with space in between each entry; All networks entered here would be matched; CHK_ZT_MIN_ROUTES_FOR_HEALTH is ignored if this is used.>
# CHK_ZT_MIN_ROUTES_FOR_HEALTH=     <Should be a Number greater than 0>

# minimum routes for health defaults to 1 route
CHK_ZT_MIN_ROUTES_FOR_HEALTH=${CHK_ZT_MIN_ROUTES_FOR_HEALTH:-1}

# Check if specified Networks are all Connected
if [[ -n "${CHK_ZT_SPECIFIC_NETWORKS}" ]] ; then

    for network in $CHK_ZT_SPECIFIC_NETWORKS; do
        #If Network is OK, continue, else exit
        [[ "$(zerotier-cli get ${network} status)" = "OK" ]] || exit 1
    done

else # Check for Minimum Networks

    # count zerotier-associated direct routes
    routes=$(ip r | grep "dev zt" | grep -cv "via")

    # sense less than minimum
    [[ ${routes} -lt ${CHK_ZT_MIN_ROUTES_FOR_HEALTH} ]] && exit 1

fi

exit 0

I gave some thought to sanitising CHK_ZT_MIN_ROUTES_FOR_HEALTH to make sure it was a numeric, then applying the default of 1 if it wasn't. On balance, however, it seems to me that if a user passes a non-numeric, the result will be the health check script crashes, which returns 1, so the container goes unhealthy. Forcing it to default in that situation will likely create the conditions for a silent fail. It's better to draw problems to the user's attention.
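For anyone who does want to sanitise the variable anyway, a POSIX case pattern is enough; this is a hypothetical helper, not part of the proposed script, and it deliberately hides the very failure mode discussed above:

```shell
#!/usr/bin/env sh
# Hypothetical sanitiser: fall back to the default (1) when the supplied
# value is empty, non-numeric, or zero. The case patterns are POSIX, so
# this also works in busybox ash.
MIN="${CHK_ZT_MIN_ROUTES_FOR_HEALTH:-1}"
case "$MIN" in
    ''|*[!0-9]*|0) MIN=1 ;;    # empty, non-numeric, or zero -> default
esac
echo "$MIN"
```

With the variable unset this prints 1; with a junk value such as "abc" it silently prints 1 too, which is exactly the silent fail the paragraph above argues against.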

testing

Service definition (client):

  zerotier:
    container_name: zerotier
    image: "zyclonite/zerotier:local"
    restart: unless-stopped
    network_mode: host
    volumes:
      - ./volumes/zerotier-one:/var/lib/zerotier-one
    devices:
      - "/dev/net/tun:/dev/net/tun"
    cap_add:
      - NET_ADMIN
      - SYS_ADMIN

No environment variables. That means the minimum route count defaults to 1.

The only other thing to be aware of is that the persistent store already exists and has already joined two networks. Those will be resumed when the container comes up.

$ UP zerotier
 ✔ Container zerotier  Started                                                                                                                             0.2s 

$ docker exec zerotier zerotier-cli listnetworks
200 listnetworks <nwid> <name> <mac> <status> <type> <dev> <ZT assigned ips>
200 listnetworks 9999888877776666 My_ZeroTier 22:43:c0:8e:8d:1c OK PRIVATE ztr2qsmswx 10.244.211.241/16
200 listnetworks 9999888877775555 Test 5e:7e:1c:52:e3:71 OK PRIVATE ztc3qzoglu 10.242.211.241/16

$ DPS zerotier
NAMES      CREATED              STATUS                        SIZE
zerotier   About a minute ago   Up About a minute (healthy)   0B (virtual 14.4MB)

$ ip r | grep zt
10.242.0.0/16 dev ztc3qzoglu proto kernel scope link src 10.242.211.241 
10.244.0.0/16 dev ztr2qsmswx proto kernel scope link src 10.244.211.241 
192.168.0.0/23 via 10.244.124.118 dev ztr2qsmswx proto static metric 5000 

Container started. Knows about two networks. Reports healthy. Expected routes.

Remove one network:

$ docker exec zerotier zerotier-cli leave 9999888877775555
200 leave OK

$ ip r | grep zt
10.244.0.0/16 dev ztr2qsmswx proto kernel scope link src 10.244.211.241 
192.168.0.0/23 via 10.244.124.118 dev ztr2qsmswx proto static metric 5000 

$ DPS zerotier
NAMES      CREATED         STATUS                   SIZE
zerotier   4 minutes ago   Up 4 minutes (healthy)   0B (virtual 14.4MB)

So, the network has gone, the routes associated with it have gone, but the container remains healthy. Because the "no environment variables" condition means we are looking for a minimum route count of 1, that's the expected result.

Now let's get rid of the second network:

$ docker exec zerotier zerotier-cli leave 9999888877776666
200 leave OK

$ ip r | grep zt
$ 

$ DPS zerotier
NAMES      CREATED         STATUS                     SIZE
zerotier   7 minutes ago   Up 7 minutes (unhealthy)   0B (virtual 14.4MB)

All relevant routes have gone and the container eventually goes unhealthy. Perfect!

Now I'll re-join one of the networks and wait for the container to go healthy:

$ docker exec zerotier zerotier-cli join 9999888877775555
200 join OK

$ DPS zerotier
NAMES      CREATED         STATUS                   SIZE
zerotier   9 minutes ago   Up 9 minutes (healthy)   0B (virtual 14.4MB)

And now I'll do the serious damage I did before by nuking the interface:

$ nmcli conn show | grep -e "NAME" -e "^zt"
NAME                UUID                                  TYPE      DEVICE     
ztc3qzoglu          38fb0bb5-75d0-4de0-857d-fa9fcbf979b8  tun       ztc3qzoglu 

$ sudo nmcli conn down ztc3qzoglu
Connection 'ztc3qzoglu' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/47)

$ nmcli conn show | grep -e "NAME" -e "^zt"
NAME                UUID                                  TYPE      DEVICE  

Does the container go unhealthy?

$ DPS zerotier
NAMES      CREATED          STATUS                      SIZE
zerotier   13 minutes ago   Up 13 minutes (unhealthy)   0B (virtual 14.4MB)

Youbetcha! Why? Because of the routes:

$ ip r | grep "dev zt" | grep -cv "via"
0

What does the container itself actually think?

$ docker exec zerotier zerotier-cli listnetworks
200 listnetworks <nwid> <name> <mac> <status> <type> <dev> <ZT assigned ips>
200 listnetworks 9999888877775555 Test 5e:7e:1c:52:e3:71 OK PRIVATE ztc3qzoglu 10.242.211.241/16

It thinks the network is still there and OK. If we used the listnetworks and get status approach (rather than counting routes) we'd be getting a false positive.

Counting routes really is better, and a default of "at least one route" is probably a sensible default in the vast majority of user cases.

one more thing

This is only a minor point but, seeing as you're changing the Dockerfile, I noticed these warnings during builds:

 => WARN: FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 6)                                                                             0.0s

...

1 warning found (use docker --debug to expand):
 - FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 6)

It is caused by the lower-case "as" in:

FROM ${ALPINE_IMAGE}:${ALPINE_VERSION} as builder

Changing to upper case causes the problem to go away:

FROM ${ALPINE_IMAGE}:${ALPINE_VERSION} AS builder

@gb-123-git
Copy link
Author

gb-123-git commented Feb 22, 2025

@Paraphraser
I will answer your posts in detail later. In the meantime, please test the new changes.
New Change-log:

  1. Renamed the variables to ZT_CHK_SPECIFIC_NETWORKS & ZT_CHK_MIN_ROUTES_FOR_HEALTH
  2. Added Advanced Join feature : ZT_NETWORK_IDS.
    If ZT_NETWORK_IDS is defined, the container will LEAVE ALL Other NETWORKS and AUTO-JOIN the specified networks.
  3. If no variable is defined, no changes to the existing code.

@zyclonite
Added the auto-join feature which is more consistent. Waiting for your thoughts on the same.

I have specifically not used ZEROTIER_ONE_NETWORK_IDS so as not to make this a breaking change! (I believe we can deprecate the previous values in the next version to give users a chance to change their settings.)

@gb-123-git
Copy link
Author

gb-123-git commented Feb 23, 2025

Regarding Environment Variables

Well, I think you are wrong about that. I hope you won't be too upset with me if I say it is always better to take the time to check your facts rather than assume a pattern that may work in one place (eg all official environment variables defined by Zigbee2MQTT start with ZIGBEE2MQTT_, or all official Grafana variables start with GF_) necessarily generalises to all containers.

I think you have already mentioned the 'fact' that ZeroTier uses :

ZEROTIER_HOME
ZEROTIER-IDTOOL
ZEROTIER-CLI

Therefore I didn't want to use anything starting with ZEROTIER.

Regarding sh

Open an interactive shell with "sh" inside the container. Why "sh"? Because of both:

the #!/bin/sh (which I believe needs to be the first line in the script so the "sponsored" should be the second line); and

sh because until Alpine 3.20 (or maybe 3.21) the ZeroTier docker uses BusyBox sh (ash) and NOT bash, so the script needs to remain compatible in future also.

You may want to believe that #!/bin/sh should be the first line, but that's not the case. All lines (whether first or second) starting with # are omitted by the interpreter.

Regarding Code

I am still not convinced that you actually understand how this code will behave at run-time. Three scenarios.
Isn't that what you actually want?

Well, you may or may not be convinced, but the code is doing what I intend it to do (apart from the slight syntax mistake that you pointed out, which can happen since we are all human).

The code you call 'problematic' is actually how the original ZeroTier docker does the health-checks which @zyclonite has already pointed out in the previous thread. I did not want to eliminate their code completely. Rather I wanted to build on their code. Hence I have kept their code as default behavior.

General Suggestion for @Paraphraser

While I do appreciate your enthusiasm, imho, it would be motivating for all the contributors if your messages took a slightly politer tone rather than an accusatory one ('I am still not convinced that you actually understand how this code will behave at run-time').

Using the current code update, you can use the combination of ZT_NETWORK_IDS and ZT_CHK_SPECIFIC_NETWORKS.
The other advantage of using 2 variables is that you can also have some networks which are not critical but are still part of the networks specified by the user. Using 2 variables gives us a lot more flexibility.

@Paraphraser
Copy link
Contributor

With all possible respect, I'd simply say that if you find it troublesome, upsetting or impolite to be told that the writer isn't convinced that you understand something, the best solution is to demonstrate understanding.

I've tried to be gentle, using persuasion, offering guidance, explanations and examples. I think it's time to be a bit more direct.

Before I do, I'd like to make something clear. When it comes to this repository, I have no special role. I'm Joe Ordinary and my voice has the same weight as everyone else's. Lukas owns the repo so he makes the final decisions on everything.

While I do appreciate your enthusiasm

I think you might be mistaking "enthusiasm" for "enlightened self interest".

I depend on ZeroTier and I want it to work. That's why I have a watch on this repo. If I see a proposal which is likely to have an impact on me, I feel obliged to test it. If my testing reveals what I think are deficiencies, I feel it's incumbent upon me to explain what those are.


On the subject of environment variable names. I think you are being disingenuous and selective in your example.

I have already explained that ZEROTIER_ONE_ prefixed variables are documented whereas the existing ZT_ prefix variables are undocumented (and confined to the client). The world is full of undocumented variables, parameters and APIs. They are usually undocumented for a reason.

In my view, one of the worst things any container designer can do is to introduce changes which are, arguably, simply for the sake of change.

I see your proposal to deprecate ZEROTIER_ONE_NETWORK_IDS and replace it with ZT_NETWORK_IDS as a change being proposed simply for the sake of change.

I really do not understand why you seem to be so fixated on using ZT_. It is not as though anyone types these variable names over and over. If you use docker compose then they are in the service definition. If you use docker run in production (as distinct from one-off experimentation) you're going to embed it in a script.

Perhaps I should put it like this: the ZEROTIER_ONE_ prefix got here first. Get over it. If you think that level of directness is impolite, so be it.

Maintaining consistency with existing usage is important. I am not going to endorse this PR while you persist with this. However, I refer you back to what I said about how I have no special role. It's Lukas' decision.

My recommendations:

  Inconsistent variable name      Consistent variable name
  ZT_CHK_SPECIFIC_NETWORKS        ZEROTIER_ONE_CHK_SPECIFIC_NETWORKS
  ZT_CHK_MIN_ROUTES_FOR_HEALTH    ZEROTIER_ONE_CHK_MIN_ROUTES_FOR_HEALTH

For the record:

$ git remote -v
origin	https://github.com/zerotier/ZeroTierOne.git (fetch)
origin	https://github.com/zerotier/ZeroTierOne.git (push)

$ git grep -e "ZEROTIER_ONE_NETWORK_IDS" \
           -e "ZEROTIER_ONE_LOCAL_PHYS" \
           -e "ZEROTIER_ONE_USE_IPTABLES_NFT" \
           -e "ZEROTIER_ONE_GATEWAY_MODE" \
           -e "ZEROTIER_ONE_CHK_SPECIFIC_NETWORKS" \
           -e "ZEROTIER_ONE_CHK_MIN_ROUTES_FOR_HEALTH"

$ 

On the other hand:

$ git grep "ZEROTIER_ONE_" | wc -l
      44
$ git grep "ZT_" | wc -l
    4464

Thus, standardising on ZT_ gives a much greater probability of collision than standardising on ZEROTIER_ONE_. And that's if you're worried about what's in the code (as distinct from what makes it into the compiled binary as a likely environment variable key - see the strings example from before).


Saying that sh and ash both map to busybox as though it were some kind of answer is also disingenuous.

Yes, in this case, the container only has a single shell. That's entirely beside the point. I'm talking about best practice. I'm talking about the habits that will protect both you and the people who use your code when a container does have multiple shells and where invoking the correct shell matters.

It's simple:

$ file ./zerotier-docker/scripts/*
./zerotier-docker/scripts/entrypoint-router.sh: a sh script, ASCII text executable
./zerotier-docker/scripts/entrypoint.sh:        a sh script, ASCII text executable
./zerotier-docker/scripts/healthcheck.sh:       ASCII text

The reason why healthcheck.sh isn't being classified as a script is because the first two bytes of the file are not 0x23 0x21 (ASCII #! aka "shebang"). These are known as "signature bytes" or "magic numbers".
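The signature bytes are easy to verify from a shell. This sketch writes two throwaway files (names are arbitrary) and inspects the first two bytes of each:

```shell
#!/usr/bin/env sh
# Two throwaway files: one starting with a plain comment, one with a shebang.
printf '#This health-check script...\nexit 0\n' > /tmp/no-shebang.sh
printf '#!/usr/bin/env sh\nexit 0\n' > /tmp/with-shebang.sh

# Inspect the first two bytes of each file (the "magic number").
head -c 2 /tmp/no-shebang.sh   ; echo    # prints "#T" - not a script signature
head -c 2 /tmp/with-shebang.sh ; echo    # prints "#!" - the shebang magic number
```

Only the second file begins with 0x23 0x21, which is why file(1) classifies it as a script and the first as plain ASCII text.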

The correct solution is:

#!/usr/bin/env sh
#This health-check script is sponsored by PMGA TECH LLP

Personally, it feels a bit off to be embedding sponsorship messages in someone else's repo but maybe Lukas won't mind.

When you make that change, the script is classified the same as the entry point scripts.


Still on healthcheck.sh. Something else I noticed:

$ ls -l ./zerotier-docker/scripts/*
-rwxr-xr-x 1 moi moi 6217 Feb 23 10:01 ./zerotier-docker/scripts/entrypoint-router.sh
-rwxr-xr-x 1 moi moi 2214 Feb 23 10:01 ./zerotier-docker/scripts/entrypoint.sh
-rw-r--r-- 1 moi moi 2028 Feb 23 15:19 ./zerotier-docker/scripts/healthcheck.sh

The healthcheck.sh doesn't have execute permission set. I surmise that that led you to add this line to the Dockerfile:

$ grep chmod ./zerotier-docker/Dockerfile
RUN chmod +x /usr/sbin/healthcheck.sh

If you do the chmod in the repository, Docker copies the permissions into the container during the build, and you don't need the RUN statement. That saves a layer in the final image. Once again, this is a matter of best practice.

If you really did need a chmod in the Dockerfile, the appropriate place to do it is by && concatenation with one of the existing RUN commands.
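As a sketch of what that concatenation might look like (the package-install half of the RUN is illustrative only, not copied from the actual Dockerfile):

```dockerfile
# Illustrative fragment: fold the chmod into an existing RUN step so
# the image doesn't pay for an extra layer.
COPY scripts/healthcheck.sh /usr/sbin/healthcheck.sh
RUN apk add --no-cache tzdata \
 && chmod +x /usr/sbin/healthcheck.sh
```

Either way, setting the execute bit in the repository remains the cleaner fix.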


I don't see the need for the new changes you have just proposed for entrypoint-router.sh. And, by extension, the same applies to the parallel changes you have just added to entrypoint.sh for the client.

Elsewhere you wrote:

I did not want to eliminate their code completely.

If you want to be guided by a principle of "least elimination" then lines 81..89 were the source of the auto-join code which is currently in entrypoint-router.sh.

The existing support for ZEROTIER_ONE_NETWORK_IDS is very light touch. That is by design. The code only executes if the networks.d directory does not exist. That directory not having been created is the signature of a "first run".

It is only safe to automate joining in the first run situation.

In particular, leaving a network resets its internal configuration (allowManaged, allowGlobal, allowDefault, allowDNS) to defaults and such loss can trigger unexpected network side-effects.

Once the container is running, I really think you need to leave it up to the user to execute explicit join and leave commands.


I have been reflecting on the overall utility of ZEROTIER_ONE_CHK_SPECIFIC_NETWORKS.

At best it will detect the situation where a user has done an explicit leave of a network listed in the variable.

At worst, it will report the container healthy when it is not (false positive). That's because of the problem I've documented several times already where the container thinks a network is OK when it is non-viable. Which is also the point Lukas was making in #33 (more on that below).

I think this could be improved with a two-stage check:

  • does get status report "OK" ?; if yes then
  • is there at least one associated route ?

How about this:

#!/usr/bin/env sh
#This health-check script is sponsored by PMGA TECH LLP

#Exit Codes
# 0= Success
# 1= Failure

#Environment Variables
# ZEROTIER_ONE_CHK_SPECIFIC_NETWORKS=         <Enter Networks to check with space in between each entry; All networks entered here would be matched; ZEROTIER_ONE_CHK_MIN_ROUTES_FOR_HEALTH is ignored if this is used.>
# ZEROTIER_ONE_CHK_MIN_ROUTES_FOR_HEALTH=     <Should be a Number greater than 0>

# minimum routes for health defaults to 1 route
ZEROTIER_ONE_CHK_MIN_ROUTES_FOR_HEALTH=${ZEROTIER_ONE_CHK_MIN_ROUTES_FOR_HEALTH:-1}

# Check if specified Networks are all Connected
if [[ -n "${ZEROTIER_ONE_CHK_SPECIFIC_NETWORKS}" ]] ; then

    for network in $ZEROTIER_ONE_CHK_SPECIFIC_NETWORKS; do
        [[ "$(zerotier-cli get ${network} status)" = "OK" ]] || exit 1
        interface=$(zerotier-cli get ${network} portDeviceName)
        routes=$(ip r | grep "dev ${interface}" | grep -cv "via")
        [[ ${routes} -lt 1 ]] && exit 1
    done

else # Check for Minimum Networks

    # count zerotier-associated direct routes
    routes=$(ip r | grep "dev zt" | grep -cv "via")

    # sense less than minimum
    [[ ${routes} -lt ${ZEROTIER_ONE_CHK_MIN_ROUTES_FOR_HEALTH} ]] && exit 1

fi

exit 0

Baseline:

$ DPS zerotier
NAMES      CREATED          STATUS                    SIZE
zerotier   29 seconds ago   Up 25 seconds (healthy)   1.16kB (virtual 14.4MB)

$ docker exec zerotier sh -c 'echo $ZEROTIER_ONE_CHK_SPECIFIC_NETWORKS'
9999888877776666 9999888877775555

$ docker exec zerotier zerotier-cli listnetworks
200 listnetworks <nwid> <name> <mac> <status> <type> <dev> <ZT assigned ips>
200 listnetworks 9999888877776666 My_ZeroTier 22:43:c0:8e:8d:1c OK PRIVATE ztr2qsmswx 10.244.211.241/16
200 listnetworks 9999888877775555 Test 5e:7e:1c:52:e3:71 OK PRIVATE ztc3qzoglu 10.242.211.241/16

$ ip r | grep zt | grep -v via
10.242.0.0/16 dev ztc3qzoglu proto kernel scope link src 10.242.211.241 
10.244.0.0/16 dev ztr2qsmswx proto kernel scope link src 10.244.211.241 

Interpretation:

  1. Container is running and healthy.
  2. Demonstrate ZEROTIER_ONE_CHK_SPECIFIC_NETWORKS variable is non-null.
  3. Show networks reporting OK status.
  4. Show one route per ZeroTier network.

Test:

$ sudo nmcli conn down ztc3qzoglu
Connection 'ztc3qzoglu' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/55)

$ ip r | grep zt | grep -v via
10.244.0.0/16 dev ztr2qsmswx proto kernel scope link src 10.244.211.241 

$ DPS zerotier
NAMES      CREATED         STATUS                     SIZE
zerotier   3 minutes ago   Up 3 minutes (unhealthy)   1.2kB (virtual 14.4MB)

$ docker exec zerotier zerotier-cli listnetworks
200 listnetworks <nwid> <name> <mac> <status> <type> <dev> <ZT assigned ips>
200 listnetworks 9999888877776666 My_ZeroTier 22:43:c0:8e:8d:1c OK PRIVATE ztr2qsmswx 10.244.211.241/16
200 listnetworks 9999888877775555 Test 5e:7e:1c:52:e3:71 OK PRIVATE ztc3qzoglu 10.242.211.241/16

Interpretation:

  1. Tear down an interface.
  2. Show routing table now only has one route.
  3. After a time, the container goes unhealthy.
  4. Show networks. Both still with OK status even though one is broken. If the script only checked for OK it would report healthy despite the missing interface.

Your latest push still has the lines including and after:

Check if ALL Networks are connected (Default - ZeroTier)

I am not going to endorse that structure. The additional variable is not needed.


The code you call 'problematic' is actually how the original ZeroTier docker does the health-checks which @zyclonite has already pointed out in the previous thread. I did not want to eliminate their code completely. Rather I wanted to build on their code. Hence I have kept their code as default behavior.

I take "previous thread" to be #33 and, more specifically:

[Screenshot 2025-02-24 at 13:47:34]

I take "their code" to mean the referenced lines.

First point. It is not you who would be "eliminating" code. The entrypoint.sh.release isn't actually adopted by this repo. It's 100% eliminated!

Second point. In the same comment, Lukas wrote:

i did often run into a situation where zerotier-cli status returns online but no network interface was created... this might be worth checking as well

This is exactly the problem I think we have been exploring.

That's why I'm trying to lead you towards checking routes. From all the exchanges we have had thus far, I have formed the impression, rightly or wrongly, that you have a kind of mental block when it comes to the word "route".

A route is simply a convenient "pinnacle artifact". It is the last thing added, by the host, to the host's routing table, when the host regards an interface as defined, UP and serviceable.

Even in the dumbest host with exactly one network interface (eg Ethernet or WiFi), you will typically have at least two routes in the routing table:

  • A route saying that the subnet to which the host is connected can be reached via the Layer Two interface;
  • A default route saying that everything else can be reached via the same Layer Two interface.

The withdrawal of a route, by the host, doesn't tell you what is wrong, only that something is wrong. Route withdrawal is sensitive and reasonably fast. It's ideal for this kind of check.

@gb-123-git
Copy link
Author

With all possible respect, I'd simply say that if you find it troublesome, upsetting or impolite to be told that the writer isn't convinced that you understand something, the best solution is to demonstrate understanding.

The point here was not the writer being convinced or not; nor did I take offence. My point was simply that being more polite and a better human makes for a better world. It was just general advice given to you to avoid using accusatory phrases like 'you actually understand how this code will behave'; 'you are being disingenuous'; 'you have a kind of mental block'.
I have written the code and I know what the code is about, and you should also appreciate that ALL the contributors (which now also includes me) have taken their time to write the code. It's very sad to know you are unable to comprehend this.

Personally, it feels a bit off to be embedding sponsorship messages in someone else's repo but maybe Lukas won't mind.

Well, if someone has contributed to this code in some way (direct or indirect), should they not be recognized? Would you mind Lukas writing 'This repo has been coded and maintained by Lukas'?
They are not asking you for money for the code or a license fee. It's a simple recognition of their contribution. I believe ALL contributors (both direct and indirect) should be recognized.

Second point. In the same comment, Lukas wrote:

i did often run into a situation where zerotier-cli status returns online but no network interface was created... this might be worth checking as well

This is exactly the problem I think we have been exploring.

I think there is a mix-up in understanding here. Lukas is talking about zerotier-cli status, which checks the 'Client' status. We are using zerotier-cli listnetworks. The two commands are different and do different things.


Now Regarding Routes :

Although I am still not convinced that these are required, I have added the route checks.
The reason for not adding them was simple: generally ZeroTier manages the routes and I did not foresee anyone forcefully deleting the routes using nmcli.
Although I have not gone into the depth of all the ZeroTier code, I was under the impression (which may be wrong) that the 'OK' result was returned once ZeroTier checked the network connection, which is not possible if there are no routes.
The 'healthy' you are seeing after disrupting the connection manually may also be due to:

  1. ZeroTier may not expect that someone will manually disconnect the interface using nmcli for the connections ZeroTier is managing.
  2. ZeroTier does not check the connection before a certain interval after first report of 'OK' and the interval of checking may be larger than 3-4 seconds that we are assuming. (Although it may be possible that it does not check again since the network is supposed to be managed by ZeroTier and not supposed to be disconnected by user.)
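On the question of intervals: the re-check cadence is governed by Docker, not by ZeroTier. The health-check command is simply re-run on a schedule declared in the Dockerfile, and several consecutive failures are needed before the container is marked unhealthy. An illustrative declaration (values chosen for the example, not taken from this repo):

```dockerfile
# Illustrative values only - run the script every 30s, allow 5s per
# attempt, and require 3 consecutive failures before "unhealthy".
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
    CMD /usr/sbin/healthcheck.sh
```

That retry behaviour alone can account for a delay of a minute or more between disruption and the status flipping.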

Nonetheless, I have added the route check as an additional measure.


For other things:
@zyclonite can take a call.

Paraphraser added a commit to Paraphraser/zerotier-docker that referenced this pull request Mar 3, 2025
This PR follows on from the extensive discussion associated with zyclonite#37.

Never before have I even *contemplated* submitting a PR covering the
same ground as an existing open PR. However, on this occasion I thought
it might be useful to have a concrete proposal to compare and contrast
with zyclonite#37.

I sincerely hope that laying this on the (virtual) table and then
minimising further interaction *might* help us converge on a solution.

<hr>

Changes:

* `docker-compose.yml` and `docker-compose-router.yml`:

	- replaces deprecated `version` statement with `---`.

	- adds example environment variables.

* `Dockerfile`

	- corrects case of "as" to "AS" (silences build warning).

	- adds and configures `healthcheck.sh` (as per zyclonite#37).

	- includes `tzdata` package (moved from `Dockerfile.router`) so
	  messages have local timestamps.

* `Dockerfile.router`

	- removes `tzdata` (moved to `Dockerfile`).

* `entrypoint-router.sh`:

	- code for first launch auto join of listed networks expanded to
	  include additional help material.

* `entrypoint.sh`:

	- "first launch" auto join of listed networks (code copied from
	  `entrypoint-router.sh`, as modified per above).

	- "self repair" of permissions in persistent store (code copied from
	  `entrypoint-router.sh`).

	- adds launch-time message to make it clear that the client is
	  launching (complements messages in `entrypoint-router.sh`).

	- abstracts some common strings to environment variables
	  (opportunistic change).

* `README.md`:

	- updates examples.

	- describes new environment variables (including move of
	  `ZEROTIER_ONE_NETWORK_IDS` from `README-router.md`).

	- documents health-checking.

* `README-router.md`

	- updates examples.
	- explains relationship of router and client.

Added:

* `healthcheck.sh`, based on original proposal in zyclonite#37 and subsequent
  suggestions for modification by me.

I gave serious consideration to the code for synchronising networks in
the entry point scripts. The idea is quite attractive. It is safe to
automate joins in a "clean slate" situation. However, a *leave* followed
by a *join* is not guaranteed to be idempotent. That's because the
*leave* destroys the network-specific configuration options
(`allowManaged`, `allowGlobal`, `allowDefault`, `allowDNS`).

On balance I think it's better left to users to send explicit *leave*
commands via the CLI and take responsibility for restoring lost
configuration options on any subsequent *join*.

I will post the results of testing this PR separately.

Signed-off-by: Phill Kelley <[email protected]>
@Paraphraser
Contributor

If an owner of any repo entered into a relationship with a sponsor and wanted to make that relationship clear by adding comments or messages to the code, that's their decision and they go into it in full knowledge of any Intellectual Property implications.

It's slightly different when a second party proposes adding code which acknowledges a sponsorship relationship between the second party and a third party. That's the kind of thing that can undermine licence conditions. If Lukas applies such a PR, he implicitly accepts that his repo becomes a kind of MIT-with-possible-strings licence.

That said, rather than risk inflaming the situation further, I'm going to avoid posting any more comments against this PR. I apologise for my impoliteness.

I have created PR #38 as an alternative approach. At this stage, and because of its evolutionary path through this PR, healthcheck.sh does preserve your "sponsored by" comment. I will, however, remove it on request.

@gb-123-git
Author

That's the kind of thing that can undermine license conditions. If Lukas applies such a PR, he implicitly accepts that his repo becomes a kind of MIT-with-possible-strings license.

I would like to humbly disagree with you. The settled law is: if there were any 'strings attached', they would need to be mentioned clearly in the license agreement, a reference to which has to be made in the file. If nothing is mentioned, the original license policy is followed, which in this case is MIT.
It is the same thing as writing 'This section of code is written by xyz developer' to acknowledge his/her contribution. Just writing a developer's or sponsor's name does NOT undermine any license condition; nor does it give them any right over the code (especially when it has been submitted to an open source repository with an MIT license).

Further I can confirm there are no strings attached with this code. It is simply a recognition of their contribution.

@hoppke

hoppke commented Mar 3, 2025

Further I can confirm there are no strings attached with this code. It is simply a recognition of their contribution.

That would be up to "PMGA TECH LLP" to declare though, in a way that'd hold up against professional scrutiny. I'm an external observer and right now it's not clear to me how the script was 'sponsored' (is it the company's IP? What are the terms/license? Are they compatible with what this repo is using and will it remain so indefinitely? Is "PMGA TECH LLP" aware this is happening and is it in line with whatever policy they have on OSS contributions? Or was this maybe more like an OSS grant and the author is independent?)

At the moment I'm not even sure which parts of the PR come from an individual, which are 'sponsored' by an LLP etc.

Also, projects like this one are tempting targets for supply chain attacks (recall xz utils & ssh backdoor). Is "PMGA TECH LLP" reputable enough to associate with? Is it an established business, who's behind it? What other activities does it engage in?

Recognition is of course deserved, and it would go into git history, perhaps into release blurbs, maybe a dedicated CONTRIBUTORS.md etc. A number of tried-and-tested options to choose from.

But it should not go into anything 'executable' if it can be avoided. Not to obscure the credit, but to make the security-sensitive parts easier to review/reason about. It's a pain when you see a shebang deferred by a comment block, and now need to triple-check there's no unicode shenanigan lurking in there, or that this isn't a new kind of exploit against a particular busybox version...

@gb-123-git
Author

@hoppke

is it the company's IP? What are the terms/license?

NO. I think you missed the point where I clearly declared there is no separate license, which means it is the same as the repo's, i.e. MIT.

It's a pain when you see a shebang deferred by a comment block, and now need to triple-check there's no unicode shenanigan lurking in there, or that this isn't a new kind of exploit against a particular busybox version.

LOL. A comment can 'exploit' code. Let me tell you this - even putting the shebang characters in the first 1-2 bytes can still lead to homoglyph attacks. Plus, in case you are not aware, you should always run shellcheck. Once committed to git, it is impossible to change a comment and add unicode characters.
I have nothing further to say.

It's really surprising and disheartening to see people who hardly have 1-2 contributions a year (that too 0 contribution in this repository) go on writing a whole baseless paragraph citing 'legal' issues without having an iota of knowledge of law.

To further clear the air and be straight & direct: this repo was pitched to PMGA Tech LLP, and since it needed a health-check, I made the script and they allowed me to use half my consultation time with them to build it, i.e. half the time came from my personal time and half from their consultation time. In return they just asked me to put in the sponsor message, which is a fair ask. I don't know why people jump to conclusions when someone else gets recognition for their honest work! PMGA Tech LLP could have had me not post anything for you to use if they wanted to! This has been done only because we were using this repository and wanted to contribute something to it in return! And you have a problem with that? It's one thing to leech and not contribute back, but it's a whole other level when one only leeches out of this repository and puts questions on the recognition of someone's legitimate contribution!

maybe a dedicated CONTRIBUTORS.md etc.

I would agree to this. But strictly legally speaking it would have the same effect as writing it in the code; which is nada, since the repo is MIT and the people contributing to it are doing so with that knowledge and voluntarily.
I am in fact happy if Lukas can make CONTRIBUTORS.md and leave the recognition out of the specific file. All I am advocating is that recognition should be given where it is due.

@hoppke Sir, in my humble opinion, you should contribute first so that you understand the hard work that goes into coding and testing. Your time would have been better spent contributing rather than trying to raise unnecessary and baseless questions.

Anyway, I think this is as far as I can go.

@zyclonite can decide what he wants to do.

@hoppke

hoppke commented Mar 4, 2025

@hoppke
LOL. A comment can 'exploit' a code.

An exploit could be made to look like an innocent comment, sure.

So e.g. in your script it looks like a shebang. @Paraphraser beat me to it and already pointed out it isn't right. So I'll expand a bit.
The way you designed it, it's a no-op, a decoy. Linux (the kernel) will only inspect the first 2 bytes of the file, and unless it finds a '#!', shebang handling logic will not trigger. The script runs through whatever the overarching OS thinks is appropriate for a shebang-less, non-binary executable.

It'll likely be the default shell, so busybox on a pristine alpine image, but there are probably no promises that it'll run in the same compatibility mode as /bin/sh (myself, I'd bet on it selecting /bin/ash). Which may or may not be running the same POSIX compliance level, and thus may or may not be an issue, but it may of course change in the future even if no one touches the script again, because it wouldn't be a first for busybox/alpine.

If only we could have a way for a script to precisely declare the shell it wants to execute under... Maybe one day.
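The first-two-bytes behaviour described above can be demonstrated directly; a comment placed before the shebang line pushes `#!` out of position, demoting it to an ordinary comment. (The filenames and contents below are arbitrary test fixtures, not the PR's script.)

```shell
# The kernel's binfmt_script handler triggers only when the file's first
# two bytes are exactly '#!'. Build two tiny scripts to compare.
tmp=$(mktemp -d)

printf '%s\n' '# leading comment' '#!/bin/sh' 'echo hello' > "$tmp/decoy.sh"
printf '%s\n' '#!/bin/sh' 'echo hello' > "$tmp/proper.sh"
chmod +x "$tmp/decoy.sh" "$tmp/proper.sh"

head -c 2 "$tmp/decoy.sh"    # '# ' -- not a shebang, so execution falls
                             # back to whatever the caller decides
head -c 2 "$tmp/proper.sh"   # '#!' -- kernel hands the file to /bin/sh
```

`execve()` on the decoy fails at the kernel level (ENOEXEC), after which most shells re-run the file under themselves, which is exactly the "whatever shell happens to be invoking it" ambiguity described above.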

It's really surprising and disheartening to see people who hardly have 1-2 contributions a year (that too 0 contribution in this repository) go on writing a whole baseless paragraph citing 'legal' issues without having an iota of knowledge of law.

On the subject of 'surprising': "PMGA Tech LLP" is ~5-6 years old, registered in India, declaring two native founders, only one of them on linkedin, no trace of employees, contractors or anything resembling a business history there, nor on google. The only real content from this company I got out of google is a "company web page" in the .tech domain; it's "in progress" and mostly a contact FORM. Curiously it mentions no names, phone numbers, office addresses, not even the country it's operating in. No list of clients, no portfolio. No identifiable info.

I've seen IT companies 'giving back to OSS', from both ends, but never saw it done by an IT house running like this, in full stealth mode.
To me that's surprising.

And it decides to sponsor changes in a VPN gateway, of all the possible beneficiaries!

@Paraphraser
Contributor

@hoppke

to make the security-sensitive parts easier to review/reason about

That's an aspect I hadn't considered. It's a very good point. I was only looking at the best practice aspect of the signature bytes, plus the potential implied IP claim as a secondary issue.

Your remark about the parallels with xz struck a chord. I thought the antagonism was either some unfortunate product of my own turn of phrase or explained by a cultural difference. Now it seems apparent that it's triggered by any difference of opinion.

Like you, all of this caused me to Google the organisation. The results did not alleviate my disquiet.

Another thing I noticed is summarised here:

[screenshot "signing": commit list and signature details; panels referred to below as "A" through "D"]

"A" is the initial batch of commits, which were signed, while "B" is the more recent commits, which were unsigned. That speaks to commitment to process, and does nothing to alleviate my disquiet.

"C" is the result of clicking on one of the "Verified" buttons. Now, compare/contrast that with "D" which is the result of doing the same thing with one of my commits.

In the "D" case, GitHub is able to tie the commit and the digital signature back to my account. In the "C" case, GitHub was unable to do that. Now, why should that be so?

From a clone with PR37 applied:

$ git log -1 --show-signature 53789fac8a3e496a458ab1b0546315b63a82fe0c
commit 53789fac8a3e496a458ab1b0546315b63a82fe0c
gpg: Signature made Thu Feb 20 01:50:44 2025 AEDT
gpg:                using RSA key B5690EEEBB952194
gpg: Good signature from "GitHub <[email protected]>" [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: 9684 79A1 AFF9 27E3 7D1A  566B B569 0EEE BB95 2194
Author: gb-123-git <[email protected]>
Date:   Wed Feb 19 20:20:43 2025 +0530

Compare/contrast with a clone where PR38 has been applied:

$ git log -1 --show-signature
commit a8a22fe466a03b206e770b74d2efd1b2f1fdece3 (HEAD -> 20250303-healthcheck-main, origin/20250303-healthcheck-main)
gpg: Signature made Mon Mar  3 12:34:02 2025 AEDT
gpg:                using RSA key DFF37C5ABCA24C7FADE7F32073D35B58592A2E98
gpg: Good signature from "Phill Kelley (Paraphraser) <[email protected]>" [ultimate]
gpg:                 aka "Phill Kelley (Paraphraser) <[email protected]>" [ultimate]
gpg:                 aka "Phill Kelley (Paraphraser) <[email protected]>" [ultimate]
gpg:                 aka "Phill Kelley (Paraphraser) <[email protected]>" [ultimate]
gpg:                 aka "Phill Kelley (Paraphraser) <[email protected]>" [ultimate]
Primary key fingerprint: A99E 370F AC77 ED2B B0AD  0385 0C7D 6CF5 5F61 8D9A
     Subkey fingerprint: DFF3 7C5A BCA2 4C7F ADE7  F320 73D3 5B58 592A 2E98
Author: Phill Kelley <[email protected]>
Date:   Mon Mar 3 12:33:58 2025 +1100

In my case, my public key has the necessary additional UIDs tying my public key to the email addresses listed in my GitHub profile. GitHub can go from the digital signature, to the email addresses in the public key, to the email addresses in the profile, and then display my account name ("Paraphraser") as a hyperlink to my profile.
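The chain described above (signature → key UIDs → profile email addresses) can be inspected locally: gpg's machine-readable `--with-colons` output carries the UID string in field 10 of `uid` records. The sample record below is fabricated; against a real keyring you would pipe `gpg --list-keys --with-colons` instead of the canned string.

```shell
# Extract the user-ID strings (field 10) from gpg's colon-delimited
# output. GitHub compares the email inside these UIDs against the
# addresses listed in the committer's profile.
sample='pub:u:4096:1:0000000000000000:1614556800:::u:::scESC::::::23::0:
uid:u::::1614556800::ABCDEF::Jane Doe <[email protected]>::::::::::0:'

printf '%s\n' "$sample" | awk -F: '$1 == "uid" { print $10 }'
```

A key whose `uid` records only contain an opaque address like `[email protected]` gives GitHub nothing to match against a profile, which is why the "Verified" badge in panel "C" cannot link back to an account.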

I'm not sure what the goal is of embedding [email protected] in a public key but it defeats all that and also does not alleviate my disquiet. What you say about Stealth Mode may not be far off the mark.

In my case, "Paraphraser" was an accident of history which subsequently became akin to a kind of brand name. I use it here, on Discord, on Mastodon and so on. But I don't go to extreme lengths to hide my real name because to do so serves no practical purpose that I can think of.

It's really surprising and disheartening to see people who hardly have 1-2 contributions a year ...

On behalf of humanity I'd like to apologise for that remark. Each of us can only do what we can, and only when we can. Everyone has a first comment, or a first issue, or a first commit, or a first repo.

@gb-123-git

if there were any 'strings attached' the same needs to be mentioned clearly in the license agreement

I think it might depend on the jurisdiction. I don't know about India but where I am (Australia) a sponsorship may be considered a "work for hire" in which case the IP rights belong to the sponsor unless there's a written agreement to the contrary. The insertion of a "sponsored by" may be seen as establishing a claim, in the same way as a "written by".

I also wonder if you've forgotten that your initial commit included these lines:

#This health-check script is sponsored by PMGA TECH LLP
#The above line is a part of license to use this code. Removal of line shall be deemed as revoking of usage rights.
#!/bin/sh

I know you removed the second line a day later but it still speaks to your original intention, and it's all still recorded on GitHub. Even though you were the one who removed the second line, that removal might be considered to have triggered the revocation.

My personal philosophy is to acknowledge my sources and not simply because it's a sound ethical practice but because it lends credibility to my own work. That's actually why I left the "sponsored by" line in PR #38, even though the script I'm proposing bears very little relationship to the original.

I've been thinking that it might be more appropriate to replace that line with something like:

# acknowledgement: some ideas sourced from PR 38 and Issue 33

dedicated CONTRIBUTORS.md

I disagree. It seems to me that GitHub already does a very good job of tracking contributions, and it looks to me like you're well aware of that.

I think if I were in the situation where someone was sponsoring my activities, I'd probably just mention on my GitHub profile page that some of my work was sponsored by organisation X and leave it at that.

@gb-123-git
Author

gb-123-git commented Mar 4, 2025

@hoppke

The only real content from this company I got out of google is a "company web page" in the .tech domain, it's "in progress" and mostly a contact FORM. Curiously it mentions no names, phone numbers, office addresses, not even the country it's operating in. No list of clients, no portfolio. No identifiable info.
I've seen IT companies 'giving back to OSS', from both ends, but never saw it done by an IT house running like this, in full stealth mode.

So the criteria for giving something back to the OSS community are to first become a company with loads of employees, become known to the whole world like Google/Apple, share client names, portfolios etc., and only then will the hard work and contribution be recognized?

And it decides to sponsor changes in a VPN gateway, of all the possible benefactors!

Can't the company use a VPN to give remote contractors secure access to its servers?

"PMGA Tech LLP" is ~5-6 years old, registered in India, declaring two native founders, only one of them on linkedin, no trace of employees, contractors or anything resembling a business history there, nor on google.

Is an entity being 'registered in INDIA' or not using 'linkedin' (which is a private offering from Microsoft) a justification for discrediting the entity? Shall all the business deals be posted on their site, or maybe here? Is that also mandatory for submitting code to the OSS community?
I see no name associated with @hoppke . A google search also pretty much doesn't return a specific person. Only an image with an animal (what seems like a prairie dog, genus Cynomys). What should I infer from that?
But then again, since you don't contribute anyway, it doesn't matter, does it?
Giving your details on the web only matters when you try to write code and submit it to the OSS community, right?

@Paraphraser

I think it might depend on the jurisdiction. I don't know about India but where I am (Australia) a sponsorship may be considered a "work for hire" in which case the IP rights belong to the sponsor unless there's a written agreement to the contrary. The insertion of a "sponsored by" may be seen as establishing a claim, in the same way as a "written by".

Sure, I have changed the text to be very descriptive now.

I know you removed the second line a day later but it still speaks to your original intention, and it's all still recorded on GitHub. Even though you were the one who removed the second line, that removal might be considered to have triggered the revocation

In that case, if you believe the revocation is triggered, you will be personally liable for using the code (or parts of it) in your PR #38 without the licensing terms.

My intent has not changed; the 'Original' and 'Final' intent is to
1. Help the OSS community
2. Give recognition where it is due.

My personal philosophy is to acknowledge my sources and not simply because it's a sound ethical practice but because it lends credibility to my own work.

Everyone has a right to their own beliefs. Ethics mean a lot to me and that is the reason I am fighting for being ethically and morally right to recognize someone's contribution.

@hoppke @Paraphraser

Since this post is now becoming nothing but an attempt to discredit the contributors, I will stop replying here. I hereby specifically state that my failure to reply shall not be construed as agreement or disagreement with any of the posts not explicitly replied to.

My terms are pretty much simple. If you use the code, please recognize the contribution. That is ALL which is being asked.
If you are skeptical about the code or PMGA Tech LLP, simply do not use the code.

Spell-Checked Licensing Text
@Paraphraser
Contributor

So the criteria for giving something back to the OSS community are to first become a company with loads of employees, become known to the whole world like Google/Apple, share client names, portfolios etc., and only then will the hard work and contribution be recognized?

I'd say that the criteria for "giving something back to the OSS community" are:

  1. The willingness, knowledge and capacity to do the work; and
  2. Garnering community acceptance, including that of the repo owners/maintainers.

I have my doubts about the first point but I definitely don't think you've succeeded on the second point.

You might want to reflect on the fact that two people independently but simultaneously reached the conclusion that there might have been more to your proposal than met the eye, and that some deeper research was in order.

I can't speak for @hoppke but the immediate trigger appears to have been concerns about supply-chain attacks. I note that you have only just addressed the problem of the signature bytes, which was something I first mentioned two weeks ago.

Aside from my growing general disquiet about the whole thing, the most immediate trigger for my research was noticing that some of your commits were signed, some not, wondering about that pattern, then noticing that you were using a public key that didn't link back to your profile. I thought that was odd.

I don't care two hoots about where someone is located. Everyone has to live somewhere. I would not have used a phrase like 'registered in INDIA' (your capital letters, not Hoppke's) myself. However, because it had already been raised, it seemed appropriate to mention that there might be legal differences between where you are and where I am.

I also don't care whether a contributor is an individual beavering away at home (like me) or employed by the largest company on the planet. While "lack of presence" (for want of a better expression) seems to have been a concern for Hoppke, it isn't a concern for me.

My concerns are grounded in the perpetual argumentativeness, unwillingness to take advice or adopt best practice, and general conduct which seems antithetical to garnering community acceptance for the proposal. The very anti-pattern, in fact, that Hoppke has mentioned in the xz saga.

Nobody here has gone nuts about "gb-123-git" as an account name. Why go nuts about "Hoppke" and carry on about Prairie Dogs? What purpose does it serve? It doesn't address Hoppke's concerns about the possibility of supply-chain attacks. Neither is "LOL" an appropriate response to a very clearly articulated and legitimate worry.

Sure, I have changed the text to be very descriptive now.

#!/bin/sh
#This health-check script is sponsored by PMGA TECH LLP
#Licensing Terms:
#This code is free to use, distribute, use as a derivative for another code or do any such activity as allowed by M.I.T License as long as the recognition stays intact.
#The recognition may be moved to the CONTRIBUTERS.md file in case the same is being maintained in the projects in which the code is being used.
#Removing the recognition text shall construe revocation of usage rights and thus would be deemed as a copyright infringement.
#The intent of this license is ONLY to give due recognition to the contributers and there is no other intent whatsoever.
# Contributers:
# PMGA Tech LLP
# gb-123-git

and in so doing, I think you have made the situation far worse than it was.

The essence of OSS is contributing to the common good. It's not about either big-noting yourself or your organisation, or sneaking what might be intellectual property time-bombs into code. It's about giving freely. As they say, "free" as in "beer".

The way you show "no strings attached" is by not attaching any strings. Zip. Zero. Zilch. Nada. If you can't do that then the appropriate course of action is to fork the repo and go off on your own. Nothing stops you from doing that.

In that case you will be personally liable for using the code (or parts of it) in your PR #38; in case you decide to use it without the licensing terms; if you believe the revocation is triggered.

Which is a concern.

Everyone has a right to their own beliefs. Ethics mean a lot to me and that is the reason I am fighting for being ethically and morally right to recognize someone's contribution.

And GitHub already does that for you. When you take an action on GitHub (issue, PR, whatever) it is recorded. Everyone can see who did what and when. All contributions are recognised, automatically. You are asking for more and are getting annoyed because other people foresee problems with your approach and ask why?

My terms are pretty much simple. If you use the code, please recognize the contribution. That is ALL which is being asked. If you are skeptical about the code or PMGA Tech LLP, simply do not use the code.

There you go again. Your terms.

However, we are in agreement. At this point I withdraw my support for any form of health-checking and recommend to @zyclonite that both #37 and #38 be rejected and closed.

Apart from anything else, it will save us having to try to figure out why buildah is out-of-date and whether that's amenable to being fixed.

@hoppke

hoppke commented Mar 5, 2025

As a consumer of the project I would also advise against merging it.

The IP in the PR allegedly stems from a collab between an LLP and the PR's author, created (to an unspecified extent) during a professional engagement of the two. The details are unclear, it's not clear what the PR author's legal rights and obligations were in that engagement (employee? direct subcontractor? b2b via a third party?), and it's not clear which lines in the PR are attributed to which entity, which jurisdictions they fall under etc.

The LLP entity was not introduced in a way that could be verified, we've no authentication for it, no handshake with whoever's allowed to release IP from the LLPs end. It could come up and deny involvement/knowledge/consent at any point.

The LLP has a very low online profile so it's not clear what the business model is.

The PR's author, when challenged, becomes defensive/dismissive. The author's relation to the LLP is unclear (could be one of the owners, could be an employee, could be subcontracting), so when making statements about T&Cs or IP ownership we need to treat them as 2nd-hand personal opinions, not legally binding.

As @zyclonite spotted, there's weirdness, or even intentional obfuscation, in the PGP sig of the author. Similarly, the LLP's homepage also does everything to offer no real identifiable info. The domain is 2-3 years old, signed by Let's Encrypt, so it only confirms the domain is owned by the certificate holder, but not who the certificate holder is.

This could be just about plugging the name of a fake-it-till-you-make-it IT consultancy into some OSS projects on the internet, for rep-building purposes.

But zerotier-docker's position is sensitive, as it's a tempting target for supply chain attacks (technical OR legal, and this PR is IMO far from 'safe' in both respects), and including a PR like this may create opportunities for someone to hurt consumers of the project in some way.

So it's a gain vs risk assessment. Is a healthcheck script worth the risk for zerotier-docker?
IMO not.

4 participants