Merge changes made by NGINXaaS #193

arussellf5 · 2025-07-17T16:20:27Z

This PR brings in all the changes made by the NGINXaaS for Azure team.

The primary intent behind this is to keep retrying updates which may be made before the controlplane has registered the existence of a named upstream in the customer's NGINX configuration.

NOTE: This commit was accidentally missed out in the first iteration of this fork and is present in upstream (https://github.com/nginxinc/nginx-loadbalancer-kubernetes/blob/main/internal/configuration/settings.go#L189). The code needs to move forward to start the watchers and health server (side note: we should also think about the ordering of these at some point), and not running the informer in a goroutine prevents the program from executing any further until the context is canceled. In the current iteration on main, the controller is stuck on the informer, and then k8s kills the service and restarts it since the health server is not up.

The user specifies the ingress service whose events the application should watch through setting the "service-annotation-match" annotation on the application's config map. Only events with a matching service annotation will be passed by the watcher to the handlers. The informer now listens to events from all namespaces. This frees the end user from the restriction of only using the nginx ingress controller.
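The filtering step can be illustrated with a small predicate. This is a sketch under assumptions: the annotation key `example.com/nlk` and the function name `matches` are hypothetical, since the description does not say which key the watcher inspects on a Service.

```go
package main

import "fmt"

// annotationKey is a hypothetical annotation key; the key the operator
// actually inspects on a Service is not given in this description.
const annotationKey = "example.com/nlk"

// matches reports whether a Service's annotations carry the configured value.
// Events for Services that do not match would be dropped by the watcher.
func matches(annotations map[string]string, want string) bool {
	got, ok := annotations[annotationKey]
	return ok && got == want
}

func main() {
	// Assumed to come from the "service-annotation-match" setting.
	want := "nginxaas"

	fmt.Println(matches(map[string]string{annotationKey: "nginxaas"}, want))
	fmt.Println(matches(map[string]string{annotationKey: "other"}, want))
	fmt.Println(matches(nil, want)) // a Service without annotations
}
```

Because the check depends only on the annotation, it works for Services in any namespace and for any ingress implementation.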

…eam name

The port name should now be formatted like this: "http-tea", where the first part of the string is the context type (either "http" or "stream") and the second part of the string after the hyphen is the name of the upstream.
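One way to parse this format is to split on the first hyphen only. The sketch below assumes that approach; `parsePortName` is a hypothetical name and not necessarily how the translator is written. Splitting once also keeps any further hyphens as part of the upstream name.

```go
package main

import (
	"fmt"
	"strings"
)

// parsePortName splits a port name such as "http-tea" into its context type
// and upstream name. Only the first hyphen is treated as the separator.
func parsePortName(name string) (string, string, error) {
	contextType, upstream, found := strings.Cut(name, "-")
	if !found || upstream == "" {
		return "", "", fmt.Errorf("port name %q is not of the form <context>-<upstream>", name)
	}
	if contextType != "http" && contextType != "stream" {
		return "", "", fmt.Errorf("port name %q has unknown context type %q", name, contextType)
	}
	return contextType, upstream, nil
}

func main() {
	for _, name := range []string{"http-tea", "stream-my-upstream", "tea", "tcp-tea"} {
		contextType, upstream, err := parsePortName(name)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("context=%s upstream=%s\n", contextType, upstream)
	}
}
```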

We need to be able to publish the operator to dockerhub in order to be publicly available for customers. This follows the release strategy we have in ARP, where a release tag action publishes the image to dockerhub.

go test does not have a good way to capture unit-test coverage as part of test runs. This commit captures the error code of the unit test run, runs the coverage generation and then exits based on the test status.

Helm will be used as part of the user story to deploy the operator but it is also a good tool to deploy the operator while developing it. This commit adds the ability to publish helm charts:

- to the dev registry for local iteration.
- to the regular devops registry for CI iteration and testing.

This will also help us test the helm chart itself.

0.0.1 is okay but it's not really a bug fix.

We should keep a single release script that will publish docker images and helm charts for the official release. This commit just updates the current release script to handle both artifact types. Helm logic will follow.

Previously, owing to a bug, if the name of the upstream included hyphens it would be rejected by the operator.

We are going to release with `0.x.y`, and keeping the internal version in line with what will get published will reduce confusion.

This repository is going to produce artifacts that will be available publicly and the end users will care about semantic versioning. We need to be able to map a public facing version to internally produced artifacts easily and having semver internally eases that work. This commit does not enforce the versioning but adds a version file that has the semver, which will be used to version the product. We can follow a workflow where during release time, we cut a release, which creates a tag and we retag existing dev artifacts to be shipped as an official artifact.

While a cosmetic change, it does impact how the dockerhub repo needs to be set up to publish the helm chart. In addition to that, it impacts what the user sees on k8s itself and it should not be nginx-loadbalancer-kubernetes as that is confusing.

While the chartname (which was renamed in the prior commit) gets used for naming stuff, it makes sense to also change the name value in the chart itself to use the new name.

Now that official images exist on docker hub, we should use those images in our charts.

Now that the chart has been renamed, gitignore needs to know about ignoring new charts.

This lets us test the operator from the private registry. Also, in case a customer does not want to pull from docker hub and instead use their own registry, they can do so and specify a pull secret for the image.

With the change to make the operator read a config file (to reduce code complexity), we need to make sure that the file exists for the service to start.

NLK listens for changes to any kubernetes service of type LoadBalancer and updates the NGINXaaS upstreams with the external IPs and ports of the LB service. We are using the external IPs and not the node IPs so as to provide seamless plug-and-play functionality with the customer's networking framework.

This will help fix vulnerabilities discovered by Mend in k8s.io/apimachinery v0.26.0.

This enables concurrency-safe access by multiple goroutines.

The latest versions of the kubernetes libraries recommend using a typed workqueue and this reduces a bit of boilerplate and error handling, because we no longer have to cast the workitems returned by the queue into the desired types.
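The difference between an untyped and a typed queue can be shown with a toy generic queue. `typedQueue` and `workItem` below are hypothetical and are not the client-go API; they only illustrate why the type assertion and its error path disappear.

```go
package main

import "fmt"

// workItem is a hypothetical event payload.
type workItem struct {
	Service string
}

// typedQueue is a minimal FIFO queue used for illustration only.
type typedQueue[T any] struct {
	items []T
}

// Add appends an item to the back of the queue.
func (q *typedQueue[T]) Add(item T) {
	q.items = append(q.items, item)
}

// Get removes and returns the oldest item, or false when the queue is empty.
func (q *typedQueue[T]) Get() (T, bool) {
	var zero T
	if len(q.items) == 0 {
		return zero, false
	}
	item := q.items[0]
	q.items = q.items[1:]
	return item, true
}

func main() {
	// Untyped: every consumer must assert the type and handle a mismatch.
	var untyped []any
	untyped = append(untyped, workItem{Service: "tea"})
	if item, ok := untyped[0].(workItem); ok {
		fmt.Println(item.Service)
	}

	// Typed: the compiler guarantees the element type, so no cast is needed.
	q := &typedQueue[workItem]{}
	q.Add(workItem{Service: "coffee"})
	if item, ok := q.Get(); ok {
		fmt.Println(item.Service)
	}
}
```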

Multiple parallel tests were all accessing the same pointer to a single variable for the DefaultTransport in the http package. This was leading to data races in unit tests.

There's a syntax issue in the "listen" directive. Should be "listen 9113;", not "listen 9113:". Using the current file, I got an error upon reloading the NGINX service ("nginx: [emerg] invalid port in "9113:" of the "listen" directive in /etc/nginx/conf.d/prometheus.conf:11")

There was a change in the API for the NGINX Plus Client that was missed when updating to the latest version. This corrects that.

These functions were being used to create IDs that were not really necessary for the business logic and which were generating security alerts because of weak cryptography techniques.

The http client processes requests created by the nginx plus client library, and that library should always include a sensible number of headers. However, the absence of a limit on the number of headers was causing security vulnerability flags to be raised over denial-of-service resource exhaustion attacks.

Go cache in the CI is seeded in the project working directory. We should skip the mod cache from lint/formatting as it's upstream code and there are high chances of the linting failing as upstream lint rules != our lint rules.

…inx hosts

In order for the nginx-hosts yaml field to be parsed correctly by viper the template needs to:

1. not put double quotes around the value (this causes viper to interpret it as a string)
2. render it as a JSON array rather than a go representation of a slice.

The biggest change here is to remove most of the TLS modes that enabled mTLS and self-signed certificates. Product decided that this was too complex and there was not enough user demand for most of these options. We decided to pare down the code and remove tests that were no longer well maintained.

The remaining configuration allows users to toggle a single switch: whether to make the http client verify the NGINX host's certificate chain and host name if https is being used. If the user wishes to enable https with self-signed certs they can use the "skip-verify-tls" setting to allow this. The default behavior is to perform this verification.

We are maintaining the deprecated "no-tls" and "ca-tls" inputs for NGINXaaS backwards compatibility reasons. The "no-tls" setting name was highly misleading, because all it did was disable TLS verification: it DID NOT disable TLS altogether in https mode. Similarly, the "ca-tls" setting did not enable TLS itself. TLS is enabled by default when the URL of the NGINX host includes the https protocol. The user setting merely enforced the verification of the certificate chain and host as described above.

Now that the plus go client allows users to check the http status code of the error, handle the upstream not found case by doing nothing.

Storing a context as a struct field is a Go anti-pattern.

arussellf5 requested review from ciroque and chrisakker as code owners July 17, 2025 16:20

puneetsarna and others added 28 commits July 17, 2025 17:28

Seed repo

64a9b9d

Remove Github-y stuff

5200cca

Format code

9f83128

Lint code

1a95954

Update go version and deps

a5bcb38

Disable exhaustruct linter for now

d58089f

NLB-4655 NLK will retry a work item to update upstreams indefinitely

0f8dbc7

The primary intent behind this is to keep retrying updates which may be made before the controlplane has registered the existence of a named upstream in the customer's NGINX configuration.

Update binary/docker img to nginxaas-operator

a4dfe4f

NLB-5282: Allow images to be pushed to Dockerhub

cb49988

We need to be able to publish the operator to dockerhub in order to be publicly available for customers. This follows the release strategy we have in ARP, where a release tag action publishes the image to dockerhub.

Remove unneeded file

afd9aa3

Capture code coverage

692ad94

go test does not have a good way to capture unit-test coverage as part of test runs. This commit captures the error code of the unit test run, runs the coverage generation and then exits based on the test status.

NLB-5360 Upgraded nginx-plus client to v 1.2.2

a64e5c8

NLB-5065 Operator adds API Key to header

cb78349

Update Chart version to be 0.1.0

438be42

0.0.1 is okay but it's not really a bug fix.

Update release script to handle dual artifacts

ab3ffb9

We should keep a single release script that will publish docker images and helm charts for the official release. This commit just updates the current release script to handle both artifact types. Helm logic will follow.

NLB-5549 Translator allows hyphens in upstream name

024ed4e

Previously, owing to a bug, if the name of the upstream included hyphens it would be rejected by the operator.

Keep major version to 0

a9f024e

We are going to release with `0.x.y`, and keeping the internal version in line with what will get published will reduce confusion.

Rename the chart to nginxaas-operator

8f6ca53

While a cosmetic change, it does impact how the dockerhub repo needs to be set up to publish the helm chart. In addition to that, it impacts what the user sees on k8s itself and it should not be nginx-loadbalancer-kubernetes as that is confusing.

Update name to be operator

fa944bd

While the chartname (which was renamed in the prior commit) gets used for naming stuff, it makes sense to also change the name value in the chart itself to use the new name.

Update registry paths to dockerhub

8f47223

Now that official images exist on docker hub, we should use those images in our charts.

Update gitignore with new chart name

04268e7

Now that the chart has been renamed, gitignore needs to know about ignoring new charts.

Add support for image pull secrets

5d03f43

This lets us test the operator from the private registry. Also, in case a customer does not want to pull from docker hub and instead use their own registry, they can do so and specify a pull secret for the image.

Mount configmap as a volume

dbce0b1

With the change to make the operator read a config file (to reduce code complexity), we need to make sure that the file exists for the service to start.

arussellf5 and others added 26 commits July 17, 2025 17:31

NLB-6320 Bumped to version 1.1.0

40f2dc8

NLB-6293 Updated k8s libraries used by NLK

7613acd

This will help fix vulnerabilities discovered by Mend in k8s.io/apimachinery v0.26.0.

NLB-6293 Added mutex lock around certification's Certificates type

683f719

This enables concurrency-safe access by multiple goroutines.

NLB-6293 Upgraded golangci-lint to v1.64.5 and fixed configuration

531d260

NLB-6294 Bumped version to 1.1.1

796beb8

NLB-6294 NewTransport clones the default http.DefaultTransport variable

4cdeb58

Multiple parallel tests were all accessing the same pointer to a single variable for the DefaultTransport in the http package. This was leading to data races in unit tests.

prometheus config update (nginxinc#178)

bd9a3dc

update prometheus files (nginxinc#179)

ec41eb1

fix typo (nginxinc#181)

90c99d0

Corrects the NGINX Plus Client interface

d3a9b76

There was a change in the API for the NGINX Plus Client that was missed when updating to the latest version. This corrects that.

Removed synchronization's random functions

3aff5d6

These functions were being used to create IDs that were not really necessary for the business logic and which were generating security alerts because of weak cryptography techniques.

Upgraded go to 1.23.8 and golang.org/x/sync to v0.12.0

a39e2db

Bumped version to 1.1.2

217dc34

Skip go-cache while linting

8109c36

Go cache in the CI is seeded in the project working directory. We should skip the mod cache from lint/formatting as it's upstream code and there are high chances of the linting failing as upstream lint rules != our lint rules.

unit test flake fix and go version upgrade to 1.24.3

89ab9a9

NLB-6295 Bumped version to 1.2.0

ab38863

Updated go version to 1.24.4

e431b7b

NLB-6754 When deleting upstream servers handle upstream not found error

d9b8e1e

Now that the plus go client allows users to check the http status code of the error, handle the upstream not found case by doing nothing.

NLB-6754 Bumped version to 1.2.1

81184f5

Removed context as a field within the nginx stream border client

24a3c17

This is a Go anti-pattern.

arussellf5 force-pushed the nginxaas-merge-devops branch from 36ec676 to 24a3c17 Compare July 17, 2025 16:32

arussellf5 closed this Jul 18, 2025

arussellf5 deleted the nginxaas-merge-devops branch July 18, 2025 09:01

arussellf5 restored the nginxaas-merge-devops branch July 18, 2025 09:06
