1 change: 1 addition & 0 deletions .dockerignore
@@ -11,6 +11,7 @@ registry-scanner/hack
registry-scanner/test
scripts/
test/e2e
test/ginkgo
test/testdata
test/utils
*.iml
155 changes: 155 additions & 0 deletions .github/workflows/ci-tests.yaml
@@ -156,3 +156,158 @@ jobs:
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
          files: coverage.out
  test-e2e:
    name: Run end-to-end tests
    runs-on: ubuntu-latest
    strategy:
      matrix:
        k3s-version: [ v1.27.1 ]
        # k3s-version: [v1.20.2, v1.19.2, v1.18.9, v1.17.11, v1.16.15]
    steps:
      - name: Install K3D
        run: |
          set -x
          curl -s https://raw.githubusercontent.com/rancher/k3d/main/install.sh | bash
          sudo mkdir -p $HOME/.kube && sudo chown -R runner $HOME/.kube
          k3d cluster create --servers 3 --image rancher/k3s:${{ matrix.k3s-version }}-k3s1
          kubectl version
          k3d version
Comment on lines +159 to +174

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Find and read the CI workflow file
fd -t f "ci-tests.yaml" .github/workflows/

Repository: argoproj-labs/argocd-image-updater

Length of output: 212


🏁 Script executed:

# Read the full test-e2e job to understand current configuration
cat -n .github/workflows/ci-tests.yaml | sed -n '159,310p'

Repository: argoproj-labs/argocd-image-updater

Length of output: 212


🏁 Script executed:

# Check for any Makefile with test-e2e-ci target to verify operator startup time claim
fd -t f "Makefile" | head -5

Repository: argoproj-labs/argocd-image-updater

Length of output: 212


🏁 Script executed:

# Search for test-e2e-ci target in Makefiles
rg "test-e2e-ci" --type makefile

Repository: argoproj-labs/argocd-image-updater

Length of output: 212


🌐 Web query:

k3d cluster creation timeout k3s CI environment typical duration

💡 Result:

Typical k3d cluster creation times in CI

  • Typical: single-node clusters often come up in seconds → ~5–30s; small multi-node clusters commonly finish within 30s–2m on decent CI runners. [1][2]
  • Worst / CI-safe: depending on image pulls, loadbalancer, port mappings and slow runners, creation can take several minutes; many CI configs use timeouts of 5m–6m (or higher for larger clusters). [2][3]

Practical recommendation for CI

  • Use --wait and set --timeout to at least 5m (300s); bump to 6m+ or 360s+ for multi-server/multi-agent clusters or slow runners. [1][3]

Sources

  • k3d docs (cluster create, --wait/--timeout). [1]
  • Arrikto blog: example measurements (3‑node cluster ≈11s on an old laptop; uses --timeout=5m in CI examples). [2]
  • Example k3d config / community repos showing --timeout set to 360s for multi-server clusters. [3]
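The --timeout guidance above can also be backed by a wall-clock limit in the step itself. This is a minimal sketch. It assumes GNU coreutils timeout is on the runner's PATH, which is normal on Ubuntu runners but should be treated as an assumption. run_with_deadline and K3S_VERSION are hypothetical names and belong to neither k3d nor this workflow.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command under a wall-clock deadline using coreutils `timeout`.
# Usage: run_with_deadline SECONDS COMMAND [ARGS...]
# Returns the command's exit status; `timeout` reports 124 when it had to stop the command.
run_with_deadline() {
  local secs=$1
  shift
  local status=0
  timeout "$secs" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    echo "timed out after ${secs}s: $*" >&2
  fi
  return "$status"
}

# Illustrative use in the Install K3D step (K3S_VERSION is a placeholder variable;
# k3d's --timeout takes a Go duration such as 300s):
#   run_with_deadline 420 k3d cluster create --servers 3 --timeout 300s \
#     --image "rancher/k3s:${K3S_VERSION}-k3s1"
```

The outer limit (420 s here) is set above k3d's own --timeout 300s. That ordering lets k3d roll back a half-created cluster before the shell gives up on it. Both numbers are illustrative.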

🌐 Web query:

k3d install script curl error handling CI workflow best practices

💡 Result:

Here are concise, CI-focused best practices for installing k3d with curl (and for handling curl-based install scripts generally), with concrete safety and reliability recommendations.

Quick summary (short checklist)

  • Prefer packaged installers or actions (Homebrew, Chocolatey, go install, or a maintained GitHub Action) over piping install scripts. [1] (github.com)
  • If you must fetch a script: download to a file, verify integrity (checksum or GPG), inspect, then execute. Never blindly curl | bash in CI. [3][2] (checksum.sh)
  • Pin a specific k3d version (TAG or exact binary) instead of “latest”. Cache the binary across runs. [1][4] (github.com)
  • Make installs fail loudly and early: set -euo pipefail, enable -x for debug on failures, and check exit codes. [5] (protsenko.dev)
  • Add retry, timeout and TLS checks to network downloads (curl --retry, --max-time, --proto '=https' --tlsv1.2 --fail). Log full response on failure. [2] (curl.se)
  • Verify CI runner prerequisites (Docker version, privileged mode if needed, required kernel/runc) and skip/soft-fail tests if unmet. [1] (k3d.io)
  • Run installers inside ephemeral, minimal build images or dedicated setup steps (not in production agents). Use least-privilege user where possible and remove artifacts after. [5][2] (protsenko.dev)
  • Audit and pin upstream sources (commit hashes, signed releases); prefer signed checksums from a separate channel (release page + GPG). Record used URLs/hashes in CI logs/artifacts for reproducibility. [4][3] (securing.pl)

Concrete examples / patterns

  • Safer pattern (download + verify + run):

    1. curl -fsSL -o /tmp/k3d-install.sh "https://.../install.sh"
    2. curl -fsSL -o /tmp/k3d-install.sh.sha256 "https://.../install.sh.sha256"
    3. echo "$(cat /tmp/k3d-install.sh.sha256) /tmp/k3d-install.sh" | sha256sum -c -
    4. bash /tmp/k3d-install.sh (only if checksum OK)
      (Use GPG verification if project publishes signatures.) [3][2] (checksum.sh)
  • CI config tips

    • Pin TAG env var: TAG=v5.x.y when invoking k3d install script or download a release binary rather than the install.sh. Cache /usr/local/bin/k3d between runs. [1] (github.com)
    • Use GitHub Actions marketplace action for k3d (or official/community action) when available to avoid ad-hoc scripting. [1] (k3d.io)
    • Add small integration test step that verifies k3d --version and that Docker can create containers; fail early if missing. [1] (k3d.io)
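The download, verify, then run pattern can be sketched as two small shell functions. This is an illustration and not the k3d project's tooling. fetch and verify_sha256 are hypothetical names. The pinned hash has to come from a source you trust, for example a hash you recorded after reviewing the script. Nothing here assumes that k3d publishes a checksum for install.sh.

```shell
#!/usr/bin/env bash
# Sketch of the download -> verify -> run pattern. Function names are hypothetical.

# fetch URL DEST: HTTPS only, TLS 1.2+, non-zero exit on HTTP errors (--fail),
# retry transient failures, and bound the transfer time with --max-time.
fetch() {
  curl --fail --silent --show-error --location \
    --proto '=https' --tlsv1.2 \
    --retry 3 --max-time 60 \
    --output "$2" "$1"
}

# verify_sha256 FILE EXPECTED_HEX: succeed only if FILE's SHA-256 digest equals EXPECTED_HEX.
verify_sha256() {
  local actual
  actual=$(sha256sum "$1" | awk '{print $1}')
  if [ "$actual" != "$2" ]; then
    echo "checksum mismatch for $1: got $actual, want $2" >&2
    return 1
  fi
}

# Illustrative pipeline (URL and PINNED_SHA256 are placeholders to fill in):
#   fetch "$URL" /tmp/k3d-install.sh \
#     && verify_sha256 /tmp/k3d-install.sh "$PINNED_SHA256" \
#     && bash /tmp/k3d-install.sh
```

Each step is joined with &&, so the installer runs only if both the download and the checksum check succeed.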

Security & reliability rationale (short)

  • Piping remote scripts to a shell runs unreviewed code; network or upstream compromise can inject malware. Verify integrity and provenance to prevent supply-chain attacks. [2][4] (curl.se)
  • Download+verify + pinned versions give reproducibility in CI and make audits possible. Use separate channels for checksum/signature to reduce MITM risk. [3][4] (checksum.sh)

References
[1] k3d install options, prefer packaged/binary and how to pin TAG. (github.com)
[2] curl security/known risks and recommended TLS/options and fail behavior. (curl.se)
[3] checksum.sh / pattern for verifying install scripts before piping to shell. (checksum.sh)
[4] Security analysis on external dependencies in CI and why pin+verify matters. (securing.pl)
[5] Practical Dockerfile/CI advice: avoid piping curl into bash; download & verify instead. (protsenko.dev)

🌐 Web query:

GitHub Actions job timeout minutes kubernetes k3d setup typical values

💡 Result:

  • GitHub Actions default job timeout: 360 minutes (6 hours). [1][2]

  • Typical timeout-minutes to use for k3d-based CI:

    • Fast/local dev runs (create small k3d cluster + smoke tests): 15–30 minutes. [3]
    • Normal CI (build images, start k3d cluster, run integration tests): 60–120 minutes (common choice). [3][4]
    • Large E2E or flaky-network runs (multi-node clusters, long test suites): 180–360 minutes (or keep default 360). [2][4]
  • Notes / tips:

    • Tune per-job based on measured run times; prefer shorter timeouts to fail fast. [2]
    • k3d itself has its own cluster startup timeout settings (seconds) you may also need to increase when creating multi-node clusters. [3]

Sources: GitHub Actions timeout docs/cheatsheet and guides; k3d examples; community recommendations. [1][2][3][4]


Add job timeout and improve k3d install error handling.

The job has no explicit timeout, so a hung step keeps running until GitHub's 360-minute default. The Install K3D step also masks download failures. The step does not set shell:, so GitHub Actions runs it with bash -e {0}. That turns errexit on but leaves pipefail off, so the exit status of curl ... | bash is the status of bash. Bash exits 0 after reading an empty script. If curl fails with a network error, the step therefore continues and only fails later, at the first k3d call, with "command not found". In addition, curl -s does not treat HTTP errors as failures. On a 4xx/5xx response, the error page body is piped into bash and executed.

Add a job timeout and make the install step fail at the download itself. A 30-minute timeout is a starting point, not a measured value. Small multi-server k3d clusters usually come up within 30s–2m, and CI setups commonly allow 5–6 minutes for cluster creation on slow runners. On top of that, the operator wait in test-e2e-ci allows up to 300s, plus image build and test time. Tune the timeout to your measured runtime. Note that the Ginkgo targets in the root Makefile pass --timeout 90m, so any run longer than 30 minutes would be ended by the job timeout, not by Ginkgo.

Apply this diff:

  test-e2e:
    name: Run end-to-end tests
    runs-on: ubuntu-latest
+   timeout-minutes: 30
    strategy:
      matrix:
        k3s-version: [ v1.27.1 ]
        # k3s-version: [v1.20.2, v1.19.2, v1.18.9, v1.17.11, v1.16.15]
    steps:
      - name: Install K3D
        run: |
-         set -x
-         curl -s https://raw.githubusercontent.com/rancher/k3d/main/install.sh | bash
+         set -exo pipefail
+         curl -fsSL https://raw.githubusercontent.com/rancher/k3d/main/install.sh | bash
          sudo mkdir -p $HOME/.kube && sudo chown -R runner $HOME/.kube
          k3d cluster create --servers 3 --image rancher/k3s:${{ matrix.k3s-version }}-k3s1
          kubectl version
          k3d version

Each flag does one job:
- -f makes curl exit non-zero on HTTP errors (4xx/5xx).
- -S still prints the error message even with -s.
- -L follows redirects.
- pipefail carries curl's failure through the pipe, so set -e stops the step.

For the multi-server cluster, also consider adding --timeout 300s to the k3d cluster create command. The flag takes a duration, and k3d rolls back a cluster that is not created within it, so a stalled creation does not hang indefinitely.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
  test-e2e:
    name: Run end-to-end tests
    runs-on: ubuntu-latest
    timeout-minutes: 30
    strategy:
      matrix:
        k3s-version: [ v1.27.1 ]
        # k3s-version: [v1.20.2, v1.19.2, v1.18.9, v1.17.11, v1.16.15]
    steps:
      - name: Install K3D
        run: |
          set -exo pipefail
          curl -fsSL https://raw.githubusercontent.com/rancher/k3d/main/install.sh | bash
          sudo mkdir -p $HOME/.kube && sudo chown -R runner $HOME/.kube
          k3d cluster create --servers 3 --image rancher/k3s:${{ matrix.k3s-version }}-k3s1
          kubectl version
          k3d version
🤖 Prompt for AI Agents
.github/workflows/ci-tests.yaml around lines 159 to 174: the test-e2e job needs
a timeout and more robust k3d install error handling; add a job-level timeout
(e.g., timeout-minutes: 30) to the job definition, make the Install K3D step fail
on errors, including failures inside pipelines (use set -exo pipefail), make curl
fail on HTTP errors and follow redirects by using -fsSL instead of -s, and add a
sensible timeout to the cluster creation command (e.g., --timeout 300s) so k3d
cluster create cannot hang indefinitely.
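The masking described in this comment follows from a bash rule: unless pipefail is set, a pipeline's exit status is the status of its last command. The sketch below reproduces this offline. failing_fetch and pipeline_status are hypothetical helpers that stand in for curl ... | bash.

```shell
#!/usr/bin/env bash
# Sketch: how pipefail changes the status of `download | bash`.
# failing_fetch is a hypothetical stand-in for a curl download that fails:
# it prints nothing and exits 22 (the code curl -f uses for HTTP >= 400).
failing_fetch() { return 22; }

# pipeline_status [pipefail]: run `failing_fetch | bash` in a subshell and print its status.
pipeline_status() {
  (
    if [ "${1:-}" = "pipefail" ]; then set -o pipefail; else set +o pipefail; fi
    status=0
    failing_fetch | bash || status=$?
    echo "$status"
  )
}

echo "without pipefail: $(pipeline_status)"          # 0: status of bash, which read an empty script
echo "with pipefail:    $(pipeline_status pipefail)"  # 22: status of failing_fetch
```

set -e alone does not change the first result. Errexit reacts to the pipeline's status, which stays 0 without pipefail.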

      - name: Checkout code
        uses: actions/checkout@v5
      - name: Setup Golang
        uses: actions/setup-go@v6
        with:
          go-version-file: 'test/ginkgo/go.mod'
      - name: GH actions workaround - Kill XSP4 process
        run: |
          sudo pkill mono || true
      - name: Restore go build cache
        uses: actions/cache@v4
        with:
          path: ~/.cache/go-build
          key: ${{ runner.os }}-go-build-v1-${{ github.run_id }}
      - name: Add /usr/local/bin to PATH
        run: |
          echo "/usr/local/bin" >> $GITHUB_PATH
      - name: Download Go dependencies
        run: |
          cd test/ginkgo && go mod download
      - name: Build local image updater, deploy operator
        env:
          ARGOCD_CLUSTER_CONFIG_NAMESPACES: argocd-e2e-cluster-config
          K3D_CLUSTER_NAME: k3s-default
        run: |
          set -o pipefail
          make -C test/ginkgo test-e2e-ci

      - name: Verify operator deployment
        run: |
          echo "--- Checking operator deployment ---"
          kubectl get deployment -n argocd-operator-system
          echo "--- Checking operator pods ---"
          kubectl get pods -n argocd-operator-system
          echo "--- Checking operator logs (last 50 lines) ---"
          OPERATOR_POD=$(kubectl get po -n argocd-operator-system -o=name | grep controller-manager | head -1)
          kubectl logs -n argocd-operator-system "$OPERATOR_POD" -c manager --tail=50 || true

      - name: Run ginkgo tests
        run: |
          set -o pipefail
          make e2e-tests-parallel-ginkgo 2>&1 | tee /tmp/e2e-test-ginkgo.log

      - name: Save application controller and server logs
        if: ${{ failure() }}
        run: |
          # Collect logs from test namespaces. The ginkgo tests use dynamically generated
          # namespace names with the prefix 'gitops-e2e-test-'.
          set -x

          # Find all gitops-e2e-test-* namespaces
          E2E_NAMESPACES=$(kubectl get namespaces -o=name | grep 'gitops-e2e-test-' | sed 's|namespace/||' || true)

          if [ -n "$E2E_NAMESPACES" ]; then
            for NS in $E2E_NAMESPACES; do
              echo "--- Collecting resources from namespace $NS ---"
              kubectl get all -n "$NS" >> /tmp/pods.log 2>&1 || true

              # Collect application controller logs (use head -1 to get only first matching pod)
              APP_CONTROLLER=$(kubectl get po -n "$NS" -o=name 2>/dev/null | grep argocd-application-controller | head -1 || true)
              if [ -n "$APP_CONTROLLER" ]; then
                kubectl logs -n "$NS" "$APP_CONTROLLER" >> /tmp/e2e-application-controller.log 2>&1 || true
              fi

              # Collect server logs (use head -1 to get only first matching pod)
              SERVER_POD=$(kubectl get po -n "$NS" -o=name 2>/dev/null | grep argocd-server | head -1 || true)
              if [ -n "$SERVER_POD" ]; then
                kubectl logs -n "$NS" "$SERVER_POD" >> /tmp/e2e-server.log 2>&1 || true
                kubectl describe -n "$NS" "$SERVER_POD" >> /tmp/e2e-server.log 2>&1 || true
              fi

              # Collect image updater logs (use head -1 to get only first matching pod)
              IMAGE_UPDATER_POD=$(kubectl get po -n "$NS" -o=name 2>/dev/null | grep 'image-updater' | head -1 || true)
              if [ -n "$IMAGE_UPDATER_POD" ]; then
                kubectl logs -n "$NS" "$IMAGE_UPDATER_POD" >> /tmp/e2e-image-updater.log 2>&1 || true
              fi
            done
          fi

          # Collect operator logs from argocd-operator-system namespace
          echo "--- Operator pods status ---" > /tmp/e2e-operator-run.log
          kubectl get pods -n argocd-operator-system -o wide >> /tmp/e2e-operator-run.log 2>&1 || true
          echo "" >> /tmp/e2e-operator-run.log

          # Collect logs from all operator pods (there might be multiple due to restarts)
          echo "--- Operator pod logs ---" >> /tmp/e2e-operator-run.log
          for pod in $(kubectl get po -n argocd-operator-system -o=name 2>/dev/null | grep controller-manager || true); do
            echo "" >> /tmp/e2e-operator-run.log
            echo "=== Logs from $pod ===" >> /tmp/e2e-operator-run.log
            kubectl logs -n argocd-operator-system "$pod" -c manager --tail=500 >> /tmp/e2e-operator-run.log 2>&1 || true
            # Also try previous container logs if pod restarted
            kubectl logs -n argocd-operator-system "$pod" -c manager --previous --tail=200 >> /tmp/e2e-operator-run.log 2>&1 || true
          done

          echo "" >> /tmp/e2e-operator-run.log
          echo "--- Operator deployment status ---" >> /tmp/e2e-operator-run.log
          kubectl get deployment -n argocd-operator-system -o yaml >> /tmp/e2e-operator-run.log 2>&1 || true

      - name: Upload operator logs
        uses: actions/upload-artifact@v5
        with:
          name: e2e-operator-run-${{ matrix.k3s-version }}.log
          path: /tmp/e2e-operator-run.log
        if: ${{ failure() }}

      - name: Upload ginkgo test logs
        uses: actions/upload-artifact@v5
        with:
          name: e2e-test-${{ matrix.k3s-version }}.log
          path: /tmp/e2e-test-ginkgo.log
        if: ${{ failure() }}

      - name: Upload application controller logs
        uses: actions/upload-artifact@v5
        with:
          name: e2e-application-controller-${{ matrix.k3s-version }}.log
          path: /tmp/e2e-application-controller.log
        if: ${{ failure() }}

      - name: Upload server logs
        uses: actions/upload-artifact@v5
        with:
          name: e2e-server-${{ matrix.k3s-version }}.log
          path: /tmp/e2e-server.log
        if: ${{ failure() }}

      - name: Upload image updater logs
        uses: actions/upload-artifact@v5
        with:
          name: e2e-image-updater-${{ matrix.k3s-version }}.log
          path: /tmp/e2e-image-updater.log
        if: ${{ failure() }}

      - name: Upload pod descriptions
        uses: actions/upload-artifact@v5
        with:
          name: e2e-pods-${{ matrix.k3s-version }}.log
          path: /tmp/pods.log
        if: ${{ failure() }}
16 changes: 16 additions & 0 deletions Makefile
@@ -243,6 +243,22 @@ golangci-lint: $(GOLANGCI_LINT) ## Download golangci-lint locally if necessary.
$(GOLANGCI_LINT): $(LOCALBIN)
	$(call go-install-tool,$(GOLANGCI_LINT),github.com/golangci/golangci-lint/v2/cmd/golangci-lint,$(GOLANGCI_LINT_VERSION))

## E2E
.PHONY: e2e-tests-sequential-ginkgo
e2e-tests-sequential-ginkgo: ginkgo
	@echo "Running operator sequential Ginkgo E2E tests..."
	$(GINKGO_CLI) -v --trace --timeout 90m -r ./test/ginkgo/sequential

.PHONY: e2e-tests-parallel-ginkgo
e2e-tests-parallel-ginkgo: ginkgo
	@echo "Running operator parallel Ginkgo E2E tests..."
	$(GINKGO_CLI) -p -v -procs=1 --trace --timeout 90m -r ./test/ginkgo/parallel

GINKGO_CLI = $(shell pwd)/bin/ginkgo
.PHONY: ginkgo
ginkgo: ## Download ginkgo locally if necessary.
	$(call go-install-tool,$(GINKGO_CLI),github.com/onsi/ginkgo/v2/ginkgo,v2.27.2)

# go-install-tool will 'go install' any package with custom target and name of binary, if it doesn't exist
# $1 - target path with name of binary
# $2 - package url which can be installed
112 changes: 112 additions & 0 deletions test/ginkgo/Makefile
@@ -0,0 +1,112 @@
# Default image for argocd-operator. Can be overridden.
ARGOCD_OPERATOR_IMAGE ?= quay.io/argoprojlabs/argocd-operator:latest

# Get version from root VERSION file
IMAGE_NAMESPACE?=quay.io/argoprojlabs
IMAGE_NAME=argocd-image-updater
ifdef IMAGE_NAMESPACE
IMAGE_PREFIX=${IMAGE_NAMESPACE}/
else
IMAGE_PREFIX=
endif

VERSION := $(shell cat ../../VERSION)
IMAGE_TAG?=v${VERSION}
# Image URL to use all building/pushing image targets
ARGOCD_IMAGE_UPDATER_IMAGE ?= ${IMAGE_PREFIX}${IMAGE_NAME}:${IMAGE_TAG}

# Define the patch template
define PATCH_TEMPLATE
apiVersion: apps/v1
kind: Deployment
metadata:
  name: controller-manager
  namespace: system
spec:
  template:
    spec:
      containers:
      - name: manager
        env:
        - name: ARGOCD_IMAGE_UPDATER_IMAGE
          value: $(ARGOCD_IMAGE_UPDATER_IMAGE)
endef
export PATCH_TEMPLATE

# Tools - assuming they are in the path or in the project's bin directory
KUSTOMIZE ?= $(CURDIR)/../../bin/kustomize
KUBECTL ?= kubectl
K3D ?= k3d

K3D_CLUSTER_NAME ?= test-e2e-local

ifndef ignore-not-found
ignore-not-found = false
endif

.PHONY: deploy-argocd-operator undeploy-argocd-operator kustomize test-e2e-local test-e2e-ci k3d-cluster-create k3d-cluster-delete k3d-image-import
kustomize:
	$(MAKE) -C ../../ kustomize

deploy-argocd-operator: kustomize ## Deploy argocd-operator from a stable git reference.
	@echo "Deploying Argo CD Operator..."
	@set -e; \
	TMP_DIR=$$(mktemp -d); \
	cp prereqs/kustomization.yaml $$TMP_DIR/kustomization.yaml; \
	echo "Applying argocd-operator manifests with image $(ARGOCD_OPERATOR_IMAGE)..."; \
	echo "Setting argocd-image-updater image to $(ARGOCD_IMAGE_UPDATER_IMAGE)..."; \
	echo "$$PATCH_TEMPLATE" > $$TMP_DIR/patch.yaml; \
	cd $$TMP_DIR && \
	$(KUSTOMIZE) edit set image quay.io/argoprojlabs/argocd-operator=$(ARGOCD_OPERATOR_IMAGE) && \
	$(KUSTOMIZE) edit add patch --path patch.yaml; \
	$(KUSTOMIZE) build $$TMP_DIR | $(KUBECTL) apply --server-side=true -f -; \
	rm -rf $$TMP_DIR; \
	echo "Argo CD Operator deployment initiated.";

undeploy-argocd-operator: kustomize ## Undeploy argocd-operator.
	@echo "Undeploying Argo CD Operator..."
	@set -e; \
	TMP_DIR=$$(mktemp -d); \
	cp prereqs/kustomization.yaml $$TMP_DIR/kustomization.yaml; \
	$(KUSTOMIZE) build $$TMP_DIR | $(KUBECTL) delete --ignore-not-found=$(ignore-not-found) -f -; \
	rm -rf $$TMP_DIR; \
	echo "Argo CD Operator undeployment initiated.";

k3d-cluster-create: ## Create k3d cluster for e2e tests
	@echo "--- Creating k3d cluster $(K3D_CLUSTER_NAME) ---"
	$(K3D) cluster create $(K3D_CLUSTER_NAME)

k3d-cluster-delete: ## Delete k3d cluster for e2e tests
	@echo "--- Deleting k3d cluster $(K3D_CLUSTER_NAME) ---"
	$(K3D) cluster delete $(K3D_CLUSTER_NAME)

k3d-image-import: ## Import local image to k3d cluster
	@echo "--- Importing image $(ARGOCD_IMAGE_UPDATER_IMAGE) to k3d cluster $(K3D_CLUSTER_NAME) ---"
	$(K3D) image import $(ARGOCD_IMAGE_UPDATER_IMAGE) -c $(K3D_CLUSTER_NAME)

# Currently run only parallel tests because we don't have sequential tests yet.
test-e2e-local: ## Build local image updater, deploy operator, and run parallel e2e tests.
	@echo "--- Creating k3d cluster---"
	$(MAKE) k3d-cluster-create
	@echo "--- Building local argocd-image-updater image ---"
	$(MAKE) -C ../../ docker-build
	@echo "--- Importing image to k3d cluster ---"
	$(MAKE) k3d-image-import
	@echo "--- Deploying argocd-operator with local image-updater ---"
	$(MAKE) deploy-argocd-operator
	@echo "--- Waiting for argocd-operator to be ready ---"
	$(KUBECTL) wait --for=condition=available --timeout=300s deployment/argocd-operator-controller-manager -n argocd-operator-system
	@echo "--- Running Parallel E2E tests ---"
	$(MAKE) -C ../../ e2e-tests-parallel-ginkgo
	@echo "--- Deleting k3d cluster ---"
	$(MAKE) k3d-cluster-delete

test-e2e-ci: ## Build local image updater, deploy operator.
	@echo "--- Building local argocd-image-updater image ---"
	$(MAKE) -C ../../ docker-build
	@echo "--- Importing image to k3d cluster ---"
	$(MAKE) k3d-image-import
	@echo "--- Deploying argocd-operator with local image-updater ---"
	$(MAKE) deploy-argocd-operator
	@echo "--- Waiting for argocd-operator to be ready ---"
	$(KUBECTL) wait --for=condition=available --timeout=300s deployment/argocd-operator-controller-manager -n argocd-operator-system