Upgrading the OpenTelemetry Operator requires restarting all pods that use a sidecar and auto-instrumentation. #3601

Open
phillipsj opened this issue Jan 10, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@phillipsj

Component(s)

collector, auto-instrumentation

What happened?

Description

After a few operator upgrades, I have noticed that I need to cycle pods to get the updated sidecar and auto-instrumentation. The upgrade itself also seems to break: the manager logs errors about failed webhook calls. Is there a recommended process to prevent this?

Steps to Reproduce

  1. Install version 0.115.0 of the operator.
  2. Deploy a pod with auto-instrumentation and a sidecar collector.
  3. Upgrade the operator to version 0.116.x.

Expected Result

The auto-instrumentation and the sidecar collector are upgraded without requiring the pods to be deleted, or at least the upgrade does not throw errors.

Actual Result

Errors occur, and the auto-instrumentation and sidecar collector are not redeployed.

Kubernetes Version

1.29

Operator version

0.116.0

Collector version

0.116.1

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler(if manually compiled): (e.g., "go 14.2")

Log output

{"level":"ERROR","timestamp":"2025-01-09T15:52:45.043638118Z","logger":"instrumentation-upgrade","message":"failed to apply changes to instance","name":"java-sidecar-instrumentation","namespace":"opentelemetry-operator-system","error":"Internal error occurred: failed calling webhook \"minstrumentation.kb.io\": failed to call webhook: Post \"https://opentelemetry-operator-webhook-service.opentelemetry-operator-system.svc:443/mutate-opentelemetry-io-v1alpha1-instrumentation?timeout=10s\": no endpoints available for service \"opentelemetry-operator-webhook-service\"","stacktrace":"github.com/open-telemetry/opentelemetry-operator/pkg/instrumentation/upgrade.(*InstrumentationUpgrade).ManagedInstances\n\t/home/runner/work/opentelemetry-operator/opentelemetry-operator/pkg/instrumentation/upgrade/upgrade.go:93\nmain.addDependencies.func2\n\t/home/runner/work/opentelemetry-operator/opentelemetry-operator/main.go:548\nsigs.k8s.io/controller-runtime/pkg/manager.RunnableFunc.Start\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/manager/manager.go:307\nsigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/manager/runnable_group.go:226"}
{"level":"ERROR","timestamp":"2025-01-09T15:52:45.046831977Z","logger":"collector-upgrade","message":"failed to apply changes to instance","name":"pod-sidecar","namespace":"opentelemetry-operator-system","error":"Internal error occurred: failed calling webhook \"mopentelemetrycollectorbeta.kb.io\": failed to call webhook: Post \"https://opentelemetry-operator-webhook-service.opentelemetry-operator-system.svc:443/mutate-opentelemetry-io-v1beta1-opentelemetrycollector?timeout=10s\": no endpoints available for service \"opentelemetry-operator-webhook-service\"","stacktrace":"github.com/open-telemetry/opentelemetry-operator/pkg/collector/upgrade.VersionUpgrade.ManagedInstances\n\t/home/runner/work/opentelemetry-operator/opentelemetry-operator/pkg/collector/upgrade/upgrade.go:86\nmain.addDependencies.func1\n\t/home/runner/work/opentelemetry-operator/opentelemetry-operator/main.go:534\nsigs.k8s.io/controller-runtime/pkg/manager.RunnableFunc.Start\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/manager/manager.go:307\nsigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/manager/runnable_group.go:226"}
{"level":"ERROR","timestamp":"2025-01-09T15:52:45.049590845Z","logger":"instrumentation-upgrade","message":"failed to apply changes to instance","name":"nodejs-sidecar-instrumentation","namespace":"opentelemetry-operator-system","error":"Internal error occurred: failed calling webhook \"minstrumentation.kb.io\": failed to call webhook: Post \"https://opentelemetry-operator-webhook-service.opentelemetry-operator-system.svc:443/mutate-opentelemetry-io-v1alpha1-instrumentation?timeout=10s\": no endpoints available for service \"opentelemetry-operator-webhook-service\"","stacktrace":"github.com/open-telemetry/opentelemetry-operator/pkg/instrumentation/upgrade.(*InstrumentationUpgrade).ManagedInstances\n\t/home/runner/work/opentelemetry-operator/opentelemetry-operator/pkg/instrumentation/upgrade/upgrade.go:93\nmain.addDependencies.func2\n\t/home/runner/work/opentelemetry-operator/opentelemetry-operator/main.go:548\nsigs.k8s.io/controller-runtime/pkg/manager.RunnableFunc.Start\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/manager/manager.go:307\nsigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/manager/runnable_group.go:226"}
{"level":"ERROR","timestamp":"2025-01-09T15:52:45.055291592Z","logger":"instrumentation-upgrade","message":"failed to apply changes to instance","name":"dotnet-sidecar-instrumentation","namespace":"opentelemetry-operator-system","error":"Internal error occurred: failed calling webhook \"minstrumentation.kb.io\": failed to call webhook: Post \"https://opentelemetry-operator-webhook-service.opentelemetry-operator-system.svc:443/mutate-opentelemetry-io-v1alpha1-instrumentation?timeout=10s\": no endpoints available for service \"opentelemetry-operator-webhook-service\"","stacktrace":"github.com/open-telemetry/opentelemetry-operator/pkg/instrumentation/upgrade.(*InstrumentationUpgrade).ManagedInstances\n\t/home/runner/work/opentelemetry-operator/opentelemetry-operator/pkg/instrumentation/upgrade/upgrade.go:93\nmain.addDependencies.func2\n\t/home/runner/work/opentelemetry-operator/opentelemetry-operator/main.go:548\nsigs.k8s.io/controller-runtime/pkg/manager.RunnableFunc.Start\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/manager/manager.go:307\nsigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/manager/runnable_group.go:226"}
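The entries above are one JSON object per line, so they can be summarized mechanically. The following is a minimal sketch, assuming the manager log is available as a list of lines; the function name `summarize_upgrade_failures` is hypothetical, and the sample entries are abbreviated copies of the log above (timestamps and stack traces dropped, the error text shortened with `...`).

```python
import json
from collections import defaultdict


def summarize_upgrade_failures(log_lines):
    """Group failed instances by the logger that reported them.

    Returns {logger: [(namespace, name, webhook_unreachable), ...]}.
    """
    failures = defaultdict(list)
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if entry.get("level") != "ERROR":
            continue
        # This text is what the API server reports when the webhook
        # Service has no ready endpoints behind it.
        unreachable = "no endpoints available" in entry.get("error", "")
        failures[entry["logger"]].append(
            (entry["namespace"], entry["name"], unreachable)
        )
    return dict(failures)


# Abbreviated versions of two entries from the log output above.
sample = [
    json.dumps({
        "level": "ERROR",
        "logger": "instrumentation-upgrade",
        "message": "failed to apply changes to instance",
        "name": "java-sidecar-instrumentation",
        "namespace": "opentelemetry-operator-system",
        "error": 'failed calling webhook "minstrumentation.kb.io": ... '
                 'no endpoints available for service '
                 '"opentelemetry-operator-webhook-service"',
    }),
    json.dumps({
        "level": "ERROR",
        "logger": "collector-upgrade",
        "message": "failed to apply changes to instance",
        "name": "pod-sidecar",
        "namespace": "opentelemetry-operator-system",
        "error": 'failed calling webhook "mopentelemetrycollectorbeta.kb.io": ... '
                 'no endpoints available for service '
                 '"opentelemetry-operator-webhook-service"',
    }),
]

if __name__ == "__main__":
    for logger, items in summarize_upgrade_failures(sample).items():
        print(logger, items)
```

In this log, every failure from both the `instrumentation-upgrade` and `collector-upgrade` loggers carries the same cause: the webhook service had no endpoints available when the upgrade routine ran.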

Additional context

No response

@phillipsj added the bug and needs triage labels on Jan 10, 2025
@swiatekm
Contributor

I believe the errors are a result of #3468 and #3515, which is a bug we'd like to fix.

We can't upgrade sidecars or auto-instrumentation without recreating Pods, because Pods are immutable and can only be modified during creation. These are your own workloads, and it's up to you to decide when and how they can be recreated to facilitate the upgrade.
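One common way to recreate Pods in a controlled fashion is a rolling restart of the owning workload. The sketch below is not part of the operator; it only builds the merge-patch body that `kubectl rollout restart` applies, which sets the `kubectl.kubernetes.io/restartedAt` annotation on the Pod template. It assumes the instrumented Pods are managed by a Deployment (or a similar controller with a Pod template), and the function name `rollout_restart_patch` is hypothetical.

```python
from datetime import datetime, timezone

# Annotation written by `kubectl rollout restart` on the Pod template.
RESTART_ANNOTATION = "kubectl.kubernetes.io/restartedAt"


def rollout_restart_patch(now=None):
    """Build a merge-patch body that changes the Pod template.

    Any change to the Pod template makes the controller roll out new
    Pods, which pass through the operator's admission webhook again.
    `now` should be a timezone-aware datetime; it defaults to the
    current UTC time.
    """
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    stamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "spec": {
            "template": {
                "metadata": {"annotations": {RESTART_ANNOTATION: stamp}}
            }
        }
    }


if __name__ == "__main__":
    import json

    # Print the patch so it can be inspected or passed to a client.
    print(json.dumps(rollout_restart_patch(), indent=2))
```

The resulting body could be applied with a Kubernetes client of your choice, or the same effect can be had by running `kubectl rollout restart` against the workload. When to do this after an operator upgrade remains a decision for the workload owner, and it is probably best done once the operator's webhook service is reporting ready endpoints again.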
