
Eventrouter - memory leak, fails silently after 3GB #1993

Open
jeremych1000 opened this issue Mar 26, 2025 · 0 comments
Labels
bug Something isn't working


@jeremych1000

Bugs should be filed for issues encountered whilst operating logging-operator.
You should first attempt to resolve your issues through the community support
channels, e.g. Slack, in order to rule out individual configuration errors. #logging-operator
Please provide as much detail as possible.

Describe the bug:
We use eventtailer with logging-operator to log Kubernetes events. The image used is 0.4.0 from https://github.com/kube-logging/eventrouter. I see it's a fork, and #1966 recognises that this isn't great.

We've seen eventrouter's memory usage grow linearly over time, and it failed silently after 3GB of memory had been consumed. Restarting it fixed the problem.

Expected behaviour:
Memory usage of the event tailer should be more or less constant.

Steps to reproduce the bug:
Monitor event tailer memory usage across a few days. Here's a screenshot.

The drop-off was caused by us merging a change that added requests/limits at 4pm on 19/3. You can still see the memory leak: the graph continues to climb until the pod gets OOMKilled.

[Screenshot: event tailer memory usage over several days, climbing steadily until the OOM kill]
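
If you want to catch this before the OOM kill, a Prometheus alert along these lines should work. This is only a sketch: it assumes cAdvisor metrics and the Prometheus Operator's PrometheusRule CRD are available, and the rule name, container selector and threshold are placeholders that need adjusting to your setup.

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: event-tailer-memory        # hypothetical name
  namespace: <control namespace>
spec:
  groups:
    - name: event-tailer.rules
      rules:
        - alert: EventTailerMemoryGrowth
          # fire if the event tailer container's working-set memory stays above ~1 GiB for an hour
          expr: container_memory_working_set_bytes{container="event-tailer"} > 1e9
          for: 1h
          labels:
            severity: warning
          annotations:
            summary: event tailer memory keeps growing (possible eventrouter leak)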

Additional context:
We are running it with the vanilla config; the manifest is posted below.

Environment details:

  • Kubernetes version (e.g. v1.15.2): v1.30.6
  • Cloud-provider/provisioner (e.g. AKS, GKE, EKS, PKE etc): on prem
  • logging-operator version (e.g. 2.1.1): 4.10.0, which is not the newest, but the eventrouter image is still 0.4.0
  • Install method (e.g. helm or static manifests): argocd
  • Logs from the misbehaving component (and any other relevant logs): no relevant logs
  • Resource definition (possibly in YAML format) that caused the issue, without sensitive data:
apiVersion: logging-extensions.banzaicloud.io/v1alpha1
kind: EventTailer
metadata: <removed for brevity>
spec:
  containerOverrides:
    resources:
      limits:
        cpu: 50m
        memory: 250Mi
      requests:
        cpu: 10m
        memory: 100Mi
  controlNamespace: <redacted>
  image:
    imagePullSecrets: []
    pullPolicy: IfNotPresent
    repository:  <redacted - we use a proxy>
    tag: 0.4.0
  positionVolume:
    pvc:
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
        volumeMode: Filesystem
  workloadMetaOverrides:
    labels:
      logging-operator/component: eventTailer

/kind bug

jeremych1000 added the bug label on Mar 26, 2025