Skip to content

Conversation

@janLo
Copy link

@janLo janLo commented Dec 16, 2020

This is mainly a RFC as I'm aware that it changes the current behavior quite a bit - and I'm fine to keep it as a fork. Still I'm curious what others think of these changes:

  • I've removed the PodFailed branch from the pod observation as in my case, when I hit this branch, the reason and message was always empty. So I got a lot of events "Pod/xyz: " without any further helpful metadata. If there is a pod-change I always search for terminated containers.
  • Every failed container causes a sentry event now and not only the first one. They will be grouped by a common fingerprint though.
  • I've used the generateName field instead of uuid for the fingerprint (if the object has no owner) as this is the stable for a lot of solutions working with some kind of pod template (in my case Jenkins and Gitlab-Runner). If the generateName is not provided it uses the name in a mangled form like the sentry-kubernetes implementation did to allow better grouping.
  • I took the message out of the fingerprints as its often containing very specific information like the container ids preventing any grouping.

The podRailed case seemed pretty useless to me as the message and reason
was always empty so I got a lot of meaningless events. I also think that
any Pod-Failure that is not due to failed containers will show up as
events and get better reported as such.

Another disadvantage of the former implementation was, that it reported
only ever one container failure per pod.
@janLo janLo changed the title Refactor pod observation Modification of the pod observation Dec 16, 2020
janLo and others added 11 commits December 17, 2020 10:30
This introduces a name mangleing like the sentry-kubernets
implementation does for more aggressive issue grouping.
The message often contains unique information like container-ids or node
names causing basically no grouping to happen.
…VENT_LEVELS...

- Leaving SKIP_EVENT_LEVELS unset
  Defaults to old behavior, "normal" type events will be skipped

- Setting SKIP_EVENT_LEVELS to empty
  No event type will be skipped

- Setting SKIP_EVENT_LEVELS to a comma separated event type list like "SKIP_EVENT_LEVELS=normal,warning"
  All events of those types will be skipped. The filtering mechanism is case-insensitive, so "normal,warning" is the same as "nOrMal,WARning".
Fix Pod Term Detection and Make SkipLevels Configurable
feat: favor a namespace's skipLevel annotation for the global skipLevel env
added:
- NS annotation secunet.sentry/skip-event-reasons:
  allow configuration of skip event reasons per namespace

- env SKIP_EVENT_REASONS:
  global skip by reason config if NS has no skip config annotation

- Pod annotation secunet.sentry/ignore-pod-updates=true:
  allows for suppression of pod update event handling through the forwarder

changed:
- skip config declaration format now supports specifying a resource type before a criteria:
  [involved object type:]criteria[,...]
  E.g. normal,Pod:warning,Service:error
  See: parseSkipConfig(...) at skip.go:71
feat: skip by reason and skip pod updates...
@salzig
Copy link

salzig commented Jul 28, 2021

@wichert what's your opionion on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants