
Split k8s.rules group #882

Merged

Conversation

@skl (Collaborator) commented Oct 27, 2023

To prevent missed deadlines in some rules from affecting the entire group, split the k8s.rules group into 7 groups. Similar to #632.

Pre-split, the single k8s.rules group was failing evaluation constantly at a 1-minute interval, so there isn't really a "before" state other than "broken".

At the first commit in this PR (45678b5), I tried 3 groups: 2 of the 3 were fine, but 1 group was still failing. I then split that group into 5 on the second commit (d1a28fc), as every rule in it was taking ~20-30s.
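
For context, here is a minimal sketch of the post-split layout as it would appear in a generated Prometheus rule file. It is illustrative only: the group names match the evaluation sample below, the recording rule expressions are unchanged by this PR and elided here, and the real output is generated from the jsonnet rather than written by hand.

```yaml
# Illustrative sketch only; the real groups are generated from jsonnet and
# the recording rule expressions are unchanged, just regrouped.
groups:
  - name: k8s.rules.container_cpu_usage_seconds_total
    rules:
      - record: node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate
        expr: ...  # unchanged expression, elided here
  - name: k8s.rules.container_memory_working_set_bytes
    rules:
      - record: node_namespace_pod_container:container_memory_working_set_bytes
        expr: ...  # unchanged expression, elided here
  # ...plus k8s.rules.container_memory_cache, k8s.rules.container_memory_rss,
  # k8s.rules.container_memory_swap, k8s.rules.container_resource and
  # k8s.rules.pod_owner, one group per family of recording rules.
```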

Post-split evaluation sample from a busy cluster using the 7 groups:

  • k8s.rules.container_cpu_usage_seconds_total (total: 12s)
    • node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate 12s
  • k8s.rules.container_memory_cache (total: 27s)
    • node_namespace_pod_container:container_memory_cache 27s
  • k8s.rules.container_memory_rss (total: 17s)
    • node_namespace_pod_container:container_memory_rss 17s
  • k8s.rules.container_memory_swap (total: 27s)
    • node_namespace_pod_container:container_memory_swap 27s
  • k8s.rules.container_memory_working_set_bytes (total: 25s)
    • node_namespace_pod_container:container_memory_working_set_bytes 25s
  • k8s.rules.container_resource (total: 46s)
    • cluster:namespace:pod_memory:active:kube_pod_container_resource_requests 11s
    • namespace_memory:kube_pod_container_resource_requests:sum 4s
    • cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests 10s
    • namespace_cpu:kube_pod_container_resource_requests:sum 2s
    • cluster:namespace:pod_memory:active:kube_pod_container_resource_limits 9s
    • namespace_memory:kube_pod_container_resource_limits:sum 2s
    • cluster:namespace:pod_cpu:active:kube_pod_container_resource_limits 6s
    • namespace_cpu:kube_pod_container_resource_limits:sum 2s
  • k8s.rules.pod_owner (total: 11s)
    • namespace_workload_pod:kube_pod_owner:relabel (deployment) 9s
    • namespace_workload_pod:kube_pod_owner:relabel (daemonset) 2s
    • namespace_workload_pod:kube_pod_owner:relabel (statefulset) 0s
    • namespace_workload_pod:kube_pod_owner:relabel (job) 0s

With the 7-group configuration, all rules are evaluating successfully.
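
As a rough way to watch for regressions (not part of this PR), an alert on Prometheus's own prometheus_rule_group_iterations_missed_total metric would catch groups that start missing their evaluation deadlines again. The group name, alert name and thresholds below are illustrative assumptions; the upstream Prometheus mixin ships a similar alert.

```yaml
# Rough sketch, not part of this change: alert when any rule group has
# missed evaluation iterations recently. Names and thresholds are illustrative.
groups:
  - name: rule-group-health
    rules:
      - alert: RuleGroupMissingEvaluations
        expr: increase(prometheus_rule_group_iterations_missed_total[15m]) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: 'Rule group {{ $labels.rule_group }} has missed evaluations in the last 15 minutes.'
```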

Signed-off-by: Stephen Lang <[email protected]>
@povilasv (Contributor) commented:
This LGTM, do you have any benchmarks, like #632?

@skl (Collaborator, Author) commented Oct 30, 2023

@povilasv currently testing and gathering some data on rule group performance, will report back with findings.

@skl changed the title from "Split k8s.rules group into three groups" to "Split k8s.rules group" on Nov 3, 2023
@skl (Collaborator, Author) commented Nov 6, 2023

@povilasv I've updated the PR description with performance data. Please take a look when you have time.

@povilasv (Contributor) commented Nov 8, 2023

Looks good to me. Thanks for the work!

@povilasv merged commit bcf8426 into kubernetes-monitoring:master on Nov 8, 2023
6 checks passed
@skl deleted the upstream-k8s-apps-rules-split branch on November 8, 2023 at 12:25