Kepler energy metrics are zeroes #46

Open
Lai-Kenny opened this issue Aug 9, 2023 · 3 comments
Comments

@Lai-Kenny

Lai-Kenny commented Aug 9, 2023

Hello, everyone.
My Environment:

Ubuntu: 20.04.6 LTS
Linux Kernel: 5.4.0-148-generic x86_64
Kubernetes: v1.16.0

I installed Kepler using the kepler-helm-chart and tried two chart versions (0.4.2 and 0.4.1).
However, I found that some metrics (e.g. kepler_container_cgroupfs_cpu_usage_us_total, kepler_container_core_joules_total, and so on) are always 0. This would affect the estimated energy consumption, right?
I tried to follow Kepler's Troubleshooting guide to debug it, but when I checked, my cgroup version is v2.

So, how should I debug it?

@yellowhat
Contributor

Hi,
this does not seem like a Helm chart issue.
@rootfs, any suggestions?

Just a comment: from here, it seems that cgroup v2 is only the default from Ubuntu 21.10 onward.
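
Since the cgroup version is in question, it can be confirmed directly on the node rather than inferred from the Ubuntu release. The following is a minimal sketch, assuming the conventional mount point `/sys/fs/cgroup`; the function name `detect_cgroup_version` is made up for illustration. It relies on the fact that a unified (v2) hierarchy exposes a `cgroup.controllers` file at its root, while a v1 layout has one directory per controller.

```python
from pathlib import Path


def detect_cgroup_version(root: str = "/sys/fs/cgroup") -> str:
    """Return "v2", "v1", or "unknown" for the cgroup hierarchy mounted at root."""
    base = Path(root)
    # The unified (v2) hierarchy exposes cgroup.controllers at its root.
    if (base / "cgroup.controllers").is_file():
        return "v2"
    # A v1 (or hybrid) layout has one directory per controller, e.g. cpu, memory.
    if (base / "cpu").is_dir() or (base / "memory").is_dir():
        return "v1"
    return "unknown"


if __name__ == "__main__":
    print(detect_cgroup_version())
```

Run it on the node itself (or in a container that mounts the host's `/sys/fs/cgroup`), since a container may see a different view of the hierarchy.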

@Lai-Kenny
Author

Lai-Kenny commented Aug 10, 2023

Hi, @yellowhat @rootfs
I have updated my environment:
kepler chart version: 0.5.0
Ubuntu: 22.04.3 LTS
Linux Kernel: 5.15.0-78-generic x86_64
kubelet: 1.21.0
cgroup: v2
However, some of my metrics (e.g. kepler_container_cgroupfs_cpu_usage_us_total, kepler_container_core_joules_total, and so on) are still always 0. The following is my Kepler log:

I0810 13:36:11.444138       1 gpu.go:46] Failed to init nvml, err: could not init nvml: error opening libnvidia-ml.so.1: libnvidia-ml.so.1: cannot open shared object file: No such file or directory
I0810 13:36:11.456218       1 exporter.go:155] Kepler running on version: a7a6cb1
I0810 13:36:11.456299       1 config.go:258] using gCgroup ID in the BPF program: true
I0810 13:36:11.456394       1 config.go:260] kernel version: 5.15
I0810 13:36:11.456441       1 exporter.go:179] EnabledBPFBatchDelete: true
I0810 13:36:11.456485       1 rapl_msr_util.go:129] failed to open path /dev/cpu/0/msr: no such file or directory
I0810 13:36:11.456556       1 power.go:64] Not able to obtain power, use estimate method
I0810 13:36:11.456585       1 redfish.go:169] failed to get redfish credential file path
I0810 13:36:11.456601       1 power.go:55] use acpi to obtain power
I0810 13:36:11.456790       1 acpi.go:67] Could not find any ACPI power meter path. Is it a VM?
I0810 13:36:11.467127       1 exporter.go:198] Initializing the GPU collector
I0810 13:36:17.469943       1 watcher.go:66] Using in cluster k8s config
I0810 13:36:17.571852       1 bpf_perf.go:123] LibbpfBuilt: false, BccBuilt: true
cannot attach kprobe, probe entry may not exist
I0810 13:36:18.414147       1 bcc_attacher.go:186] Successfully load eBPF module from bcc with option: [-DMAP_SIZE=10240 -DNUM_CPUS=4 -DSET_GROUP_ID]
I0810 13:36:18.440766       1 exporter.go:251] Started Kepler in 6.984594733s
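
The log above reports several unavailable sources (NVML, RAPL via MSR, Redfish, ACPI) before Kepler states that it will use the estimate method. To pull such lines out of a longer log, a small filter can help. This is a rough sketch, not part of Kepler: the function name `find_suspect_lines` and the keyword list are assumptions chosen for illustration, and the regular expression only covers the klog-style header seen in the lines above.

```python
import re

# klog-style header as seen in the log above: "I0810 13:36:11.456485       1 file.go:129] message"
KLOG_LINE = re.compile(
    r"^[IWEF]\d{4} \d{2}:\d{2}:\d{2}\.\d+\s+\d+ (?P<src>[^\s:]+):\d+\] (?P<msg>.*)$"
)

# Hypothetical keyword list; adjust it to whatever you are looking for.
KEYWORDS = ("failed", "not able", "could not", "cannot")


def find_suspect_lines(log_text: str):
    """Return (source_file, message) pairs for lines that mention a failure.

    source_file is None for lines that do not carry a klog header.
    """
    hits = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = KLOG_LINE.match(line)
        src, msg = (match.group("src"), match.group("msg")) if match else (None, line)
        if any(keyword in msg.lower() for keyword in KEYWORDS):
            hits.append((src, msg))
    return hits


if __name__ == "__main__":
    import sys

    for src, msg in find_suspect_lines(sys.stdin.read()):
        print(f"{src or '-'}: {msg}")
```

For example, with the filter saved under a file name of your choice, `kubectl logs <kepler-pod> | python3 <that file>` would print only the lines that mention a failure, where `<kepler-pod>` is a placeholder for your pod name.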

@rootfs
Contributor

rootfs commented Aug 14, 2023

@Lai-Kenny, can you check whether CPU accounting is turned on? Please follow the thread here.
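
One possible way to check this on a cgroup v2 node is sketched below. It is only one reading of "CPU accounting" and may differ from what the linked thread describes: it looks for the `cpu` controller in a cgroup's `cgroup.controllers` and `cgroup.subtree_control` files and reads `usage_usec` from `cpu.stat`. The helper names `read_controllers` and `read_cpu_usage_usec` are made up for illustration, and the default path `/sys/fs/cgroup` is an assumption.

```python
from pathlib import Path
from typing import Optional


def read_controllers(cgroup_dir: str, filename: str = "cgroup.controllers") -> set:
    """Return the controller names listed in a cgroup v2 control file (empty set if missing)."""
    path = Path(cgroup_dir) / filename
    return set(path.read_text().split()) if path.is_file() else set()


def read_cpu_usage_usec(cgroup_dir: str) -> Optional[int]:
    """Return the usage_usec value from cpu.stat, or None if it is unavailable."""
    path = Path(cgroup_dir) / "cpu.stat"
    if not path.is_file():
        return None
    for line in path.read_text().splitlines():
        key, _, value = line.partition(" ")
        if key == "usage_usec":
            return int(value)
    return None


if __name__ == "__main__":
    root = "/sys/fs/cgroup"  # assumed mount point; pass a pod's cgroup directory to inspect it instead
    print("available controllers:", sorted(read_controllers(root)))
    print("enabled for children: ", sorted(read_controllers(root, "cgroup.subtree_control")))
    print("cpu.stat usage_usec:  ", read_cpu_usage_usec(root))
```

If `cpu` is missing from `cgroup.subtree_control`, or `usage_usec` does not grow between two runs while pods are busy, that would be worth reporting back in the thread. The directory that holds the pod cgroups depends on the kubelet's cgroup driver, so it is left as a parameter here.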
