Releases: sustainable-computing-io/kepler
release-0.7.12
908db28 fix(bpf_collector): fix command name in case of kernel processes
b90a63b fix(bpf): exclude bpf overhead in bpf_cpu_time
987a139 fix(metrics): Remove resource usage check for skipping bpf metrics
c20c23c fix: error initializing dcgm
4121376 fix(aa66ada): reading libraries to the builder
4c63b43 fix(validator): update the bpf cpu time query
097541c fix: gosec failures (#1778)
5b95acb fix: nvml/dcgm builds
9d14581 fix: habana image build
aa66ada fix(dockerfile): remove redundant habanalabs installation steps
34d27b8 fix: do not probe for power-meters when disabled
ecd5f54 feat(models): update acpi dyn to 0.7.11
7303454 feat(models): update acpi abspower to 0.7.11
fcc8e0a feat(models): update intel-rapl abspower to 0.7.11
c84bfd0 feat(models): log model source url
7b07762 feat: get trainer from model_name in weight
a7b9892 fix(models): add predictor name in errors
aa822ee feat: compute core ratio for local regressor (#1743)
5d6ecc0 fix(config): trim spaces and new lines in MODEL_CONFIG
b2926d1 fix: limit max core ratio to 1
28d42a4 feat: compute idle power with core ratio (#1732)
d8a6c14 feat: add machine spec generator/reader for model weight request
11ff51d feat: add --disable-power-meter option
414533c fix: set default trainer only for local regressor
d446231 fix: format ComponentModelWeights
a6f75a4 feat: add model_name attribute to ComponentModelWeights
6858c58 fix: watcher resubmit items to workqueue (#1686)
eb5a72a fix(bpf): Fix overhead when sampling (#1685)
a97030f fix(bpf): use prev_tgid to register process
5432a39 feat: customize vm_id with libvirt metadata
73367d3 fix: correct regex path name for VM
6a6017b feat: save validation result as json and show the static dashboard using js
c99e399 fix(pkg/bpf): Use channel to process events (#1671)
cde7833 fix: resolve pid 0 to system_processes
b424607 feat(kubernetes): Use workqueues
a39ae55 fix: typo in filename
d73094e fix: apply suggestions from code review
5a113b2 fix(bpf): tgid is in the upper 32 bits
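
The last fix in the list above concerns the layout of the 64-bit value returned by the kernel helper `bpf_get_current_pid_tgid()`: the thread group ID (the PID as seen from user space) sits in the upper 32 bits and the thread ID in the lower 32 bits. A minimal Go sketch of that split follows. The helper name `splitPidTgid` is hypothetical and is not taken from the Kepler code base.

```go
package main

import "fmt"

// splitPidTgid splits a 64-bit value laid out like the return value of the
// kernel helper bpf_get_current_pid_tgid(): tgid in the upper 32 bits,
// thread id (the kernel's "pid") in the lower 32 bits.
// Hypothetical helper for illustration; not Kepler's implementation.
func splitPidTgid(v uint64) (tgid, pid uint32) {
	tgid = uint32(v >> 32) // upper 32 bits
	pid = uint32(v)        // conversion keeps only the lower 32 bits
	return tgid, pid
}

func main() {
	// Build a sample value with tgid=1234 and pid=5678.
	v := uint64(1234)<<32 | uint64(5678)
	tgid, pid := splitPidTgid(v)
	fmt.Println("tgid:", tgid, "pid:", pid)
}
```

Reading the lower half as the process ID is the mistake this layout invites, since the two values only coincide for the main thread of a process.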
New Contributors
- @arthurus-rex made their first contribution in #1621
Full Changelog: v0.7.11...v0.7.12
release-0.7.11
Changes
- 1506691 - feat(validator): trigger validator workflow on changes (#1591)
- c7b3ddb - fix(collector): convert cpu time in collection time instead of reporting time to avoid inconsistent use of cpu time in models
- 9c80387 - bpf: account all running state processes (#1546)
- 91fc8d4 - feat(validator): Add workflow for validator tests (#1570)
- a2289d2 - fix: fallback to reading cpus.yaml relative to current dir (#1572)
- bfaadae - pick up the go mod vendor changes
- fb7ef35 - feat(metrics): selectively expose prom metrics to reduce overhead
- d412bfb - fix: vendor/github.com/jaypipes/ghw/Dockerfile to reduce vulnerabilities (#1578)
- 365ac03 - bpf: remove tgid map
- c427a47 - fix(manifest): uncomment openshift SCC (#1575)
- ec2a775 - fix(validator): improve the validator config sample (#1569)
- 0e22839 - fix: update the VERSION variable assignment method (#1552)
- 96dd443 - fix: Fix uncomment of YAML in hack/build-manifests.sh
- 8931d61 - feat(validator): load validations from validations.yaml
- 4a7bc31 - fix(compose): enable bpf cgroup id
- a57041c - fix(bpf): Fix kepler_write_page_cache attach
- fbe9b3c - fix(bpf): Access __state from task_struct (#1550)
- 0b0b215 - fix(bpf): Use BTF-Defined Raw Tracepoints (#1542)
- a08a5f6 - deps: Fix usage of textparse.NewPromParser
- aad6964 - fix(bpf): Fix map lookup for IRQ/Page Cache
- 9114e75 - Fix MSE and MAPE Single Queries (#1522)
- 330a531 - fix(bpf): restore command label in process metrics
- edd4d04 - review feedback: fix mse queries
- d6420d5 - bump up local_dev_cluster_version version
- 4ced508 - bpf-collector: change log verbosity to easily show it in CI
- 34889bb - libbpf: use microseconds instead of milliseconds in the eBPF code, because the lower precision caused active processes to be identified as inactive
- 4337a5e - bpf: remove task time
- 0426e8f - feat(exporter): Graceful Shutdown
- aec3ab5 - report validator results
- 07636b1 - Replace expected and actual query with single query (#1489)
- 1759cca - feat(compose): add build arguments for Kepler image
- c678217 - use pmu name to get arm cpu id since archspec does not help here
- 8bb405f - stats: update the verbosity of annoying key error message due to missing gpu metrics (#1480)
- 59af568 - Add Test Cases for Prometheus, Config, Stresser for Validator (#1461)
- bdd44b1 - bpf: fix the process parameter order to match the c and go code (#1479)
- 747e7eb - fix: ensure all entries from bpf map is copied (#1477)
- c092204 - make: quote ldflags
- 468ed25 - add vm name option to validator (#1474)
- 244ae8b - feat: expose version label in kepler_build_info (#1473)
- 6ae21a0 - update validator usage; remove job from prom query
- 3ac4f6b - feat(cgroup): Add podman support (#1455)
- ada7884 - fix platform power return unit (#1468)
- 3c7e777 - fix(collector): Fix use of waitgroups
- b134a84 - fix(cmd/validator): Don't add when passing a wg
- 0158b0b - fix(dev-dashboard): update and correct metrics in dev dashboard
- f92532a - add new maintainers per 05/21 community meeting vote results (#1462)
- efad46f - provide a simple template for maintainer nominate (#1463)
- 9e957f3 - finish kepler on rhel tests
- dcf78e6 - fix: remove logging while collecting GPU metrics
- 49acca9 - fix(model): Use correct variable in IsNodeComponentPowerModelEnabled() (#1458)
- 1b93eb1 - Adding New Metric Cases to Case module (#1453)
- ea3e2f8 - add equinix metal instance to CI
- 53d06d4 - add PR review bot (#1446)
- 0baec47 - feat: Fixed eBPF Feature Detection (#1443)
- 5f59172 - fix(bpf): cleanup initialising structs and nested ifs (#1444)
- 2bca8dc - update hack/libbpf-headers.sh script to pull v1.3.0
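
Several validator entries above mention MSE and MAPE queries. For readers unfamiliar with the two error measures, the sketch below computes them from paired samples using the textbook definitions. It is not the validator's Prometheus query; the function names and sample values are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// mse returns the mean squared error between actual and predicted samples.
// Illustrative only; the Kepler validator computes its metrics via queries.
func mse(actual, predicted []float64) (float64, error) {
	if len(actual) == 0 || len(actual) != len(predicted) {
		return 0, errors.New("inputs must be non-empty and of equal length")
	}
	var sum float64
	for i := range actual {
		d := actual[i] - predicted[i]
		sum += d * d
	}
	return sum / float64(len(actual)), nil
}

// mape returns the mean absolute percentage error, in percent.
// It rejects zero actual values, for which the ratio is undefined.
func mape(actual, predicted []float64) (float64, error) {
	if len(actual) == 0 || len(actual) != len(predicted) {
		return 0, errors.New("inputs must be non-empty and of equal length")
	}
	var sum float64
	for i := range actual {
		if actual[i] == 0 {
			return 0, errors.New("actual value is zero; MAPE is undefined")
		}
		sum += math.Abs((actual[i] - predicted[i]) / actual[i])
	}
	return 100 * sum / float64(len(actual)), nil
}

func main() {
	// Made-up power readings in watts, for demonstration only.
	actual := []float64{100, 200}
	predicted := []float64{110, 180}

	m, _ := mse(actual, predicted)
	p, _ := mape(actual, predicted)
	fmt.Printf("MSE=%.2f MAPE=%.2f%%\n", m, p)
}
```

MSE is expressed in squared units of the measured quantity, while MAPE is unitless, so the two are not directly comparable.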
New Contributors
- @wenboown made their first contribution in #1451
- @caniszczyk made their first contribution in #1578
Full Changelog: v0.7.10...v0.7.11
release-0.7.10
Summary
- fix(bpfassets): Fix object file lookup (#1419)
- feat(bpf): Build for bpfel and bpfeb
- feat(bpf): Bump up libbpf to 1.3.0
- fix(dashboard): show metal and VM metrics correctly (#1395)
- doc(dev): add section on how to profile (#1396)
- feat(bpf): Portable eBPF Probes
- feat(test): initial version of validator tool
- dev(compose): add manifests for validation
- fix(collector): Fix Segmentation fault when collecting CPU Freq from BPF (#1387)
- feat(kepler): enable pprof (#1383)
- fix habana installation
- fix previous pid of finish_task_switch (#1370)
- fix: update dashboard for docker-compose
- fix(build): reduce image size by squashing install and clean steps
- feat(compose): add docker-compose for easier local development
- feat(exporter): log listening port
- fix(build): reduce container image size (#1336)
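
The `bpfel` and `bpfeb` names in the list above are the clang targets for little-endian and big-endian eBPF objects. As a rough sketch of how a program might pick the target matching the host, assuming nothing about Kepler's actual lookup logic, the hypothetical helpers below detect host byte order and map it to a target name.

```go
package main

import (
	"fmt"
	"unsafe"
)

// hostIsLittleEndian reports whether the machine stores the least
// significant byte of a multi-byte integer first.
func hostIsLittleEndian() bool {
	var x uint16 = 1
	return *(*byte)(unsafe.Pointer(&x)) == 1
}

// bpfTarget maps a byte order to the matching clang BPF target name.
// Hypothetical helper for illustration; not Kepler's object lookup.
func bpfTarget(littleEndian bool) string {
	if littleEndian {
		return "bpfel"
	}
	return "bpfeb"
}

func main() {
	fmt.Println("BPF target for this host:", bpfTarget(hostIsLittleEndian()))
}
```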
New Contributors
- @dave-tucker made their first contribution in #1384
Full Changelog: v0.7.9...v0.7.10
release-0.7.8
bot: Updated coverage badge. Signed-off-by: sustainable-computing-bot
release-0.7.7
revert rpm source (#1254) Signed-off-by: Huamin Chen
release-0.7.6
fix rpm spec (#1253) Signed-off-by: Huamin Chen
release-0.7.5
bot: Updated coverage badge. Signed-off-by: sustainable-computing-bot
release-0.4
bot: Updated coverage badge. Signed-off-by: sustainable-computing-bot
release-0.7.3
In the Kepler 0.7 release:
- switch to libbpf as the default eBPF provider
- update the base image to decouple the GPU driver from the Kepler image itself
- use a kprobe instead of a tracepoint in eBPF to obtain context switch information
- add a task clock event to eBPF and use it to calculate CPU usage for each process; the event is also exported to Prometheus
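
As a rough illustration of the last item, a task clock counter accumulates the time a task spent running on a CPU, so the change between two readings divided by the sampling interval gives an average CPU usage over that interval. The sketch below assumes nanosecond counters and a hypothetical function name; it is not Kepler's actual calculation.

```go
package main

import "fmt"

// cpuUsage returns the average number of CPUs a process kept busy between
// two task clock readings. prevNs and currNs are cumulative on-CPU time in
// nanoseconds; intervalNs is the wall-clock time between the readings.
// Hypothetical helper for illustration; not Kepler's implementation.
func cpuUsage(prevNs, currNs, intervalNs uint64) float64 {
	if intervalNs == 0 || currNs < prevNs {
		// No elapsed time, or the counter was reset: report no usage.
		return 0
	}
	return float64(currNs-prevNs) / float64(intervalNs)
}

func main() {
	// Made-up readings: 0.5 s of on-CPU time over a 1 s interval.
	usage := cpuUsage(1_000_000_000, 1_500_000_000, 1_000_000_000)
	fmt.Printf("average CPU usage: %.2f cores\n", usage)
}
```

A multi-threaded process can exceed 1.0 under this definition, because its threads may run on several CPUs at once.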