Hardcoded energy1_input in gpu_device_stub.cpp makes GPU device undiscoverable #123

@Stonepia

Description

After installing an agama driver build, I found that clinfo -l lists the GPU device but xpu-smi discovery cannot find it.

The root cause is that the name of the hwmon energy file has changed:

# Only exists:
/sys/class/hwmon/hwmon3/energy2_input

# Does NOT exist (xpu-smi looks for this):
/sys/class/hwmon/hwmon3/energy1_input

As a result, the value read from that file is empty. The change below fixes the issue for me. I am not sure this is the right fix, but I am posting it here for reference:

cd ~/xpumanager && git diff V1.3.6 -- core/src/device/gpu/gpu_device_stub.cpp
diff --git a/core/src/device/gpu/gpu_device_stub.cpp b/core/src/device/gpu/gpu_device_stub.cpp
index 258ebab..f5e5334 100644
--- a/core/src/device/gpu/gpu_device_stub.cpp
+++ b/core/src/device/gpu/gpu_device_stub.cpp
@@ -224,8 +224,14 @@ std::shared_ptr<MeasurementData> GPUDeviceStub::loadPVCIdlePowers(std::string bd
                     std::string name = getFileValue("/sys/class/hwmon/" + std::string(pdirent->d_name) +"/name");
                     name.erase(0, name.find_first_not_of(" \n\r\t"));
                     name.erase(name.find_last_not_of(" \n\r\t") + 1);
-                    auto energy_path = "/sys/class/hwmon/" + std::string(pdirent->d_name) +"/energy1_input";
-                    uint64_t value = std::stoull(getFileValue(energy_path));
+                    // xe driver (kernel >= 6.8) uses energy2_input; i915 uses energy1_input
+                    std::string energy_path = "/sys/class/hwmon/" + std::string(pdirent->d_name) + "/energy1_input";
+                    if (access(energy_path.c_str(), F_OK) != 0)
+                        energy_path = "/sys/class/hwmon/" + std::string(pdirent->d_name) + "/energy2_input";
+                    std::string energy_str = getFileValue(energy_path);
+                    if (energy_str.empty())
+                        continue;
+                    uint64_t value = std::stoull(energy_str);
                     auto timestamp = Utility::getCurrentMillisecond();
                     XPUM_LOG_TRACE("[{}] path:{}, value: {}, timestamp: {}", gpu_bdf, energy_path, value, timestamp);
                     if (pvc_idle_powers.count(gpu_bdf) == 0)

Environment

Hardware	Intel Data Center GPU Max 1550 (8086:0BD5)
OS	Ubuntu 24.04
Kernel	Linux 984fee015d7d.jf.intel.com 6.18.0-rc2+prerelease3000+ #1 SMP PREEMPT_DYNAMIC Sun Oct 26 04:57:21 PDT 2025 x86_64 x86_64 x86_64 GNU/Linux
Level Zero	1.28.0.0
xpu-smi version	1.3.6
