
zesDevicePciGetProperties() reports invalid PCI bandwidth (zeDevicePciGetPropertiesExt() is correct) #778

Open
@bgoglin

Description


Hello
While finishing the switch from ZES_ENABLE_SYSMAN=1 to zesInit() in hwloc, I am removing some duplicate code that was needed in the past when Sysman could not be enabled. One piece of it is the query of the device's PCI properties.
I was notified that PVC on Aurora gets a different PCI maxBandwidth from zesDevicePciGetProperties() than from zeDevicePciGetPropertiesExt() (see the comment in open-mpi/hwloc#595). ZE reports 63 GB/s, as expected, while ZES reports 0.25 GB/s. The difference could come from one API reporting the maximum possible value and the other the current (possibly idle) value, but the ZES documentation defines the field as "The maximum bandwidth in bytes/sec (sum of all lanes)", so 0.25 GB/s doesn't make sense either way.
I don't have access to Aurora to debug further. I tested on other platforms (including your Endeavour cluster), but they seem to run older releases of the runtime, and ZES just reports -1 there anyway (ZE is correct there too).

Metadata

Assignees: No one assigned
Labels: L0 Sysman (Issue related to L0 Sysman)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests