--hpx:bind syntax for multiple localities per node (required for Aurora experiments) #7265

@G-071

Description

I am trying to run Octo-Tiger on Aurora (so this issue is related to the discussion #6713 ).

The recommended hardware bindings on Aurora are a bit tricky: we should run with 12 processes per compute node, each process using 8 cores and one GPU tile. The first core on either of the two sockets should not be used, nor should the last three cores on each socket (each socket has 52 cores).

Now, I have the current version of Octo-Tiger sort of working on Aurora with HPX 1.11.0, Kokkos 4.7.04 (albeit that one needed some patching) and LCI 1.7.9. By "sort of", I mean it runs just fine in distributed settings and the HPX-SYCL GPU support works. What does not work properly is the hardware binding described above: when I run with 12 localities per compute node, I often (nondeterministically) encounter situations where two or more HPX localities end up using the same CPU cores, leading to rather bad performance in these cases.

For more context: On Aurora we have to use PBS and I run Octo-Tiger via mpiexec with the cpu-bind parameter, for example like this:

mpiexec -n 12 -ppn 12 --cpu-bind=list:1-8:9-16:17-24:25-32:33-40:41-48:53-60:61-68:69-76:77-84:85-92:93-100 --mem-bind=list:2:2:2:2:2:2:3:3:3:3:3:3 map_processes_to_GPU_tiles.sh $OCTOTIGER_APP_PATH $OCTOTIGER_OPTIONS_GPU_KOKKOS --hpx:ignore-batch-env --hpx:threads=8 --hpx:use-process-mask  --hpx:print-bind --hpx:ini=hpx.parcel.lci.priority=1000 --hpx:ini=hpx.parcel.lci.enable=1 --hpx:nodefile=$PBS_NODEFILE

(I tried various permutations of that with/without --hpx:ignore-batch-env and --hpx:use-process-mask, and even with/without the --cpu-bind parameter, without any more success -- I still run into HPX localities sharing the same CPU cores.)

There is probably a separate issue lurking here with the HPX localities not mapping to the hardware correctly (and hwloc sometimes throwing exceptions when running with --hpx:print-bind).
In this specific issue, however, I just want to find out whether I can work around the overall problem "manually" by using the --hpx:bind parameter to bind the threads to the correct cores myself. Unfortunately, the HPX documentation does not mention how that would work with multiple localities per node.
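What I imagined trying is a wrapper script that computes a per-rank --hpx:bind spec and passes a different one to each locality. This is only a sketch and untested on Aurora: the PALS_LOCAL_RANKID variable name and the assumption that --hpx:bind core numbers refer to absolute machine cores are guesses on my part, not something I found confirmed in the docs.

```shell
#!/bin/bash
# Sketch: map a node-local rank (0-11) to the same core ranges as the
# --cpu-bind list above, formatted as an HPX bind spec.
bind_spec() {
  local rank=$1 cores_per_rank=8 first
  if [ "$rank" -lt 6 ]; then
    first=$(( 1 + rank * cores_per_rank ))          # socket 0: skip core 0
  else
    first=$(( 53 + (rank - 6) * cores_per_rank ))   # socket 1: skip core 52
  fi
  echo "thread:0-7=core:${first}-$(( first + cores_per_rank - 1 )).pu:0"
}

# Example: rank 6 (first rank on socket 1) gets cores 53-60:
bind_spec 6   # -> thread:0-7=core:53-60.pu:0
```

The wrapper would then launch the application with --hpx:bind="$(bind_spec "$PALS_LOCAL_RANKID")" (again, assuming that is the right environment variable under PBS/PALS) instead of relying on --hpx:use-process-mask. Whether HPX accepts such absolute core numbers when multiple localities share a node is exactly what my questions below are about.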

So with that lengthy introduction out of the way, my actual questions are:

1.) Can --hpx:bind be used with multiple localities per node to bind the threads of the various localities to the correct cores?
2.) If yes, what's the syntax for that?
