--hpx:bind syntax for multiple localities per node (required for Aurora experiments) #7265

@G-071

Description

I am trying to run Octo-Tiger on Aurora (so this issue is related to the discussion #6713 ).

The recommended hardware bindings on Aurora are a bit tricky: we should run with 12 processes per compute node, each process using 8 cores and one GPU tile. The first core on either of the two sockets should not be used, nor should the last three cores on each socket (each socket has 52 cores).

Now, I have the current version of Octo-Tiger sort of working on Aurora with HPX 1.11.0, Kokkos 4.7.04 (albeit that one needed some patching) and LCI 1.7.9. By "sort of", I mean it runs just fine in distributed settings and the HPX-SYCL GPU support works. What does not work properly is the hardware binding described above: when I run with 12 localities per compute node, I often (nondeterministically) encounter situations where two or more HPX localities end up using the same CPU cores, leading to rather bad performance in these cases.

For more context: On Aurora we have to use PBS and I run Octo-Tiger via mpiexec with the cpu-bind parameter, for example like this:

mpiexec -n 12 -ppn 12 --cpu-bind=list:1-8:9-16:17-24:25-32:33-40:41-48:53-60:61-68:69-76:77-84:85-92:93-100 --mem-bind=list:2:2:2:2:2:2:3:3:3:3:3:3 map_processes_to_GPU_tiles.sh $OCTOTIGER_APP_PATH $OCTOTIGER_OPTIONS_GPU_KOKKOS --hpx:ignore-batch-env --hpx:threads=8 --hpx:use-process-mask  --hpx:print-bind --hpx:ini=hpx.parcel.lci.priority=1000 --hpx:ini=hpx.parcel.lci.enable=1 --hpx:nodefile=$PBS_NODEFILE

(I tried various permutations of that with/without --hpx:ignore-batch-env and --hpx:use-process-mask, and even with/without the --cpu-bind parameter, without any more success -- I still run into HPX localities sharing the same CPU cores.)

There is probably a separate issue lurking here with the HPX localities not mapping to the hardware correctly (and hwloc sometimes throwing exceptions when running with --hpx:print-bind).
In this specific issue, however, I just want to find out whether I can work around the overall problem "manually" by using the --hpx:bind parameter to bind the threads to the correct cores myself. Unfortunately, the HPX documentation does not mention how that would work with multiple localities per node.
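What I imagined trying is a wrapper script that computes a per-rank --hpx:bind spec and passes a different one to each locality. This is only a sketch and untested on Aurora: the PALS_LOCAL_RANKID variable name and the assumption that --hpx:bind core numbers refer to absolute machine cores are guesses on my part, not something I found confirmed in the docs.

```shell
#!/bin/bash
# Sketch: map a node-local rank (0-11) to the same core ranges as the
# --cpu-bind list above, formatted as an HPX bind spec.
bind_spec() {
  local rank=$1 cores_per_rank=8 first
  if [ "$rank" -lt 6 ]; then
    first=$(( 1 + rank * cores_per_rank ))          # socket 0: skip core 0
  else
    first=$(( 53 + (rank - 6) * cores_per_rank ))   # socket 1: skip core 52
  fi
  echo "thread:0-7=core:${first}-$(( first + cores_per_rank - 1 )).pu:0"
}

# Example: rank 6 (first rank on socket 1) gets cores 53-60:
bind_spec 6   # -> thread:0-7=core:53-60.pu:0
```

The wrapper would then launch the application with --hpx:bind="$(bind_spec "$PALS_LOCAL_RANKID")" (again, assuming that is the right environment variable under PBS/PALS) instead of relying on --hpx:use-process-mask. Whether HPX accepts such absolute core numbers when multiple localities share a node is exactly what my questions below are about.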

So with that lengthy introduction out of the way, my actual questions are:

1.) Can --hpx:bind be used with multiple localities per node to bind the threads of the various localities to the correct cores?
2.) If yes, what's the syntax for that?
