Struggling to run ROMBUS in parallel. #35

@amakaibaker

Hi,

As the title suggests, I am struggling to run ROMBUS in parallel. Using the PhenomP.py model and PhenomP_samples.csv samples in rombus/models/, I run the following command on the head node of the OzStar Ngarrgu Tindebeek (NT) cluster:

mpirun rombus build PhenomP:Model PhenomP_samples.csv

And get the following error when ROMBUS runs the greedy algorithm to fill the reduced basis:

File "/home/abaker/.conda/envs/fresh-rombus/lib/python3.11/site-packages/rombus/cli.py", line 99, in build
    ROM = ReducedOrderModel(model_loaded, samples).build(do_step=do_step)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/abaker/.conda/envs/fresh-rombus/lib/python3.11/site-packages/rombus/_core/log/log.py", line 253, in wrapper
    r = func(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^
  File "/home/abaker/.conda/envs/fresh-rombus/lib/python3.11/site-packages/rombus/rom.py", line 113, in build
    self.reduced_basis = ReducedBasis().compute(
  File "/home/abaker/.conda/envs/fresh-rombus/lib/python3.11/site-packages/rombus/reduced_basis.py", line 114, in compute
    with log.progress(
  File "/home/abaker/.conda/envs/fresh-rombus/lib/python3.11/site-packages/rombus/_core/log/log.py", line 363, in __exit__
    self._exception_handler(exc_val)
  File "/fred/oz209/abaker/.conda/envs/rombus/src/rombus/python/rombus/reduced_basis.py", line 151, in compute
    basis_index = self._convert_to_basis_index(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/fred/oz209/abaker/.conda/envs/rombus/src/rombus/python/rombus/reduced_basis.py", line 275, in _convert_to_basis_index
    idx_till_err_rank = np.sum([rank_count[i] for i in ranks_till_err_rank])
IndexError: invalid index to scalar variable.
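For reference, this is the message NumPy gives when a NumPy scalar is indexed as if it were an array. The snippet below is a minimal sketch that reproduces the same message outside ROMBUS. The function `count_before_rank` and the sample values are hypothetical; only the list-comprehension expression is copied from the traceback, and I am assuming (not confirming) that `rank_count` ends up as a scalar rather than a per-rank array in the failing run.

```python
import numpy as np


def count_before_rank(rank_count, ranks_before):
    """Sum the per-rank counts for the ranks listed in ranks_before.

    Hypothetical helper; the body mirrors the expression in the traceback.
    """
    return np.sum([rank_count[i] for i in ranks_before])


# With one entry per rank (an array), indexing by rank works.
per_rank = np.array([3, 2, 4])
total = count_before_rank(per_rank, [0, 1])

# With a NumPy scalar, the same expression raises the IndexError seen above.
scalar_count = np.int64(3)
try:
    count_before_rank(scalar_count, [0])
    message = None
except IndexError as exc:
    message = str(exc)

print(total)
print(message)
```

Note that with an empty list of preceding ranks nothing is indexed at all, so the expression would not fail in that case even for a scalar.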

I also tried requesting 16 CPUs on a single node and running the above command without mpirun, using the following submit file:

#!/bin/bash
#SBATCH --job-name=test_rombus
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH --time=0:30:00
#SBATCH --mem-per-cpu=4000
#SBATCH --output=pipe_submit.log

ml mamba && conda activate fresh-rombus

rombus build PhenomP:Model PhenomP_samples.csv

It completes successfully, but judging by the output in the log file it appears to run in serial. I also get the following warning:

MPI startup(): PMI server not found. Please set I_MPI_PMI_LIBRARY variable if it is not a singleton case.

I am not sure how to interpret this warning with respect to whether ROMBUS ran in parallel or not. Would anyone be able to help guide me in the right direction?
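One check that might help separate "tasks allocated by Slurm" from "processes actually started by an MPI launcher" is to inspect the environment of the running process. The sketch below is not part of ROMBUS; `describe_launch` is a hypothetical helper, and it assumes the launcher exports the usual variables (`OMPI_COMM_WORLD_SIZE`/`OMPI_COMM_WORLD_RANK` for Open MPI, `PMI_SIZE`/`PMI_RANK` for Hydra-based MPICH and Intel MPI), which may not hold for every setup.

```python
import os

# Variables commonly exported per process by MPI launchers (assumed, see above).
LAUNCHER_VARS = [
    ("Open MPI", "OMPI_COMM_WORLD_SIZE", "OMPI_COMM_WORLD_RANK"),
    ("PMI (MPICH / Intel MPI)", "PMI_SIZE", "PMI_RANK"),
]


def describe_launch(env):
    """Report what the environment says about how this process was started.

    SLURM_NTASKS only reflects what the job requested, so it is reported
    separately and never treated as evidence of a parallel launch.
    """
    info = {
        "launcher": None,
        "world_size": 1,
        "rank": 0,
        "slurm_ntasks": int(env["SLURM_NTASKS"]) if "SLURM_NTASKS" in env else None,
    }
    for name, size_var, rank_var in LAUNCHER_VARS:
        if size_var in env and rank_var in env:
            info["launcher"] = name
            info["world_size"] = int(env[size_var])
            info["rank"] = int(env[rank_var])
            break
    return info


if __name__ == "__main__":
    print(describe_launch(os.environ))
```

If this is run in place of the `rombus build` line in the submit file and reports no launcher with a world size of 1 while `slurm_ntasks` is 16, that would suggest the job reserved 16 tasks but started only a single process.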

Thanks in advance!
