Hi,
As the title suggests, I am struggling to run ROMBUS in parallel. Using the PhenomP.py model and PhenomP_samples.csv samples in rombus/models/, I run the following command on the head node of the OzStar Ngarrgu Tindebeek (NT) cluster:
mpirun rombus build PhenomP:Model PhenomP_samples.csv
I get the following error when ROMBUS runs the greedy algorithm to fill the reduced basis:
File "/home/abaker/.conda/envs/fresh-rombus/lib/python3.11/site-packages/rombus/cli.py", line 99, in build
ROM = ReducedOrderModel(model_loaded, samples).build(do_step=do_step)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/abaker/.conda/envs/fresh-rombus/lib/python3.11/site-packages/rombus/_core/log/log.py", line 253, in wrapper
r = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/abaker/.conda/envs/fresh-rombus/lib/python3.11/site-packages/rombus/rom.py", line 113, in build
self.reduced_basis = ReducedBasis().compute(
File "/home/abaker/.conda/envs/fresh-rombus/lib/python3.11/site-packages/rombus/reduced_basis.py", line 114, in compute
with log.progress(
File "/home/abaker/.conda/envs/fresh-rombus/lib/python3.11/site-packages/rombus/_core/log/log.py", line 363, in __exit__
self._exception_handler(exc_val)
File "/fred/oz209/abaker/.conda/envs/rombus/src/rombus/python/rombus/reduced_basis.py", line 151, in compute
basis_index = self._convert_to_basis_index(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/fred/oz209/abaker/.conda/envs/rombus/src/rombus/python/rombus/reduced_basis.py", line 275, in _convert_to_basis_index
idx_till_err_rank = np.sum([rank_count[i] for i in ranks_till_err_rank])
IndexError: invalid index to scalar variable.
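For reference, NumPy raises this same message whenever a NumPy scalar is indexed. The snippet below is only a minimal sketch of that behaviour, independent of ROMBUS: the values given to `rank_count` and `ranks_till_err_rank` are made up for illustration, and I am assuming (without having confirmed it in the ROMBUS source) that `rank_count` ends up as a single number rather than a per-rank sequence.

```python
import numpy as np


def sum_counts(rank_count, ranks):
    # Same expression as the failing line in the traceback
    return np.sum([rank_count[i] for i in ranks])


# Hypothetical case: the per-rank counts have collapsed to one NumPy scalar
scalar_count = np.int64(4)
try:
    sum_counts(scalar_count, [0])
    message = None
except IndexError as exc:
    message = str(exc)
print(message)

# With a sequence of per-rank counts (made-up values) the expression works
total = sum_counts([4, 3, 5], [0, 1])
print(total)
```

If that assumption holds, the question becomes why the counts are not gathered into one entry per rank when the job is started with mpirun.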
When I instead request 16 CPUs on a single node and run the above command without mpirun, i.e. with the following submit file:
#!/bin/bash
#SBATCH --job-name=test_rombus
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH --time=0:30:00
#SBATCH --mem-per-cpu=4000
#SBATCH --output=pipe_submit.log
ml mamba && conda activate fresh-rombus
rombus build PhenomP:Model PhenomP_samples.csv
The job completes successfully, but judging by the output in the log file it appears to run in serial. I also get the following warning:
MPI startup(): PMI server not found. Please set I_MPI_PMI_LIBRARY variable if it is not a singleton case.
I am not sure how to interpret this warning, or whether it says anything about ROMBUS having run in parallel. Would anyone be able to point me in the right direction?
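One check I could add to the submit script is sketched below. It only inspects environment variables and does not touch ROMBUS or MPI itself. The helper name `describe_launch` is my own, and the list of variables is an assumption that is certainly not exhaustive: `OMPI_COMM_WORLD_SIZE` is exported by Open MPI's launcher, `PMI_SIZE` by Hydra-based launchers such as those of MPICH and Intel MPI, and `SLURM_NTASKS` by Slurm. As I understand it, the Slurm variable only describes the allocation, so seeing it without either launcher variable would suggest that a single process was started.

```python
import os

# Variables exported to each process by an MPI launcher (assumed, not exhaustive)
LAUNCHER_SIZE_VARS = ("OMPI_COMM_WORLD_SIZE", "PMI_SIZE")


def _as_int(value):
    # Return the value as an int, or None if it is missing or not a number
    return int(value) if value is not None and value.isdigit() else None


def describe_launch(env=None):
    """Report what the environment says about how this process was started."""
    env = os.environ if env is None else env
    mpi_ranks = None
    for name in LAUNCHER_SIZE_VARS:
        mpi_ranks = _as_int(env.get(name))
        if mpi_ranks is not None:
            break
    return {
        "mpi_ranks": mpi_ranks,  # None: no launcher variable was found
        "slurm_tasks": _as_int(env.get("SLURM_NTASKS")),  # allocation only
    }


if __name__ == "__main__":
    print(describe_launch())
```

Running this once directly and once under mpirun inside the same job should show whether the two launch methods actually differ in the number of ranks.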
Thanks in advance!