MMA-optimized batched MRHS MG #1540

maddyscientist · 2025-03-11T17:32:42Z

This PR focuses on improving the use of MMA with batched multigrid, resulting in the optimal MRHS multigrid performance:

- Make the MMA-ordered version of the null space vectors persistent to avoid needing to reorder on the fly
- Full coarse-grid correction can now be done in MMA order, minimizing the reordering overhead
  - This is enabled on a per-level basis using QudaMultigridParam::collapse_mrhs (--mg-collapse-mrhs from the command line)
  - Doing so collapses the MRHS solve to a single solve on the coarse level in question
  - Enabling this requires that both the transfer operation into that grid and the dslash operator for that grid are deployed using MMA
- Improve memory utilization: if the pre- and post-smoothers are identical, they can alias, avoiding the need to allocate them separately
- Various boilerplate improvements to facilitate the above, e.g.,
  - input and/or output for transfer operators can be collapsed
  - new method vector_ref<ColorSpinorField>::size_actual() to allow us to query the number of RHS in a collapsed system
- Improve usability of FieldTmp
  - Temporaries can now be references, which helps with composability
- BlockTranspose input field set can now be in half precision
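For readers unfamiliar with the smoother aliasing mentioned above, here is a minimal sketch of the idea. It is not the QUDA implementation: SmootherParam, Smoother and Level are hypothetical stand-ins invented for illustration, and the comparison of settings is an assumption about what "identical" means.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the settings that define a smoother (not a QUDA type).
struct SmootherParam {
  int n_iter;   // number of smoother iterations
  double omega; // relaxation factor
  bool operator==(const SmootherParam &o) const { return n_iter == o.n_iter && omega == o.omega; }
};

// Hypothetical stand-in for a smoother object that owns its own workspace.
struct Smoother {
  SmootherParam param;
  explicit Smoother(const SmootherParam &p) : param(p) {}
};

// One multigrid level: pre is declared before post, so it is constructed first
// and can safely be reused when initializing post.
struct Level {
  std::shared_ptr<Smoother> pre;
  std::shared_ptr<Smoother> post;

  Level(const SmootherParam &pre_param, const SmootherParam &post_param) :
    pre(std::make_shared<Smoother>(pre_param)),
    // alias the pre-smoother when the settings match, otherwise allocate a second one
    post(pre_param == post_param ? pre : std::make_shared<Smoother>(post_param))
  {
  }
};
```

With identical settings, pre and post point at the same object, so only one smoother is allocated for that level.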

…ew parameter QudaMultigridParam::collapse_mrhs enables this on a per level basis, with MMA enablement for both dslash and transfer operator (on the finer level) required; deflation is presently handled by expanding the collapsed space to the batch form, and then collapsing again post deflation. This removes all the BlockTranspose operations outside of the initial coarsening and final prolongation (deflation excepting).
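The per-level requirement described above can be pictured as a simple consistency check. The sketch below is illustrative only: LevelConfig, its field names and collapse_valid are hypothetical and are not part of QUDA's interface. It also assumes that the finest level (index 0) cannot be collapsed, since no transfer operator maps into it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-level settings; the names are illustrative, not QUDA's API.
struct LevelConfig {
  bool transfer_mma;  // transfer operator held on this level (maps to the next coarser level) uses MMA
  bool dslash_mma;    // dslash operator on this level uses MMA
  bool collapse_mrhs; // request to collapse the MRHS solve on this level
};

// Returns true if every level that requests collapsing satisfies the stated requirement:
// the transfer into that level (held on the finer level) and the dslash on that level both use MMA.
bool collapse_valid(const std::vector<LevelConfig> &levels)
{
  for (std::size_t i = 0; i < levels.size(); i++) {
    if (!levels[i].collapse_mrhs) continue;
    if (i == 0) return false; // assumption: the finest level has no transfer into it
    if (!levels[i - 1].transfer_mma || !levels[i].dslash_mma) return false;
  }
  return true;
}
```

In this sketch, a two-level setup with an MMA transfer on level 0 and an MMA dslash on level 1 may collapse level 1, while disabling either one makes the request invalid.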

maddyscientist added 17 commits February 19, 2025 08:41

Optimize MMA MG: make a persistent copy of V in MMA order the transfe…

e158e8c

…r operators; delay enablement of MMA in transfer until after coarse operator is computed to avoid double storage of V in MMA order; only store V in native or MMA order to reduce memory

Solver class now owns copies of the parameters compute_true_res, retu…

c704ef9

…rn_residual, use_init_guess. These can now be updated using the Solver::update_param interface

If pre smoother and post smoother are identical, we can reuse the for…

1beaafa

…mer for the latter to reduce memory consumption

Add ColorSpinorField::is_reference() method

e12a0f2

Add support to FieldTmp for creating temporaries that are references.…

db0ceaa

… This is helpful for uniform object creation.

Add Dirac::is_mma_enabled for querying if a Dirac operator can be app…

a8c8ede

…lied using tensor cores

Build up of the framework to allow us to turn a MRHS system into a si…

ac30902

…ngle MMA-ordered super-system: prolongator, restrictor and dslash coarse will now not reorder if the input / output fields are already in the correct order

Implement color_spinor_copy as a simple copy if input and output are …

89a6199

…compatible

Add some functions for checking if a ColorSpinorField is compatible w…

fa2b177

…ith a ColorSpinorField; deploy this to check if we need to nuke any preexisting allocations when resizing std::vector<ColorSpinorField>

Move PreconditionedSolver::operator() implementation to solver.cpp

cb9ade0

Vector version of getFieldTmp can now optionally create a vector of r…

01326f2

…eferences around the input (useful for creating a uniform code path between native and MMA ordered solvers)

Fix minor bug with ColorSpinorField::move

b95a3c3

Improve verbosity of quda_ptr error printing

2b4e3af

Coarse grid argument for prolongator and restrictor now can be pre-co…

93cd741

…llapsed

Fix some bugs

fabe2c4

Add half precision support for input vector set for BlockTranspose - …

9ea3163

…this fixes the verify routine with MMA transfer routines when using half precision

maddyscientist added bug feature optimization labels Mar 11, 2025

maddyscientist added this to the QUDA 2.0 milestone Mar 11, 2025

maddyscientist assigned weinbe2 Mar 11, 2025

maddyscientist requested a review from a team as a code owner March 11, 2025 17:32

maddyscientist assigned hummingtree Mar 11, 2025

maddyscientist changed the title ~~Feature/mg mma misc~~ MMA-optimized batched MRHS MG Mar 11, 2025

maddyscientist added 2 commits March 11, 2025 12:38

Fix CI warning

2bb7f5d

Merge branch 'develop' of github.com:lattice/quda into feature/mg-mma…

57859a0

…-misc
