MMA-optimized batched MRHS MG #1540

Open: wants to merge 19 commits into develop

maddyscientist
Member

This PR improves the use of MMA (matrix multiply-accumulate) with batched multigrid, delivering optimal multiple-right-hand-side (MRHS) multigrid performance:

  • Make the MMA-ordered copy of the null-space vectors persistent, so they no longer need to be reordered on the fly
  • The full coarse-grid correction can now be done in MMA order, minimizing the reordering overhead
    • This is enabled on a per-level basis using QudaMultigridParam::collapse_mrhs (--mg-collapse-mrhs from the command line)
    • Doing so collapses the MRHS solve into a single solve on the coarse level in question
    • Enabling this requires that both the transfer operator into that grid and the dslash operator on that grid use MMA
  • Improve memory utilization: if the pre- and post-smoothers are identical, they can alias each other, avoiding a separate allocation
  • Various boilerplate improvements to facilitate the above, e.g.,
    • the input and/or output of transfer operators can now be collapsed
    • new method vector_ref<ColorSpinorField>::size_actual() for querying the number of RHS in a collapsed system
  • Improve usability of FieldTmp
    • Temporaries can now be references, which helps with composability
  • The BlockTranspose input field set can now be in half precision
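A minimal sketch of the smoother-aliasing idea from the memory-utilization item above, assuming the two smoother roles can share one object through `std::shared_ptr`. `SmootherParam`, `Smoother`, `SmootherPair` and `make_smoothers` are hypothetical stand-ins for illustration, not QUDA's actual classes or API.

```cpp
#include <memory>

// Hypothetical smoother parameters (illustrative only, not QUDA's SolverParam).
struct SmootherParam {
  int n_iter;
  double tol;
  bool operator==(const SmootherParam &o) const { return n_iter == o.n_iter && tol == o.tol; }
};

// Hypothetical smoother: stands in for a solver that would own its own workspace.
struct Smoother {
  explicit Smoother(const SmootherParam &p) : param(p) {}
  SmootherParam param;
};

struct SmootherPair {
  std::shared_ptr<Smoother> pre;
  std::shared_ptr<Smoother> post;
};

// Build the pre- and post-smoothers. When their parameters are identical the
// post-smoother aliases the pre-smoother, so only one instance is allocated.
SmootherPair make_smoothers(const SmootherParam &pre_param, const SmootherParam &post_param)
{
  SmootherPair s;
  s.pre = std::make_shared<Smoother>(pre_param);
  s.post = (post_param == pre_param) ? s.pre : std::make_shared<Smoother>(post_param);
  return s;
}
```

With shared ownership the aliased smoother stays alive as long as either role refers to it; the PR's implementation may manage lifetime differently.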

…r operators; delay enablement of MMA in transfer until after coarse operator is computed to avoid double storage of V in MMA order; only store V in native or MMA order to reduce memory
…rn_residual, use_init_guess. These can now be updated using the Solver::update_param interface
…mer for the latter to reduce memory consumption
… This is helpful for uniform object creation.
…ngle MMA-ordered super-system: prolongator, restrictor and dslash coarse will now not reorder if the input / output fields are already in the correct order
…ith a ColorSpinorField; deploy this to check if we need to nuke any preexisting allocations when resizing std::vector<ColorSpinorField>
…eferences around the input (useful for creating a uniform code path between native and MMA ordered solvers)
…ew parameter QudaMultigridParam::collapse_mrhs enables this on a per-level basis, and requires MMA to be enabled for both the dslash and the transfer operator (on the finer level); deflation is presently handled by expanding the collapsed space to batch form and then collapsing it again after deflation. This removes all BlockTranspose operations outside of the initial coarsening and final prolongation (except for deflation).
…this fixes the verify routine with MMA transfer routines when using half precision
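The deflation path described in the collapse_mrhs commit note above (expand the collapsed space to batch form, deflate, then collapse again) can be sketched as follows. This assumes the collapsed, MMA-friendly order places the RHS index fastest; `Batch`, `Collapsed`, `collapse`, `expand` and `apply_per_rhs` are hypothetical, and `size_actual()` here only mirrors the name of the new `vector_ref` method rather than reproducing it. The per-RHS operation is a placeholder for the actual deflation step.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Batch form: one vector per right-hand side (RHS), each of length n.
using Batch = std::vector<std::vector<double>>;

// Hypothetical collapsed form: a single "super-vector" holding every RHS, with
// the RHS index running fastest (our assumption about the MMA-friendly order).
struct Collapsed {
  std::size_t n = 0;    // sites per RHS
  std::size_t nrhs = 0; // number of packed RHS
  std::vector<double> data;
  std::size_t size() const { return 1; }           // one field object
  std::size_t size_actual() const { return nrhs; } // number of RHS it represents
};

// Batch -> collapsed: element i of RHS r lands at data[i * nrhs + r].
Collapsed collapse(const Batch &b)
{
  Collapsed c;
  c.nrhs = b.size();
  c.n = b.empty() ? 0 : b[0].size();
  c.data.resize(c.n * c.nrhs);
  for (std::size_t r = 0; r < c.nrhs; r++)
    for (std::size_t i = 0; i < c.n; i++) c.data[i * c.nrhs + r] = b[r][i];
  return c;
}

// Collapsed -> batch: the inverse transpose.
Batch expand(const Collapsed &c)
{
  Batch b(c.nrhs, std::vector<double>(c.n));
  for (std::size_t r = 0; r < c.nrhs; r++)
    for (std::size_t i = 0; i < c.n; i++) b[r][i] = c.data[i * c.nrhs + r];
  return b;
}

// Apply a per-RHS operation (a stand-in for deflation) to a collapsed system by
// expanding to batch form, operating on each RHS, then collapsing again.
Collapsed apply_per_rhs(const Collapsed &c, const std::function<void(std::vector<double> &)> &op)
{
  Batch b = expand(c);
  for (auto &v : b) op(v);
  return collapse(b);
}
```

The two transposes are the overhead the PR confines to deflation; every other coarse-level operation in this picture would act on `Collapsed` directly.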
maddyscientist added this to the QUDA 2.0 milestone on Mar 11, 2025
maddyscientist requested a review from a team as a code owner on March 11, 2025 at 17:32
maddyscientist changed the title from "Feature/mg mma misc" to "MMA-optimized batched MRHS MG" on Mar 11, 2025