Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

comm: re-implement dynamic processes using mpir-layer lpid #7240

Open
wants to merge 23 commits into
base: main
Choose a base branch
from

Conversation

hzhou
Copy link
Contributor

@hzhou hzhou commented Dec 18, 2024

Pull Request Description

Dependency PR: #7235 , #7237, #7242

Now both ch3 and ch4 can directly use comm->local_group and comm->remote_group (intercomm) to set up communicator and look up av addresses, we can remove the redundant code -

  • revamp ch4_spawn to use a temporary dynamic av to exchange info between group leaders and establish intercomm
  • remove ch4 MPIDI_rank_map_t
  • remove MPIR comm mapper

[skip warnings]

Author Checklist

  • Provide Description
    Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
  • Commits Follow Good Practice
    Commits are self-contained and do not do two things at once.
    Commit message is of the form: module: short description
    Commit message explains what's in the commit.
  • Passes All Tests
    Whitespace checker. Warnings test. Additional tests via comments.
  • Contribution Agreement
    For non-Argonne authors, check contribution agreement.
    If necessary, request an explicit comment from your companies PR approval manager.

@hzhou hzhou force-pushed the 2412_dynamic_am branch 8 times, most recently from cc41cb6 to 69c6ec8 Compare December 23, 2024 20:03
@hzhou hzhou marked this pull request as ready for review December 23, 2024 20:09
@hzhou hzhou force-pushed the 2412_dynamic_am branch 8 times, most recently from 303ae94 to 54d2544 Compare December 26, 2024 14:23
@hzhou
Copy link
Contributor Author

hzhou commented Dec 26, 2024

test:mpich/ch3/most
test:mpich/ch4/most

All ✔️

@hzhou hzhou requested a review from yfguo January 2, 2025 20:01
@yfguo yfguo mentioned this pull request Jan 15, 2025
4 tasks
@@ -81,7 +80,6 @@ int MPIR_Group_init(void)
pmap->use_map = false;
pmap->u.stride.offset = MPIR_Process.rank;
pmap->u.stride.stride = 1;
pmap->u.stride.blocksize = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, strided block is pretty rare. Only HACC and SNAP have them (through cartesian communicators).

@hzhou hzhou force-pushed the 2412_dynamic_am branch 2 times, most recently from 549edda to b3f547b Compare February 25, 2025 17:56
@hzhou hzhou force-pushed the 2412_dynamic_am branch 3 times, most recently from 4e760a4 to 2a24c6b Compare March 10, 2025 13:34
@hzhou
Copy link
Contributor Author

hzhou commented Mar 10, 2025

test:mpich/ch3/most
test:mpich/ch4/most
✔️

@hzhou hzhou requested a review from yfguo March 10, 2025 17:07
hzhou added 2 commits March 10, 2025 15:59
Because we need access MPIR_Lpid definitions in mpidpre headers, we need
move worlds and lpid definitions to device-independent headers.

Add macro MPIR_LPID_INVALID.

Make MPIR_Lpid signed. Since we are going to perform arithmetic on
MPIR_Lpid, e.g. in using strided pmap, make MPIR_Lpid int64_t instead of
uint64_t to avoid accidental conversion errors.
* Add check_map_is_strided to detect strided pattern and convert a map into a
strided pmap.

* In MPIR_Group_check_subset, use MPIR_Group_lpid_to_rank rather than a
manual linear search.

* Move internal static routines to the bottom of grouputil.c.
hzhou added 15 commits March 10, 2025 15:59
A strided group with nontrivial blocksize is rare. By removing the
blocksize parameter (i.e. blocksize is always 1), we greatly simplify
the code and also improve the performance of lpid lookup in a more
common strided group (such as a typical comm_world group or node group).
The pmap is always used inside MPIR_Group, and its size is always the
same as group->size. Having a duplicated field creates more opportunities
for bugs from inconsistency.
Replace MPIR_Assert with better error message.
We'll create av tables in ch4 according to world_idx and world_rank.
MPIDIU_lpid_to_av can look up the av entry from an lpid in the
communication path. MPIDIU_lpid_to_av_slow, used in communicator
creation paths, will check and allocate the corresponding av table as
needed.
Dynamic av will be used to support MPID_Comm_connect/accept when we need
to create the leader av before we know the correct lpid entries. They
are expected to be freed at the end of inter communicator creation.
Add -

* MPIDI_NM_insert_upid - insert an av entry so the lpid is ready for
communication. The lpid can be allocated from a dynamic av table, thus
supports temporary communications between intercomm leaders. When later
the upid is inserted again into the regular av tables, the dynamic
entries are checked and copied over if already exist.

* MPIDI_NM_dynamic_sendrecv - used by local group leaders to exchange
data over dynamic_av. The dynamic handshakes are susceptible to
concurrent interference. Thus the upper layer is assumed to hold the
vci-0 critical section.
We can easily exchange the context_id along with the rest of the remote
info rather than do it in a separate step.

We can determine is_low_group by comparing world namespace and
world_rank entirely in the MPIR layer, thus no longer need it in
MPID_Intercomm_exchange.

Rename MPID_Intercomm_exchange_map to MPID_Intercomm_exchange to better
reflect that it is not just exchanging maps.
This is fully replaced with MPIR_comm_rank_to_lpid or
MPIR_Group_rank_to_lpid.
Refactor MPID_Intercomm_exchange to Maximize common parts for
MPI_Intercomm_create, MPI_Comm_connect/accept, and
MPI_Intercomm_create_from_group. They differ in the first step
in how to establish a leader-to-leader communication. In ch4,
this is to establish an av for remote leader. Once the av is
established, the intercomm exchange parts are common.

We no longer generate lpid from ch4-layer. Rather, we exchange world
information and convert lpids by swapping world_idx. The lpids will be
used directly as index to ch4 av tables and upids (address names) are
inserted into the av table entries.
In MPID_Comm_connect/accept, simply establish remote_lpid and call
MPIR_Intercomm_create_impl.
The local_group and remote_group fully captures the mapper functions.
We have switched to use MPIR_Lpid to address in ch4 av table manager.
Both map and local_map in ch4 MPIDI_Devcomm_t no longer needed.
Rename it to MPIDIU_get_grank, remove the dependency on
MPIDIU_comm_rank_to_lpid (to be removed next) and use
MPIR_comm_rank_to_lpid instead.
@hzhou hzhou force-pushed the 2412_dynamic_am branch 3 times, most recently from 69b146d to 6341d97 Compare March 11, 2025 02:55
hzhou added 6 commits March 11, 2025 08:04
This is fully replaced by MPIR_comm_rank_to_lpid.
Track MPIR_Lpid lpid rather than a pair of (avtid, lpid).
Now we use MPIR_Lpid, we no longer needed netmod api to convert upids to
lpids. The function is replaced by netmod api insert_upid.
We no longer expose avtid. Replace MPIDIU_get_av with MPIDIU_lpid_to_av.

Also remote unused GPID macros.
When ch4-layer allocates an av table, all entries are initialized to 0.
However, 0 can be a valid entry for fi_addr_t. We could initialize all
entries to FI_ADDR_NOTAVAIL, but that requires an additional complexity
of a netmod API. Instead, because the entry 0 is always the first entry
to be inserted by fi_av_insert, we can simply remember the entry
(MPIDI_OFI_global.lpid0) and be able to tell which entries are empty (in
MPIDI_OFI_insert_upid).
We don't really tracek av tables' ref_count. We simply free all av
tables at finalize.

Rename MPIDIU_avt_destroy to MPIDIU_avt_finalize to better reflect its
role.
@hzhou hzhou force-pushed the 2412_dynamic_am branch from 6341d97 to cfd5d96 Compare March 11, 2025 13:04
@hzhou
Copy link
Contributor Author

hzhou commented Mar 11, 2025

test:mpich/ch3/most
test:mpich/ch4/most

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants