comm: re-implement dynamic processes using mpir-layer lpid #7240
Open
hzhou wants to merge 23 commits into pmodels:main from hzhou:2412_dynamic_am
test:mpich/ch3/most All ✔️
yfguo reviewed Jan 23, 2025
@@ -81,7 +80,6 @@ int MPIR_Group_init(void)
 pmap->use_map = false;
 pmap->u.stride.offset = MPIR_Process.rank;
 pmap->u.stride.stride = 1;
-pmap->u.stride.blocksize = 1;
Yes, strided block is pretty rare. Only HACC and SNAP have them (through cartesian communicators).
test:mpich/ch3/most
Because we need to access MPIR_Lpid definitions in mpidpre headers, we need to move the worlds and lpid definitions to device-independent headers. Add macro MPIR_LPID_INVALID. Make MPIR_Lpid signed. Since we are going to perform arithmetic on MPIR_Lpid, e.g. when using a strided pmap, make MPIR_Lpid int64_t instead of uint64_t to avoid accidental conversion errors.
* Add check_map_is_strided to detect a strided pattern and convert a map into a strided pmap.
* In MPIR_Group_check_subset, use MPIR_Group_lpid_to_rank rather than a manual linear search.
* Move internal static routines to the bottom of grouputil.c.
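The strided-pattern detection described above can be sketched roughly as follows. This is only an illustration of the idea, not the code in grouputil.c: the names `demo_lpid` and `demo_check_strided` are hypothetical, and the restriction to strictly increasing maps is an assumption made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t demo_lpid;      /* signed 64-bit, mirroring the MPIR_Lpid change */

/* Hypothetical helper: returns true if map[i] == offset + i * stride for all i,
 * and stores the detected offset and stride. On a false return the outputs
 * should be ignored. */
static bool demo_check_strided(const demo_lpid *map, int size,
                               demo_lpid *offset, demo_lpid *stride)
{
    assert(size > 0);
    *offset = map[0];
    /* A single-entry map is trivially strided; use stride 1 by convention. */
    *stride = (size > 1) ? map[1] - map[0] : 1;
    if (*stride <= 0) {
        return false;           /* assumption: only increasing maps qualify */
    }
    for (int i = 2; i < size; i++) {
        if (map[i] - map[i - 1] != *stride) {
            return false;
        }
    }
    return true;
}
```

A map such as {4, 6, 8, 10} would be reduced to offset 4 and stride 2, while {0, 1, 3} would stay an explicit map.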
A strided group with nontrivial blocksize is rare. By removing the blocksize parameter (i.e. blocksize is always 1), we greatly simplify the code and also improve the performance of lpid lookup in a more common strided group (such as a typical comm_world group or node group).
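To illustrate why dropping blocksize simplifies lookups, here is a minimal sketch of both directions of the mapping for a strided pmap with blocksize fixed at 1. The struct layout, the function names, and the value chosen for `DEMO_LPID_INVALID` are assumptions for illustration and may not match the MPICH definitions.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t demo_lpid;                      /* signed, so differences may go negative */
#define DEMO_LPID_INVALID ((demo_lpid) -1)      /* hypothetical sentinel value */

struct demo_stride {
    demo_lpid offset;
    demo_lpid stride;
};

/* rank -> lpid: a single multiply-add once blocksize is always 1 */
static demo_lpid demo_rank_to_lpid(const struct demo_stride *s, int size, int rank)
{
    if (rank < 0 || rank >= size) {
        return DEMO_LPID_INVALID;
    }
    return s->offset + (demo_lpid) rank * s->stride;
}

/* lpid -> rank: returns -1 when the lpid is not a member of the group */
static int demo_lpid_to_rank(const struct demo_stride *s, int size, demo_lpid lpid)
{
    assert(s->stride > 0);
    demo_lpid d = lpid - s->offset;     /* safe to test for < 0 with a signed type */
    if (d < 0 || d % s->stride != 0) {
        return -1;
    }
    demo_lpid r = d / s->stride;
    return (r < size) ? (int) r : -1;
}
```

With an unsigned lpid type, the `d < 0` test would never fire and an lpid below the offset would wrap to a huge value, which is the kind of conversion error the signed type avoids.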
The pmap is always used inside MPIR_Group, and its size is always the same as group->size. Having a duplicated field creates more opportunities for bugs from inconsistency.
Replace MPIR_Assert with a better error message.
We'll create av tables in ch4 according to world_idx and world_rank. MPIDIU_lpid_to_av can look up the av entry from an lpid in the communication path. MPIDIU_lpid_to_av_slow, used in communicator creation paths, will check and allocate the corresponding av table as needed.
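The fast and slow lookup paths could look roughly like the sketch below. Everything here is an assumption for illustration: the `demo_` names, the fixed-size table of worlds, and in particular the encoding that packs world_idx into the upper 32 bits of the lpid and world_rank into the lower 32 bits.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef int64_t demo_lpid;
#define DEMO_MAX_WORLDS 8

typedef struct { uint64_t addr; } demo_av;      /* stand-in for an av entry */

static demo_av *demo_tables[DEMO_MAX_WORLDS];   /* one table per world, NULL until needed */
static int demo_world_sizes[DEMO_MAX_WORLDS];   /* assumed known from world info */

/* Assumed encoding: world_idx in the high 32 bits, world_rank in the low 32 bits. */
static int demo_world_idx(demo_lpid lpid) { return (int) (lpid >> 32); }
static int demo_world_rank(demo_lpid lpid) { return (int) (lpid & 0xffffffff); }
static demo_lpid demo_make_lpid(int idx, int rank)
{
    return ((demo_lpid) idx << 32) | (demo_lpid) rank;
}

/* Fast path for communication: assumes the table already exists. */
static demo_av *demo_lpid_to_av(demo_lpid lpid)
{
    return &demo_tables[demo_world_idx(lpid)][demo_world_rank(lpid)];
}

/* Slow path for communicator creation: allocates the table on first use. */
static demo_av *demo_lpid_to_av_slow(demo_lpid lpid)
{
    int idx = demo_world_idx(lpid);
    assert(idx >= 0 && idx < DEMO_MAX_WORLDS);
    assert(demo_world_rank(lpid) < demo_world_sizes[idx]);
    if (demo_tables[idx] == NULL) {
        demo_tables[idx] = calloc((size_t) demo_world_sizes[idx], sizeof(demo_av));
        if (demo_tables[idx] == NULL) {
            return NULL;
        }
    }
    return demo_lpid_to_av(lpid);
}
```

Keeping the existence check out of the fast path is the point of the split: the communication path pays only for two index operations.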
A dynamic av will be used to support MPID_Comm_connect/accept when we need to create the leader av before we know the correct lpid entries. Dynamic av entries are expected to be freed at the end of intercommunicator creation.
Add:
* MPIDI_NM_insert_upid - insert an av entry so the lpid is ready for communication. The lpid can be allocated from a dynamic av table, which supports temporary communication between intercomm leaders. When the upid is later inserted again into the regular av tables, the dynamic entries are checked and copied over if they already exist.
* MPIDI_NM_dynamic_sendrecv - used by local group leaders to exchange data over dynamic_av. The dynamic handshakes are susceptible to concurrent interference, so the upper layer is assumed to hold the vci-0 critical section.
We can easily exchange the context_id along with the rest of the remote info rather than doing it in a separate step. We can determine is_low_group by comparing world namespace and world_rank entirely in the MPIR layer, so MPID_Intercomm_exchange no longer needs it. Rename MPID_Intercomm_exchange_map to MPID_Intercomm_exchange to better reflect that it is not just exchanging maps.
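One way such a comparison could work is sketched below. The commit message does not state the exact ordering rule, so the rule used here (lexicographic namespace order, then world_rank as a tie-breaker) is an assumption, as are the `demo_leader` struct and the function name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical description of a group leader's identity. */
struct demo_leader {
    const char *world_namespace;
    int world_rank;
};

/* Returns true if the local leader orders before the remote leader.
 * Both sides evaluate the same rule on swapped arguments, so exactly
 * one of them ends up as the low group. */
static bool demo_is_low_group(const struct demo_leader *local,
                              const struct demo_leader *remote)
{
    int cmp = strcmp(local->world_namespace, remote->world_namespace);
    /* Two distinct leaders cannot share both namespace and rank. */
    assert(cmp != 0 || local->world_rank != remote->world_rank);
    if (cmp != 0) {
        return cmp < 0;
    }
    return local->world_rank < remote->world_rank;
}
```

Because the inputs are already exchanged with the rest of the remote info, no extra device-layer step is needed to agree on the result.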
This is fully replaced with MPIR_comm_rank_to_lpid or MPIR_Group_rank_to_lpid.
Refactor MPID_Intercomm_exchange to maximize the common parts for MPI_Intercomm_create, MPI_Comm_connect/accept, and MPI_Intercomm_create_from_group. They differ in the first step, which establishes leader-to-leader communication. In ch4, this means establishing an av for the remote leader. Once the av is established, the intercomm exchange parts are common. We no longer generate lpids in the ch4 layer. Rather, we exchange world information and convert lpids by swapping world_idx. The lpids are used directly as indices into the ch4 av tables, and upids (address names) are inserted into the av table entries.
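The world_idx swap can be pictured with the following sketch. It assumes, purely for illustration, that each process keeps a local table of world namespaces, that the remote side sends its own namespace table alongside its lpids, and that an lpid packs world_idx into the upper 32 bits and world_rank into the lower 32 bits. None of the `demo_` names are MPICH APIs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef int64_t demo_lpid;
#define DEMO_MAX_WORLDS 8
#define DEMO_NS_LEN 64

static char demo_worlds[DEMO_MAX_WORLDS][DEMO_NS_LEN];  /* local world table */
static int demo_num_worlds = 0;

/* Find a namespace in the local table, adding it if unknown.
 * Returns the local world_idx, or -1 if it cannot be stored. */
static int demo_find_or_add_world(const char *ns)
{
    for (int i = 0; i < demo_num_worlds; i++) {
        if (strcmp(demo_worlds[i], ns) == 0) {
            return i;
        }
    }
    if (demo_num_worlds == DEMO_MAX_WORLDS || strlen(ns) >= DEMO_NS_LEN) {
        return -1;
    }
    strcpy(demo_worlds[demo_num_worlds], ns);
    return demo_num_worlds++;
}

/* Convert an lpid expressed in the remote side's world numbering into the
 * local numbering: keep world_rank, replace world_idx. */
static demo_lpid demo_translate_lpid(demo_lpid remote_lpid,
                                     const char *const *remote_namespaces,
                                     int num_remote)
{
    int remote_idx = (int) (remote_lpid >> 32);
    assert(remote_idx >= 0 && remote_idx < num_remote);
    int local_idx = demo_find_or_add_world(remote_namespaces[remote_idx]);
    if (local_idx < 0) {
        return -1;
    }
    return ((demo_lpid) local_idx << 32) | (remote_lpid & 0xffffffff);
}
```

After translation, the lpid is meaningful locally and can index the av tables directly, which is why no separate ch4-level lpid generation is needed.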
In MPID_Comm_connect/accept, simply establish remote_lpid and call MPIR_Intercomm_create_impl.
The local_group and remote_group fully capture the mapper functions.
We have switched to using MPIR_Lpid for addressing in the ch4 av table manager. Both map and local_map in ch4 MPIDI_Devcomm_t are no longer needed.
Rename it to MPIDIU_get_grank, remove the dependency on MPIDIU_comm_rank_to_lpid (to be removed next) and use MPIR_comm_rank_to_lpid instead.
This is fully replaced by MPIR_comm_rank_to_lpid.
Track MPIR_Lpid lpid rather than a pair of (avtid, lpid).
Now that we use MPIR_Lpid, we no longer need a netmod API to convert upids to lpids. The function is replaced by the netmod API insert_upid.
We no longer expose avtid. Replace MPIDIU_get_av with MPIDIU_lpid_to_av. Also remove unused GPID macros.
When the ch4 layer allocates an av table, all entries are initialized to 0. However, 0 can be a valid entry for fi_addr_t. We could initialize all entries to FI_ADDR_NOTAVAIL, but that requires the additional complexity of a netmod API. Instead, because the entry 0 is always the first entry to be inserted by fi_av_insert, we can simply remember the entry (MPIDI_OFI_global.lpid0) and be able to tell which entries are empty (in MPIDI_OFI_insert_upid).
We don't really track av tables' ref_count. We simply free all av tables at finalize. Rename MPIDIU_avt_destroy to MPIDIU_avt_finalize to better reflect its role.
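The "remember who owns address 0" trick can be sketched as follows. This is a simplified model with hypothetical names: `demo_addr` stands in for fi_addr_t, `demo_lpid0` plays the role of MPIDI_OFI_global.lpid0, and the flat table replaces the real av tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t demo_lpid;
typedef uint64_t demo_addr;                 /* stand-in for fi_addr_t */

#define DEMO_TABLE_SIZE 4
#define DEMO_LPID_NONE ((demo_lpid) -1)     /* hypothetical "not recorded yet" value */

static demo_addr demo_table[DEMO_TABLE_SIZE];   /* zero-initialized, like the av table */
static demo_lpid demo_lpid0 = DEMO_LPID_NONE;   /* the lpid whose address is really 0 */

/* An entry holding 0 is empty unless it is the one recorded as owning address 0. */
static bool demo_is_empty(demo_lpid lpid)
{
    assert(lpid >= 0 && lpid < DEMO_TABLE_SIZE);
    return demo_table[lpid] == 0 && lpid != demo_lpid0;
}

static void demo_insert(demo_lpid lpid, demo_addr addr)
{
    assert(lpid >= 0 && lpid < DEMO_TABLE_SIZE);
    demo_table[lpid] = addr;
    if (addr == 0) {
        demo_lpid0 = lpid;  /* remember which entry legitimately holds 0 */
    }
}
```

This keeps the zero-filled allocation in the device-independent layer and avoids a netmod hook just to write a sentinel into every slot.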
test:mpich/ch3/most
Pull Request Description
Dependency PR: #7235, #7237, #7242
Now that both ch3 and ch4 can directly use comm->local_group and comm->remote_group (intercomm) to set up communicators and look up av addresses, we can remove the redundant code -
- ch4_spawn: use a temporary dynamic av to exchange info between group leaders and establish the intercomm
- MPIDI_rank_map_t
[skip warnings]
Author Checklist
Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
Commits are self-contained and do not do two things at once.
Commit message is of the form:
module: short description
Commit message explains what's in the commit.
Whitespace checker. Warnings test. Additional tests via comments.
For non-Argonne authors, check contribution agreement.
If necessary, request an explicit comment from your company's PR approval manager.