ch4/ipc: Simplify generic ipc caches #7320

Draft
wants to merge 5 commits into
base: main

Conversation

raffenet
Contributor

@raffenet raffenet commented Mar 5, 2025

Pull Request Description

Replace both AVL-tree-based caches used in GPU IPC. The sender-side GPU IPC handle cache and the receiver-side mapped buffer cache can both be implemented with hash tables, which makes the resulting code much easier to understand and maintain.

Author Checklist

  • Provide Description
    Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
  • Commits Follow Good Practice
    Commits are self-contained and do not do two things at once.
    Commit message is of the form: module: short description
    Commit message explains what's in the commit.
  • Passes All Tests
    Whitespace checker. Warnings test. Additional tests via comments.
  • Contribution Agreement
    For non-Argonne authors, check contribution agreement.
    If necessary, request an explicit comment from your company's PR approval manager.

@raffenet
Contributor Author

raffenet commented Mar 5, 2025

test:mpich/ch4/ofi/gpu

@raffenet raffenet force-pushed the new-mapped-ipc-cache branch from fe953d9 to d39ff45 Compare March 5, 2025 17:18
@raffenet
Contributor Author

raffenet commented Mar 5, 2025

test:mpich/ch4/gpu

@raffenet raffenet force-pushed the new-mapped-ipc-cache branch from d39ff45 to 1baf24d Compare March 6, 2025 18:33
@raffenet raffenet changed the title from "ch4/ipc: Replace mapped gpu ipc handle cache" to "ch4/ipc: Simplify generic ipc caches" Mar 6, 2025
@raffenet
Contributor Author

raffenet commented Mar 6, 2025

test:mpich/ch4/gpu

@raffenet raffenet force-pushed the new-mapped-ipc-cache branch from 1baf24d to b13ac5f Compare March 11, 2025 21:16
Remove the AVL-tree-based mapped buffer cache and replace it with a hash
table. The hash key combines the sender's base address with its local
rank on the node. Each entry can store a mapped address for each device
visible to the process doing the mapping.
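
To illustrate the layout this commit message describes, here is a minimal, self-contained sketch in plain C. It is not the code from this PR: the names (`ipc_map_key`, `ipc_map_cache`, `MAX_DEVICES`, `NUM_BUCKETS`), the chained-bucket table, and the hash mixing are all assumptions made for the example. The only parts taken from the message are the key (sender base address plus node-local rank) and the per-device slot for a mapped address.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_DEVICES 8   /* hypothetical bound on devices visible to the mapper */
#define NUM_BUCKETS 64  /* hypothetical table size; must be a power of two */

/* Key: the sender's allocation base address plus its node-local rank. */
typedef struct {
    uintptr_t remote_base;
    int local_rank;
} ipc_map_key;

typedef struct ipc_map_entry {
    ipc_map_key key;
    void *mapped_addr[MAX_DEVICES];  /* NULL means "not mapped on this device" */
    struct ipc_map_entry *next;      /* bucket chain */
} ipc_map_entry;

typedef struct {
    ipc_map_entry *buckets[NUM_BUCKETS];
} ipc_map_cache;

/* Illustrative hash: mix both key fields, then mask down to a bucket index. */
static size_t ipc_map_hash(ipc_map_key k)
{
    uint64_t h = (uint64_t) k.remote_base * 0x9E3779B97F4A7C15ull;
    h ^= (uint64_t) (unsigned) k.local_rank * 0xC2B2AE3D27D4EB4Full;
    return (size_t) (h >> 32) & (NUM_BUCKETS - 1);
}

static ipc_map_entry *ipc_map_find(ipc_map_cache *c, ipc_map_key k)
{
    for (ipc_map_entry *e = c->buckets[ipc_map_hash(k)]; e; e = e->next) {
        if (e->key.remote_base == k.remote_base && e->key.local_rank == k.local_rank)
            return e;
    }
    return NULL;
}

/* Return the cached mapping for (key, dev), or NULL on a miss. */
void *ipc_map_cache_lookup(ipc_map_cache *c, ipc_map_key k, int dev)
{
    if (dev < 0 || dev >= MAX_DEVICES)
        return NULL;
    ipc_map_entry *e = ipc_map_find(c, k);
    return e ? e->mapped_addr[dev] : NULL;
}

/* Record a mapping for (key, dev). Returns 0 on success, -1 on failure. */
int ipc_map_cache_insert(ipc_map_cache *c, ipc_map_key k, int dev, void *addr)
{
    if (dev < 0 || dev >= MAX_DEVICES)
        return -1;
    ipc_map_entry *e = ipc_map_find(c, k);
    if (!e) {
        e = calloc(1, sizeof(*e));
        if (!e)
            return -1;
        size_t b = ipc_map_hash(k);
        e->key = k;
        e->next = c->buckets[b];
        c->buckets[b] = e;
    }
    e->mapped_addr[dev] = addr;
    return 0;
}

/* Free every entry and leave the table empty. */
void ipc_map_cache_free(ipc_map_cache *c)
{
    for (size_t b = 0; b < NUM_BUCKETS; b++) {
        ipc_map_entry *e = c->buckets[b];
        while (e) {
            ipc_map_entry *next = e->next;
            free(e);
            e = next;
        }
        c->buckets[b] = NULL;
    }
}
```

In this sketch a receiver would call `ipc_map_cache_lookup` before mapping a remote buffer and `ipc_map_cache_insert` after a successful mapping. Two senders that happen to report the same base address still get separate entries because the local rank is part of the key.
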

Replace the AVL tree structures with a hash keyed on the allocation base
pointer.
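
A cache keyed only on the allocation base pointer could look roughly like the sketch below, which presumably corresponds to the sender-side handle cache named in the PR description. Everything here is hypothetical: `ipc_handle`, `HANDLE_BYTES`, `handle_cache_get`, `handle_cache_remove`, and the `demo_create_handle` stand-in for whatever GPU runtime call exports a handle. The removal function is an assumption about what such a cache would need and is not described in the commit message.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define HANDLE_BYTES 64  /* hypothetical size of an opaque IPC handle */
#define HC_BUCKETS 64    /* hypothetical table size; must be a power of two */

typedef struct {
    unsigned char bytes[HANDLE_BYTES];
} ipc_handle;

typedef struct handle_entry {
    const void *base;            /* key: allocation base pointer */
    ipc_handle handle;           /* cached value */
    struct handle_entry *next;   /* bucket chain */
} handle_entry;

typedef struct {
    handle_entry *buckets[HC_BUCKETS];
} handle_cache;

/* Stand-in signature for the GPU runtime call that exports a handle. */
typedef int (*create_handle_fn)(const void *base, ipc_handle *out);

/* Demo creator: counts calls and stores the pointer bits as the "handle". */
int demo_create_calls = 0;
int demo_create_handle(const void *base, ipc_handle *out)
{
    memset(out, 0, sizeof(*out));
    memcpy(out->bytes, &base, sizeof(base));
    demo_create_calls++;
    return 0;
}

static size_t hc_hash(const void *base)
{
    uint64_t h = (uint64_t) (uintptr_t) base * 0x9E3779B97F4A7C15ull;
    return (size_t) (h >> 32) & (HC_BUCKETS - 1);
}

/* Copy the handle for `base` into `out`, creating it only on a miss. */
int handle_cache_get(handle_cache *c, const void *base,
                     create_handle_fn create, ipc_handle *out)
{
    size_t b = hc_hash(base);
    for (handle_entry *e = c->buckets[b]; e; e = e->next) {
        if (e->base == base) {
            *out = e->handle;    /* hit: no runtime call needed */
            return 0;
        }
    }
    handle_entry *e = calloc(1, sizeof(*e));
    if (!e)
        return -1;
    if (create(base, &e->handle) != 0) {
        free(e);
        return -1;
    }
    e->base = base;
    e->next = c->buckets[b];
    c->buckets[b] = e;
    *out = e->handle;
    return 0;
}

/* Drop the entry for `base`, e.g. when the allocation is released
 * (an assumed requirement, so a recycled address is not served a stale handle). */
void handle_cache_remove(handle_cache *c, const void *base)
{
    handle_entry **pp = &c->buckets[hc_hash(base)];
    while (*pp) {
        if ((*pp)->base == base) {
            handle_entry *dead = *pp;
            *pp = dead->next;
            free(dead);
            return;
        }
        pp = &(*pp)->next;
    }
}
```

With this shape, repeated sends from the same allocation call the creator once, and the lookup is a single hash probe instead of a tree traversal.
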

IPC send is synchronous by definition. To satisfy completion semantics,
the sender must be notified when the receiver is done reading from the
send buffer. The syncflag parameter is therefore redundant.

At send completion time, give the GPU layer a chance to clean up
resources used to perform the IPC transfer. This does not affect the
default configuration with caching enabled.
@raffenet raffenet force-pushed the new-mapped-ipc-cache branch from b13ac5f to 2003155 Compare March 14, 2025 19:59
Avoid masking any errors here and also squash an unused label warning.