common/ucx: New UCX common folder to be shared among components #1
base: master
Why wouldn't you want this upstream in mainline Open MPI?
@jsquyres
@alex--m continuing the discussion from open-mpi#8484 (comment):
Also, can you please explain why sharing free lists, datatypes, etc. is needed to reduce the number of created QPs?
Certainly. Lastly, since we mentioned it in the conference call, it's only fair that I also post it here for reference: in order for UCC (the MCA component as well as the library itself) to use these objects (worker, datatype convertors, free lists, etc.), we need to agree on this (or another) shared folder structure. Only then can we proceed to implement UCC on top of it.
UCC has its own datatype specification, https://github.com/openucx/ucc/blob/master/src/ucc/api/ucc.h#L209, which is not a UCX datatype. Do you expect the UCC API to also contain a UCX datatype (ucp_datatype_t)?
Currently, because UCX non-contiguous datatype handling is contained in the PML anyway, it didn't make sense to reuse ucp_datatype_t, at least to me. To be fair, the other reason for not reusing it is that UCC requires more information to do the reduction, but I argue that this can be addressed (e.g. by splitting UCP's contiguous type into "reduction type" + length, or by adding a new "contiguous-with-type" value to the enum, even at the UCC level). If we do move the non-contiguous datatype handling into the common folder, it would make much more sense for UCC to (re)use this code.
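To make the "reduction type + length" idea concrete, here is a minimal C sketch. It is illustrative only: every `sketch_*` name is hypothetical and none of this is UCX, UCC, or Open MPI API. It assumes a contiguous buffer described by an element type plus an element count, which is the extra information a reduction needs beyond a plain byte length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element types; a real library would list many more. */
typedef enum {
    SKETCH_ELEM_INT32,
    SKETCH_ELEM_FLOAT64
} sketch_elem_type_t;

/* Hypothetical "contiguous-with-type" descriptor: type + length. */
typedef struct {
    sketch_elem_type_t elem_type; /* what a reduction operates on */
    size_t             count;     /* number of elements           */
} sketch_typed_contig_t;

/* Size in bytes of one element, or 0 for an unknown type. */
size_t sketch_elem_size(sketch_elem_type_t t)
{
    switch (t) {
    case SKETCH_ELEM_INT32:   return sizeof(int32_t);
    case SKETCH_ELEM_FLOAT64: return sizeof(double);
    }
    return 0;
}

/* Byte length: all that a plain contiguous datatype would carry. */
size_t sketch_byte_length(const sketch_typed_contig_t *d)
{
    assert(d != NULL);
    return d->count * sketch_elem_size(d->elem_type);
}

/* Element-wise sum, dst[i] += src[i]. Returns 0 on success, -1 if the
 * element type is not supported. */
int sketch_reduce_sum(void *dst, const void *src,
                      const sketch_typed_contig_t *d)
{
    size_t i;

    assert(d != NULL);
    switch (d->elem_type) {
    case SKETCH_ELEM_INT32: {
        int32_t *a = dst;
        const int32_t *b = src;
        for (i = 0; i < d->count; i++) {
            a[i] += b[i];
        }
        return 0;
    }
    case SKETCH_ELEM_FLOAT64: {
        double *a = dst;
        const double *b = src;
        for (i = 0; i < d->count; i++) {
            a[i] += b[i];
        }
        return 0;
    }
    }
    return -1;
}
```

The point-to-point path only needs `sketch_byte_length()`, while a collective that reduces also needs `elem_type`. That is the split being argued for here.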
@alex--m UCC is a standalone library, which will probably be used by many MPI libraries, not just Open MPI. Therefore, it does not make sense for the UCC API or implementation to depend in any way on Open MPI's code or its folder structure.
As the UCC API is still a work in progress, and my opinion as a UCC working-group member depends on the outcome of this PR, I'm afraid a clear answer cannot be given before this PR is decided on.
Can you please share the plan for the UCC datatype API, according to the current code in this PR?
Sure, my plan is very simple: pass a datatype convertor callback as a hint to UCC (similarly to the way we plan to pass the UCP worker), and call it from within UCC. For example, when MPI_Bcast() is called with a non-contiguous datatype, UCC could (indirectly) call [...] The plan doesn't necessarily involve explicitly mentioning [...]
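A rough C sketch of what "pass a convertor callback as a hint and call it from within the collective library" could look like. Everything here is an assumption for illustration: the `sketch_*` names, the callback signature, and the strided example datatype are hypothetical and do not come from UCC, UCX, or Open MPI. The datatype is passed as an opaque `void *`, so the library side never needs to know which MPI implementation produced it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pack callback supplied by the caller (e.g. an MPI library).
 * Packs up to max_len bytes of user_buf, as described by the opaque dtype
 * handle, starting at byte `offset` of the packed stream. Returns the
 * number of bytes written to dest. */
typedef size_t (*sketch_pack_cb_t)(void *dtype, const void *user_buf,
                                   size_t offset, void *dest,
                                   size_t max_len);

/* Library side: drives the callback until the staging buffer is full or
 * the datatype is exhausted. It never inspects dtype itself. */
size_t sketch_coll_pack_all(sketch_pack_cb_t pack, void *dtype,
                            const void *user_buf, void *staging,
                            size_t staging_len)
{
    size_t offset = 0;
    size_t n;

    assert(pack != NULL);
    while (offset < staging_len) {
        n = pack(dtype, user_buf, offset, (char *)staging + offset,
                 staging_len - offset);
        if (n == 0) {
            break; /* nothing left to pack */
        }
        offset += n;
    }
    return offset;
}

/* Caller side: an example non-contiguous layout, with fixed-size elements
 * separated by a constant stride. elem_size must be non-zero. */
typedef struct {
    size_t elem_size; /* bytes per element                */
    size_t stride;    /* bytes between element starts     */
    size_t count;     /* number of elements in the buffer */
} sketch_strided_t;

size_t sketch_strided_pack(void *dtype, const void *user_buf,
                           size_t offset, void *dest, size_t max_len)
{
    const sketch_strided_t *dt = dtype;
    const char *src = user_buf;
    char *out = dest;
    size_t total = dt->elem_size * dt->count;
    size_t done = 0;

    while (done < max_len && offset + done < total) {
        size_t pos    = offset + done;
        size_t elem   = pos / dt->elem_size;
        size_t within = pos % dt->elem_size;
        size_t chunk  = dt->elem_size - within;

        if (chunk > max_len - done) {
            chunk = max_len - done;
        }
        memcpy(out + done, src + elem * dt->stride + within, chunk);
        done += chunk;
    }
    return done;
}
```

In this sketch the caller would register `sketch_strided_pack` together with a pointer to its `sketch_strided_t`, and the collective would call `sketch_coll_pack_all()` to fill a contiguous staging buffer before sending.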
In this case, why do we need to share code with the UCX component? Just pass an opal-convertor-based callback. There are many components in Open MPI which pass around convertors, and they never needed to share code with the UCX PML.
The problems I see with what you are proposing are that (a) the convertor contexts for PML and COLL would be disjoint, causing the same MPI datatype to be translated into [...] Is there a disadvantage with this proposal to combine them?
So UCC would have to understand that the object it gets in its API calls (either as a ucp_datatype_t or a void*) is actually a UCP datatype handle, and pass it down to the UCX p2p layer internally?
My plan was to use a custom datatype (void*, so it's not MPI-specific). In addition, I was thinking UCC would also get an optional parameter like
Well, the entire discussion started because | { COLL/UCC, PML/UCX } | > 1, so I don't see a problem here.
So, we need to get a consensus about this UCC API approach from the broader UCC community, and merge the API into the UCC library, before upstreaming the relevant changes to Open MPI (to make sure the coll/ucx component could actually be implemented the way you described).
6a23989 to bff788a
Signed-off-by: Alex Margolin <[email protected]>
bff788a to db0daf9