common/ucx: New UCX common folder to be shared among components #1

Open
alex--m wants to merge 1 commit into master from topic/shared-ucx-no-collectives

Conversation

alex--m commented Feb 15, 2021

No description provided.

@jsquyres

Why wouldn't you want this upstream in mainline Open MPI?

alex--m commented Feb 15, 2021

> Why wouldn't you want this upstream in mainline Open MPI?

@jsquyres
I would, but during the last developer conf. call we decided to review it internally first; only after we reach consensus here would I create a PR upstream on behalf of the UCX community. In fact, this entire repo was created just the other week for this process.

yosefe commented Feb 17, 2021

@alex--m, continuing the discussion from open-mpi#8484 (comment):
What is the minimal set of changes required to be able to share ucp objects?
According to 43ec4cb, only opal_common_ucx.ucp_worker is actually being used.

yosefe commented Feb 17, 2021

Also, can you please explain why sharing freelists, datatypes, etc. is needed for reducing the # of created QPs?

alex--m commented Feb 20, 2021

> Also, can you please explain why sharing freelists, datatypes, etc. is needed for reducing the # of created QPs?

Certainly.
A comprehensive collectives component is required to handle non-contiguous/derived (MPI) datatypes, and this includes the component I'm working on for UCC. To that end, it would need to convert MPI datatypes into UCX datatypes, so it could use UCX's P2P APIs to get the data across. The necessary flow for this is identical to the one employed by PML/UCX, so this patch moves everything the collective component would use into the shared folder. Similarly, free-lists would be used to store collective-operation requests in the same way PML/UCX stores P2P requests today.
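For illustration, here is a minimal, hedged sketch of that conversion flow against the public UCX API: a non-contiguous layout is exposed to UCP as a "generic" datatype through pack callbacks, which is essentially what PML/UCX does today on top of the OPAL convertor. The strided_dt_t layout, the dt_* helpers and make_ucp_datatype() below are illustrative stand-ins rather than existing Open MPI symbols; in the real code the pack callback would drive opal_convertor_pack() instead of a hand-written strided copy.

```c
#include <stdlib.h>
#include <ucp/api/ucp.h>

/* Toy "derived datatype": 'count' doubles taken every 'stride'-th double. */
typedef struct {
    size_t stride;                /* distance between elements, in doubles */
} strided_dt_t;

typedef struct {
    const double       *src;      /* pack source buffer                    */
    size_t              count;    /* number of doubles to transfer         */
    const strided_dt_t *layout;
} dt_pack_state_t;

static void *dt_start_pack(void *context, const void *buffer, size_t count)
{
    dt_pack_state_t *s = calloc(1, sizeof(*s));
    s->src    = buffer;
    s->count  = count;
    s->layout = context;
    return s;
}

static size_t dt_packed_size(void *state)
{
    return ((dt_pack_state_t *)state)->count * sizeof(double);
}

static size_t dt_pack(void *state, size_t offset, void *dest, size_t max_length)
{
    dt_pack_state_t *s     = state;
    size_t           first = offset / sizeof(double);
    size_t           n     = max_length / sizeof(double);

    if (n > s->count - first) {
        n = s->count - first;
    }
    /* Gather the strided elements into the contiguous staging buffer;
     * in PML/UCX this is where opal_convertor_pack() is used instead.  */
    for (size_t i = 0; i < n; ++i) {
        ((double *)dest)[i] = s->src[(first + i) * s->layout->stride];
    }
    return n * sizeof(double);
}

static void dt_finish(void *state)
{
    free(state);
}

static const ucp_generic_dt_ops_t dt_ops = {
    .start_pack  = dt_start_pack,
    .packed_size = dt_packed_size,
    .pack        = dt_pack,
    .finish      = dt_finish,
    /* the unpack side (start_unpack/unpack) is omitted for brevity */
};

/* Create one ucp_datatype_t per layout; the resulting handle is what both
 * the p2p and the collective paths would look up from a shared cache. */
ucs_status_t make_ucp_datatype(strided_dt_t *layout, ucp_datatype_t *out)
{
    return ucp_dt_create_generic(&dt_ops, layout, out);
}
```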

Lastly, since we mentioned it in the conf. call, it's only fair I also post it here for reference: in order for UCC (the MCA component as well as the library itself) to use these objects (worker, datatype convertors, free-lists, etc.), we need to agree on this (or another) shared folder structure. Only then can we proceed to implement UCC on top of it.

yosefe commented Feb 21, 2021

> A comprehensive collectives component is required to handle non-contiguous/derived (MPI) datatypes, and this includes the component I'm working on for UCC. To that end, it would need to convert MPI datatypes into UCX datatypes

UCC has its own datatype specification: https://github.com/openucx/ucc/blob/master/src/ucc/api/ucc.h#L209 which is not a UCX datatype. Do you expect UCC API will also contain UCX datatype (ucp_datatype_t)?

alex--m commented Feb 21, 2021

> UCC has its own datatype specification: https://github.com/openucx/ucc/blob/master/src/ucc/api/ucc.h#L209 which is not a UCX datatype. Do you expect UCC API will also contain UCX datatype (ucp_datatype_t)?

Currently, because UCX non-contiguous datatype handling is contained in the PML anyway, it didn't make sense to me to reuse ucp_datatype_t. To be fair, the other reason for not reusing it is that UCC requires more information to do the reduction, but I argue that can be addressed (e.g. by splitting UCP's contiguous type into a "reduction type" + length, or by adding a new "contiguous-with-type" value to the enum, even at the UCC level).

If we do move the non-contiguous datatype handling into the common folder, it would make much more sense for UCC to (re)use this code.

yosefe commented Feb 21, 2021

@alex--m UCC is a standalone library which will probably be used by many MPI libraries, not just OpenMPI. Therefore, it does not make sense for the UCC API/implementation to depend in some way on OpenMPI's code or its folder structure.
We need to know whether the UCC API is actually going to have a ucp_datatype_t API, because it doesn't make sense to make code changes in OpenMPI in order to use an API which doesn't exist and/or is not going to exist.
From the response above, I could not get a clear answer to the question "Do you expect UCC API will also contain UCX datatype (ucp_datatype_t)?"

alex--m commented Feb 21, 2021

> From the response above, I could not get a clear answer to the question "Do you expect UCC API will also contain UCX datatype (ucp_datatype_t)?"

As the UCC API is still a work in progress, and my opinion as a UCC working-group member depends on the outcome of this PR, I'm afraid a clear answer cannot be given before this PR is decided on.

yosefe commented Feb 21, 2021

Can you please share the plan for the UCC datatype API, based on the current code in this PR?
We need to understand the big picture and how all the parts fit together.

alex--m commented Feb 21, 2021

> Can you please share the plan for the UCC datatype API, based on the current code in this PR?
> We need to understand the big picture and how all the parts fit together.

Sure, my plan is very simple: pass a datatype convertor callback as a hint to UCC (similarly to the way we plan to pass the UCP worker), and call it from within UCC. For example, when MPI_Bcast() is called with a non-contiguous datatype, UCC could (indirectly) call mca_common_ucx_init_datatype() and use the resulting ucp_datatype_t to pack the data before sending.

The plan doesn't necessarily involve explicitly mentioning ucp_datatype_t in the UCC API, but, like I said, the actual implementation will depend on the outcome of this PR. My plans for persistent requests and free-lists are similar.
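To make the proposed hint concrete, here is a small, hedged sketch of what the callback and the hint structure could look like. None of these names exist in UCC or Open MPI today; ompi_datatype_t is treated as an opaque handle, and convert_dt merely stands in for a wrapper around the shared mca_common_ucx_init_datatype() routine mentioned above.

```c
#include <stddef.h>
#include <ucp/api/ucp.h>

typedef struct ompi_datatype_t ompi_datatype_t;    /* opaque to the library */

/* Hypothetical convertor hint: the MPI layer registers a callback that maps
 * an MPI datatype to a ucp_datatype_t (backed by the shared common/ucx code). */
typedef ucs_status_t (*dt_convert_cb_t)(ompi_datatype_t *mpi_dt,
                                        ucp_datatype_t  *ucp_dt);

typedef struct {
    ucp_worker_h    worker;       /* shared UCP worker, also passed as a hint */
    dt_convert_cb_t convert_dt;   /* optional datatype convertor hint         */
} coll_lib_hints_t;

/* Inside the collective library: resolve the UCP datatype lazily, only when
 * a non-contiguous user buffer is encountered, then use it with UCP p2p. */
ucs_status_t resolve_datatype(const coll_lib_hints_t *hints,
                              ompi_datatype_t *mpi_dt,
                              ucp_datatype_t *out)
{
    if (hints->convert_dt == NULL) {
        return UCS_ERR_UNSUPPORTED;   /* fall back to a contiguous-only path */
    }
    return hints->convert_dt(mpi_dt, out);
}
```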

yosefe commented Feb 21, 2021

> Sure, my plan is very simple: pass a datatype convertor callback as a hint to UCC (similarly to the way we plan to pass the UCP worker), and call it from within UCC. For example, when MPI_Bcast() is called with a non-contiguous datatype, UCC could (indirectly) call mca_common_ucx_init_datatype() and use the resulting ucp_datatype_t to pack the data before sending.

In this case, why is there a need to share code with the UCX component? Just pass an opal_convertor-based callback. There are many components in OpenMPI which pass convertors around, and they have never needed to share code with the UCX PML.
The same goes for freelists: OPAL has a variety of freelist structures which can be used.

alex--m commented Feb 21, 2021

> In this case, why is there a need to share code with the UCX component? Just pass an opal_convertor-based callback. There are many components in OpenMPI which pass convertors around, and they have never needed to share code with the UCX PML.
> The same goes for freelists: OPAL has a variety of freelist structures which can be used.

The problems I see with what you are proposing are that (a) the convertor contexts for PML and COLL would be disjoint, causing the same MPI datatype to be translated into ucp_datatype_t twice and stored twice, and (b) almost identical code would be duplicated across these two MCA components.
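To illustrate point (a), here is a minimal, hedged sketch of the kind of shared translation cache a common folder would enable: one table of translated handles, keyed by the MPI datatype pointer, consulted by both PML/UCX and COLL/UCC. The dt_cache_* names are illustrative, not existing Open MPI symbols, and a real implementation would need locking and a hash table instead of a linked list.

```c
#include <stdlib.h>
#include <ucp/api/ucp.h>

typedef struct dt_cache_entry {
    const void            *mpi_dt;   /* key: the MPI datatype handle        */
    ucp_datatype_t         ucp_dt;   /* value: the translated UCX datatype  */
    struct dt_cache_entry *next;
} dt_cache_entry_t;

static dt_cache_entry_t *dt_cache_head = NULL;   /* shared by PML and COLL */

/* Translate each MPI datatype at most once; 'translate' stands in for the
 * shared mca_common_ucx_init_datatype() routine discussed in this thread. */
ucs_status_t dt_cache_get(const void *mpi_dt,
                          ucs_status_t (*translate)(const void *mpi_dt,
                                                    ucp_datatype_t *ucp_dt),
                          ucp_datatype_t *out)
{
    dt_cache_entry_t *e;
    ucs_status_t      status;

    for (e = dt_cache_head; e != NULL; e = e->next) {
        if (e->mpi_dt == mpi_dt) {
            *out = e->ucp_dt;        /* cache hit: no second conversion */
            return UCS_OK;
        }
    }

    e      = malloc(sizeof(*e));
    status = translate(mpi_dt, &e->ucp_dt);
    if (status != UCS_OK) {
        free(e);
        return status;
    }

    e->mpi_dt     = mpi_dt;
    e->next       = dt_cache_head;
    dt_cache_head = e;
    *out          = e->ucp_dt;
    return UCS_OK;
}
```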

Is there a disadvantage to this proposal to combine them?

yosefe commented Feb 21, 2021

So UCC would have to understand that the object it gets in its API calls (either as a ucp_datatype_t or a void*) is actually a UCP datatype handle, and pass it down to the UCX p2p layer internally?
Re "Is there a disadvantage": this breaks component encapsulation, so it needs a solid use-case. We don't want to move code to a shared place without it actually being used by >1 component.

alex--m commented Feb 21, 2021

> So UCC would have to understand that the object it gets in its API calls (either as a ucp_datatype_t or a void*) is actually a UCP datatype handle, and pass it down to the UCX p2p layer internally?

My plan was to use a custom datatype (void*, so it's not MPI-specific). In addition, I was thinking UCC would also get an optional parameter like convert_callback_function(ompi_datatype_t *in, ucp_datatype_t *out), which would call that shared common_ucx code, and the resulting UCP datatype would be passed to the UCP P2P API (which is already used inside UCC).

Re "Is there a disadvantage" - This is breaking components encapsulation so need a solid use-case. We don't want to move code to a shared place without it actually being used by >1 component.

Well, the entire discussion started because | { COLL/UCC, PML/UCX } | > 1 - so I don't see a problem here.

yosefe commented Feb 21, 2021

> My plan was to use a custom datatype (void*, so it's not MPI-specific). In addition, I was thinking UCC would also get an optional parameter like convert_callback_function(ompi_datatype_t *in, ucp_datatype_t *out), which would call that shared common_ucx code, and the resulting UCP datatype would be passed to the UCP P2P API (which is already used inside UCC).

So, we need to get consensus about this UCC API approach from the broader UCC community, and merge the API into the UCC library, before upstreaming the relevant changes to OpenMPI (to make sure the coll/ucx component could actually be implemented the way you described).
For now, we can look at this as a WIP.

alex--m force-pushed the topic/shared-ucx-no-collectives branch 3 times, most recently from 6a23989 to bff788a on May 6, 2021 at 13:39
alex--m force-pushed the topic/shared-ucx-no-collectives branch from bff788a to db0daf9 on May 6, 2021 at 14:03