Internal mutation of communicator or other MPI objects in relation to C++ const semantics #980

correaa · 2025-03-27T19:48:37Z

I am opening this issue after a lengthy discussion at a C++ MPI weekly meeting organized by @tonyskjellum and encouraged by the participants @sg0, Tim Uhl, and @EvanDrakeSuggs.

In this first post, I can only lay out the problem.
I expect this discussion to be long and full of subtleties as we go deeper.

The main question I propose to discuss is to what degree const-correctness in C++ can reflect fundamental aspects of MPI communication, efficient implementations of MPI, and common practice.

Background: It is central to the idea of C++ that the language provides a way to express guarantees about mutation, mainly in the form of the const qualifier (and its sibling, mutable), which adds useful semantic information to a program.

Historically, the const aspect of a function or a variable has been interpreted simply as saying that a particular operation or part of the program would leave the relevant object the same before and after an operation.
This changed dramatically in C++11.
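The current interpretation of const is more stringent: const-ness is understood to mean not only that an object is left in the same state before and after an operation, but also that it is not observably mutated "during" the operation. For example, the C++11 standard library guarantees that its functions do not modify objects accessible from other threads through const arguments (including this), so that concurrent calls to const member functions on the same object do not cause data races ([res.on.data.races]).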
This modern interpretation of const is driven by efficiency and maximizing the utility of this language feature.
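The modern reading can be illustrated with a small sketch. The type below is hypothetical (it is not part of any MPI binding): its `value()` member is `const` even though it fills a cache on first use, and that is honest only because the cache is guarded by a `mutable std::mutex`, so concurrent calls on the same `const` object do not race.

```cpp
#include <cassert>
#include <mutex>
#include <numeric>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical example type, not part of any MPI binding.
class cached_sum {
    std::vector<int> data_;
    mutable std::mutex mtx_;             // guards the two members below
    mutable std::optional<long> cache_;  // filled lazily on the first call
    mutable int computations_ = 0;       // how many times the sum was computed

public:
    explicit cached_sum(std::vector<int> data) : data_(std::move(data)) {}

    // const and safe to call concurrently on the same object:
    // every internal mutation happens while holding the mutex.
    long value() const {
        std::lock_guard<std::mutex> lock(mtx_);
        if (!cache_) {
            cache_ = std::accumulate(data_.begin(), data_.end(), 0L);
            ++computations_;
        }
        return *cache_;
    }

    int computations() const {
        std::lock_guard<std::mutex> lock(mtx_);
        return computations_;
    }
};

// Calls value() from nthreads (> 0) threads through one shared const reference.
long sum_concurrently(cached_sum const& s, int nthreads) {
    std::vector<long> results(nthreads);
    std::vector<std::thread> threads;
    for (int i = 0; i < nthreads; ++i)
        threads.emplace_back([&s, &results, i] { results[i] = s.value(); });
    for (auto& t : threads) t.join();
    return results.front();
}
```

If the mutex were removed, `value()` would still compile as `const`, but it would "lie": two threads could fill the cache at the same time.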

When writing a C++ wrapper for MPI (or any C interface that is not const-aware), placing the keyword const correctly requires deep knowledge of 1) the interface (and its semantics), 2) the implementation (internal mutation), and 3) (the most difficult) what makes implementations efficient given the constraints of the actual system, or even of the underlying hardware.

Problem

The problem is that the MPI standard says little about the mutation of MPI "objects."
Objects can include: a) communicator objects themselves, b) request objects, and (perhaps less interesting for this discussion) c) the data being communicated.
In most cases, internal mutation is inferred from common knowledge.
However, our discussions showed that this knowledge is not widespread, is not agreed upon, or is treated as a quality-of-implementation issue.

This uncertainty implies that a C++ interface will have to be very "defensive," leaving performance on the table and being unable to exploit idiomatic C++.
In other words, for every little doubt we have, we will force ourselves to remove const keywords from many places in a C++ interface.

It is generally agreed that a C++ interface to MPI will have a communicator object.
If it exists, this is not simply the handle of a C interface but what the handle "points" to.
In other words, we want to deal with an object that exists in this form:

```cpp
mpi3::communicator comm{...};
```

Given that, we found three prevalent simple scenarios that illustrate this point.
(Please don't concentrate on the proposed syntax, for example whether these are member functions or free functions; it is the semantics that matter.)

Which MPI functions using this communicator should be decorated with const?

Take the simple example of send.

```cpp
comm.send(...);
```

Should send be declared as a const member?

```cpp
class mpi3::communicator {
   ...
   auto send(...) const??? {...}
};
```

My claim is that, in a run-time environment (where it is not known at compile time whether MPI is initialized, or with which threading level), the send function shouldn't be const.
This surprised many participants, who argued that the communicator is in the same state before and after sending a message.
My answer is that even if that is the case, it doesn't matter. If there is an internal change of the communicator during the send operation, even if it is a small cache (that is not guaranteed to be synchronized), the operation should not be const.
(This is without entering into the philosophical questions of whether the communicator is the "same" before and after sending.)
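A minimal sketch of what this claim implies for a binding, assuming a hypothetical `comm_wrapper` whose `send` updates implementation-private state without synchronization (here just a counter). Declaring `send` non-`const` makes the compiler reject any call through a `const&`, so code that shares the object read-only across threads cannot send on it by accident. The `can_send` trait is only an illustration device, not a library facility.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical wrapper: send() updates internal state without locking,
// so it is honestly declared non-const.
class comm_wrapper {
    int sends_ = 0;  // stand-in for an unsynchronized implementation-private cache

public:
    void send(int /*dest*/) { ++sends_; }  // non-const
    int sends() const { return sends_; }
};

// Detects whether send(int) can be called on an lvalue of type T.
template <class T, class = void>
struct can_send : std::false_type {};

template <class T>
struct can_send<T, std::void_t<decltype(std::declval<T&>().send(0))>>
    : std::true_type {};

static_assert(can_send<comm_wrapper>::value, "a mutable communicator can send");
static_assert(!can_send<comm_wrapper const>::value,
              "a const communicator cannot send");
```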

This is even more dramatic for immediate_send (assuming we want that in the interface, which is a separate question).

```cpp
...
comm.immediate_send(...);
```

Here, my claim is that the member ::immediate_send shouldn't be const either, because after the immediate send the communicator is in a different "state": it has a pending operation.
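The claim can be modeled with a hypothetical communicator that counts operations that have been started but not yet completed. This is only a model of the argument (later in this thread it is pointed out that implementations typically keep message queues outside the communicator); `communicator_model`, `request`, `pending()`, and `wait()` are illustrative names, not an existing API.

```cpp
#include <cassert>

// Hypothetical model: the communicator counts operations that were started
// but not completed, so starting one changes its observable state.
class communicator_model {
    int pending_ = 0;

public:
    class request {
        communicator_model* comm_;
        bool done_ = false;

    public:
        explicit request(communicator_model& c) : comm_(&c) {}

        void wait() {  // completes the operation at most once
            if (!done_) {
                --comm_->pending_;
                done_ = true;
            }
        }
    };

    request immediate_send(int /*dest*/) {  // non-const: adds a pending operation
        ++pending_;
        return request{*this};
    }

    int pending() const { return pending_; }
};
```

Under this model, a `const` `immediate_send` would have to hide `pending_` behind `mutable`, which is precisely the kind of undocumented internal mutation the issue is about.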

Finally, the simple operation of communicator duplication implies that an eventual .duplicate() operation should not be const either.
The reason, and this is empirical, is that MPI_Comm_dup seems to modify (at least temporarily) the state of the source communicator.
Among other things, this prevents a C++ interface from providing a communicator copy constructor, which is surprising.

```cpp
class mpi3::communicator {
   communicator(communicator const&) = delete;  // unimplementable
   communicator duplicate() { ... }             // OK, but it is not const

//  vvv--- more controversial (excuse the C++ jargon)
   communicator(communicator&& other) noexcept { ... }     // OK if we accept a moved-from communicator in a partially formed state, possibly NULL
   /*explicit?*/ communicator(communicator& other) { ... }  // this is not a COPY constructor! (it is what I call a "duplicate" constructor)
};
```
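A compilable mock of this design may help, with a counter standing in for `MPI_Comm_dup` (no MPI calls are made; `fake_dup_count` and the integer handles are hypothetical). The static assertions show the consequence: the type is not copy-constructible in the standard sense (from a `const&`), yet it can still be "duplicate-constructed" from a non-const lvalue, and it can be moved.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

inline int fake_dup_count = 0;  // hypothetical: counts stand-in "MPI_Comm_dup" calls

class communicator {
    int handle_ = 0;  // stand-in for an MPI_Comm; 0 plays the role of MPI_COMM_NULL

    int dup_handle() {          // non-const: the source may be mutated by dup
        ++fake_dup_count;
        return handle_ + 1000;  // arbitrary new "handle" value for the mock
    }

public:
    explicit communicator(int h) : handle_(h) {}

    communicator(communicator const&) = delete;                         // no copy from a const source
    communicator(communicator& other) : handle_(other.dup_handle()) {}  // "duplicate" constructor
    communicator(communicator&& other) noexcept
        : handle_(std::exchange(other.handle_, 0)) {}                   // source left "null"

    communicator duplicate() { return communicator{dup_handle()}; }     // non-const

    int handle() const { return handle_; }
};

static_assert(!std::is_copy_constructible_v<communicator>);
static_assert(std::is_constructible_v<communicator, communicator&>);
static_assert(std::is_nothrow_move_constructible_v<communicator>);
```

One practical consequence worth weighing: standard-library code that copies from a `const&` (for example, copying a `std::vector` of such objects) will not compile with this type.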

These cases are just the tip of the iceberg.

Proposal

These examples illustrate the surprising implications of the guarantees (or lack thereof) provided by the MPI standard.
Please note that, as C++ programmers, we are not "demanding" that implementations do one thing or the other so that we can use the const keyword everywhere.
The idea is to allow anyone developing or using C++ interfaces to faithfully reflect the semantics of MPI objects and the mutations that implementations perform on them.

Changes to the Text

I will need a lot of help proposing changes to the text, and honestly, I prefer it if other people do it.
What I can say is that any clarification in this direction will need to go much beyond the ubiquitous:

> Thread and Interrupt Safety
>
> This routine is thread-safe. This means that this routine may be safely used by multiple threads without the need for any user-provided thread locks. However, the routine is not interrupt safe. Typically, this is due to the use of memory allocation routines such as malloc or other non-MPICH runtime routines that are themselves not interrupt-safe.

The reason is that this only says that a function can be called from different threads; it does not say anything about calls to the same (or different) functions that share, for example, an MPI_Comm handle argument.

Impact on Implementations

Certain aspects of the implementation will have to be agreed upon or, if not, it will have to be stated explicitly whether internal mutation may happen.
In other words, internal (unsynchronized) mutation will become part of the documented interface (even if the language, e.g., Fortran or C, doesn't provide a mechanism to express it).

My assumption is that implementations are already optimal in this aspect; if they need to mutate internal state to do operations then there are already good reasons for that.
Implementations can help by stating this mutation in their documentation/notes.

Impact on Users

People using C++ interfaces will be empowered, and programs will be safer because const (or the lack of it) will accurately reflect the nature of MPI communication and the fundamental algorithms and efficiency trade-offs.

References and Pull Requests

There is a lot of material on this topic; here is some to start with:

My reference implementation of a (header-only) C++ MPI interface:
https://github.com/llnl/b-mpi3
Therein, some discussion about thread safety:
https://github.com/llnl/b-mpi3?tab=readme-ov-file#thread-safety
and about communicator duplication in particular:
https://github.com/llnl/b-mpi3?tab=readme-ov-file#duplication-of-communicator
Herb Sutter, "C++ and Beyond 2012: Herb Sutter - You don't know const and mutable":
https://web.archive.org/web/20170119232617/https://channel9.msdn.com/posts/C-and-Beyond-2012-Herb-Sutter-You-dont-know-blank-and-blank
https://web.archive.org/web/20160924183715/https://channel9.msdn.com/Shows/Going+Deep/C-and-Beyond-2012-Herb-Sutter-Concurrency-and-Parallelism
Geoffrey Romer, "What do you mean 'thread-safe'?": https://www.youtube.com/watch?v=s5PCh_FaMfM


jeffhammond · 2025-03-27T21:23:03Z

I disagree with all of this. MPI handles are not objects. They are object handles. The handles are const. The hidden state of the object itself is not relevant.

correaa · 2025-03-27T21:30:49Z

I don't want to derail the discussion, but at no point did I say that a handle is an object. If I had to characterize a handle, I would say, with caveats, that it is closer to a pointer to the (interesting) object. The hidden state of the object is relevant because it tells you what you can do with it.

I understand that historically the "handle" is what is called "the communicator"; this alone creates confusion in this discussion.
(There is a literal language barrier here!).

In the end, these are all definitions; if they have a concrete effect on the proposed interface, that is what we should focus on.

devreal · 2025-03-27T21:50:05Z

I think what is missing from the write-up is a clear motivation about why we should care about whether the internal (non-observable) state of objects with user-managed handles changes or not. I have a vague idea about multi-threading semantics potentially playing a role here but I am neither sold on being overly restrictive nor sure that I fully grasp the problem you're getting at. From a user perspective, these handles are const (they won't change) and I don't care about implementation details as long as I get correct results based on correct usage of the API.

jeffhammond · 2025-03-27T22:00:47Z

Explain to me your argument about immediate send changing the state of a communicator when that communicator is MPI_COMM_WORLD, which is a literal constant value in both the MPICH and MPI-5 ABIs. Tell me what, in a literal value that isn't pointing to anything, has its state mutated by isend.

jeffhammond · 2025-03-27T22:03:13Z

My assertion is that you incorrectly conflate internal state change in the MPI library in the global message queue (or whatever you want to call it) with state change in object handles. As no MPI implementation I know of has per-communicator message queues, it's likely that you are wrong in both practice and theory when it comes to isend mutating a communicator.

correaa · 2025-03-27T23:11:49Z

@devreal Fair enough, I will try to improve the motivation as we continue the conversation. Ultimately, it boils down to what a C++ interface will look like, which is a very concrete product of this discussion. The answer to "why should we care?" seems to depend on the definition of "we". This discussion comes from a subgroup of this forum that seemed to care about this question.

@jeffhammond MPI_COMM_WORLD is a literal constant, yes; the argument is that MPI_COMM_WORLD is not the communicator, it is a handle to the communicator (sorry if this is not what you asked about). The communicator has state even if this constant doesn't change. The question is about the mutation of the thing that the handle refers to. We seem to be working at different levels of indirection, which is fine, but the problem remains at the bottom.
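The two levels of indirection in this exchange map onto the two kinds of pointer constness in C++. In the hypothetical sketch below, `comm_state` stands in for the hidden communicator object and a raw pointer plays the role of the handle: making the handle itself `const` (top-level const) says nothing about whether the object it refers to can change, and that second question is the one this issue is about.

```cpp
#include <cassert>

// Hypothetical stand-in for the hidden communicator object inside the library.
struct comm_state {
    int pending_ops = 0;
};

using comm_handle = comm_state*;  // plays the role of an opaque MPI handle

// Top-level const: the handle value cannot change, but the object can.
void start_op(comm_handle const h) {
    ++h->pending_ops;  // compiles: the constness of h does not reach *h
    // h = nullptr;    // would not compile: h itself is const
}

// Pointee const: the object cannot be modified through this handle.
int count_ops(comm_state const* h) {
    return h->pending_ops;  // read-only access
}
```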

Yes, it is possible that my isend example is incorrect, in theory and in practice, for the reason you state (the queue is not contained in the communicator).
At the same time, I am not sure whether other internal state changes during the creation of the request, which is a separate question.
Besides, and this is the point I wanted to make in a clumsy way: the subsequent behavior of a communicator is not the same whether or not the isend is issued, so even in that sense the state of the communicator is not the same before and after; for that reason alone, it shouldn't be considered const.
There are other curious implications: if the queue is outside the communicator, then something else, perhaps global, is being mutated (I guess in a synchronized way).
Overall, I agree that it is likely a bad example because it has many subtle issues.
The question remains, can/should isend be const or not?

(The good thing is that we are thinking about the true internal state of the communicator, which is what this discussion is really about)

At the end of the day, I care because this discussion will answer these very important questions.
And it is not a matter of choice; it is a matter of all of us agreeing on the semantics/implementation/specification of MPI.

Even if my analysis is incorrect, there are very concrete questions in the code below.
If we agree in an answer at the end of the day, that would be great.
If we don't agree, or nobody cares, that is fine too; there could be different C++ interfaces that simply interpret the standard differently and work well or badly under certain circumstances and implementations.
Or no relevant C++ interface at all.

```cpp
class mpi3::communicator {
   communicator(communicator const&);   // should exist?
   communicator(communicator&);         // should exist?

   auto send(...) const???;             // should/can be const?
   auto immediate_send(...) const???;   // should/can be const?

   auto duplicate() const???;           // should be const?
};
```

jeffhammond · 2025-03-28T05:52:48Z

The nature of the internal state of MPI is invisible to you and the C++ bindings. It has no effect on const, any more than the firmware version running on the NIC.

Pretend MPI is implemented in hardware. Every MPI handle is just a handle to a structure in an ASIC that lives outside of the host process address space. Design your MPI bindings for that and they will be correct.
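One possible reading of this guideline, as a sketch under stated assumptions: the C++ wrapper holds nothing but an opaque integer handle, and all state lives in a hypothetical device (`fake_device`, standing in for the ASIC) outside the wrapper. Under that model, the only state the wrapper owns never changes, so `send` can be `const`.

```cpp
#include <cassert>
#include <map>

// Hypothetical "hardware": all communicator state lives here, outside the wrapper.
struct fake_device {
    std::map<int, int> messages_sent;  // per-handle counter kept "in the ASIC"
    void send(int handle, int /*dest*/) { ++messages_sent[handle]; }
};

inline fake_device device;  // single global device instance for the sketch

// The wrapper is just a value holding an opaque handle.
class communicator_handle {
    int handle_;

public:
    explicit communicator_handle(int h) : handle_(h) {}

    // const: the only state the wrapper owns (the handle value) is unchanged.
    void send(int dest) const { device.send(handle_, dest); }

    int value() const { return handle_; }
};
```

Note that `fake_device` is not synchronized: in this model, thread safety becomes entirely the responsibility of the "hardware", that is, of the MPI library, so the threading-level question still matters.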

correaa · 2025-03-28T06:14:26Z

Thank you, @jeffhammond, for the guideline about pretending MPI is implemented in hardware.
I am learning a lot from this discussion.

Does this guideline answer the question of what should be const and what should not be const?

For example, should .duplicate() be const? How can I apply this idea of pretending that MPI is implemented in hardware to this more specific question?

...or in more practical terms (independent of C++ ideas), can I call MPI_Comm_dup(comm1, &commA) and MPI_Comm_dup(comm1, &commB) (same first argument) from different threads at the same time?

Given my knowledge level, I don't see yet how your guideline can answer this question.

jprotze · 2025-03-28T07:41:46Z

The answer for your last question only depends on the MPI threading level. If you initialized with thread-multiple, you can make these calls concurrently. With a lower threading level, you can only call a small subset of MPI functions concurrently.

I think I would start with a mental model of seeing MPI opaque handles as a const reference to an object with some members declared mutable.

Based on my observation that use of const decorated functions modifying mutable members is common practice in C++ (e.g. having mutable mutex members), I don't understand the statement in your initial post about the meaning of const in C++. I tend to interpret const in C++ as the function has no caller-visible side effects to the object.

jeffhammond · 2025-03-28T08:32:51Z

I consulted a coworker who is very active in WG21, who said that const is almost always pointless and the only reason he adds it to API declarations is to avoid wasting time arguing with people who think it matters.

correaa · 2025-03-28T16:25:02Z

> The answer for your last question only depends on the MPI threading level. If you initialized with thread-multiple, you can make these calls concurrently. With a lower threading level, you can only call a small subset of MPI functions concurrently.

Yes, thank you. That makes sense, but there are two problems with that:

The threading level is a run-time property, so it can only be chosen at run time.

We can invert the logic to make subtly different versions of C++ bindings that assume (at compile time) and demand (at run time) a certain threading level. The different versions will be mutually incompatible in the sense that each will have const in different places.
Making a C++ binding that assumes the highest level of threading is a bit of a trap for a C++ binding because it will pay the price associated with this for all applications.
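A minimal sketch of the "assume at compile time, demand at run time" approach, assuming a hypothetical `thread_level` enum that mirrors the four MPI threading levels (`MPI_THREAD_SINGLE` through `MPI_THREAD_MULTIPLE`) and a `provided` value that a real binding would obtain from `MPI_Init_thread`. The instantiations differ only in whether `send` is `const`, which is exactly what makes them mutually incompatible.

```cpp
#include <cassert>
#include <stdexcept>
#include <type_traits>
#include <utility>

// Hypothetical enum mirroring MPI_THREAD_SINGLE < FUNNELED < SERIALIZED < MULTIPLE.
enum class thread_level { single, funneled, serialized, multiple };

template <thread_level Assumed>
class communicator {
public:
    // Demand at run time what the type assumes at compile time.
    explicit communicator(thread_level provided) {
        if (provided < Assumed)
            throw std::runtime_error("insufficient MPI thread level");
    }

    // Concurrent calls are assumed safe only under MULTIPLE, so only then is send const.
    template <thread_level L = Assumed,
              std::enable_if_t<L == thread_level::multiple, int> = 0>
    void send(int /*dest*/) const {}

    template <thread_level L = Assumed,
              std::enable_if_t<L != thread_level::multiple, int> = 0>
    void send(int /*dest*/) {}
};

// Detects whether send(int) can be called through a const reference.
template <class T, class = void>
struct can_send_on_const : std::false_type {};

template <class T>
struct can_send_on_const<T, std::void_t<decltype(std::declval<T const&>().send(0))>>
    : std::true_type {};

static_assert(can_send_on_const<communicator<thread_level::multiple>>::value);
static_assert(!can_send_on_const<communicator<thread_level::serialized>>::value);
```

Code written against `communicator<thread_level::multiple> const&` cannot be reused with the serialized variant, which illustrates the incompatibility described above.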

> I think I would start with a mental model of seeing MPI opaque handles as a const reference to an object with some members declared mutable.

OK, but which members are effectively declared mutable depends a lot on the threading level chosen (or obtained) during initialization.

> Based on my observation that use of const decorated functions modifying mutable members is common practice in C++ (e.g. having mutable mutex members),

I agree with the last observation; the only things that can be "honestly" mutable are mutexes and things locked by mutexes.
I have the intuition that, at the highest threading level, each communicator is effectively a mutex in itself; based on that, it is okay to declare a communicator mutable when it is a class member.
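A sketch of this intuition applied one level up: a hypothetical class `logger` owns a communicator stand-in (`fake_comm`, not an MPI type) whose `send` is non-`const`, declares it `mutable` together with a `std::mutex`, and can then offer a `const` member that is safe to call from several threads, because every mutation of the communicator is serialized by the lock.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical communicator stand-in whose send() mutates internal state.
class fake_comm {
    int sends_ = 0;

public:
    void send(int /*dest*/) { ++sends_; }  // non-const, not thread-safe by itself
    int sends() const { return sends_; }
};

class logger {
    mutable std::mutex mtx_;  // protects comm_
    mutable fake_comm comm_;  // mutable: changed only while holding mtx_

public:
    // const and safe to call concurrently: the lock serializes all sends.
    void report(int dest) const {
        std::lock_guard<std::mutex> lock(mtx_);
        comm_.send(dest);
    }

    int total() const {
        std::lock_guard<std::mutex> lock(mtx_);
        return comm_.sends();
    }
};
```

The cost of this design is that all sends through one `logger` are serialized, even when the underlying library would have allowed them to proceed concurrently.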

> I don't understand the statement in your initial post about the meaning of const in C++. I tend to interpret const in C++ as the function has no caller-visible side effects to the object.

That is correct. My statement is that it also extends to side effects visible from other threads, not just to the caller (in the same thread).

correaa · 2025-03-28T16:35:38Z

> I consulted a coworker who is very active in WG21, who said that const is almost always pointless and the only reason he adds it to API declarations is to avoid wasting time arguing with people who think it matters.

That is the most precise piece of advice to resolve this issue.

Accuracy is a different matter, and I am afraid this person is pulling someone's leg or exaggerating.
I bet your coworker is not writing copy-constructors, assignments, or equalities (==) that take arguments that are not const&.
(or he/she might never write this type of special member functions at all for other reasons)

Even to please others and stop arguments, you must know where to put const so as not to break everything.
One thing on which I probably agree with your coworker is that I prefer an "honest" non-const argument to a const argument that "lies" about internal mutation.
