Updated the documentation about shared memory. #13218
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,11 +1,9 @@ | ||
Shared Memory | ||
============= | ||
|
||
.. error:: TODO This section needs to be converted from FAQ Q&A style | ||
to regular documentation style. | ||
|
||
What is the sm BTL? | ||
------------------- | ||
The sm BTL | ||
---------- | ||
|
||
The ``sm`` BTL is a low-latency, high-bandwidth mechanism for | ||
transferring data between two processes via shared memory. This BTL | ||
|
@@ -26,8 +24,8 @@ can only be used between processes executing on the same node. | |
|
||
///////////////////////////////////////////////////////////////////////// | ||
|
||
How do I specify use of sm for MPI messages? | ||
-------------------------------------------- | ||
Specifying the Use of sm for MPI Messages | ||
----------------------------------------- | ||
|
||
Typically, it is unnecessary to do so; OMPI will use the best BTL available | ||
for each communication. | ||
|
@@ -44,8 +42,8 @@ communications. For example: | |
///////////////////////////////////////////////////////////////////////// | ||
|
||
How can I tune these parameters to improve performance? | ||
------------------------------------------------------- | ||
Tuning Parameters to Improve Performance | ||
---------------------------------------- | ||
|
||
Mostly, the default values of the MCA parameters have already | ||
been chosen to give good performance. To improve performance further | ||
|
@@ -71,21 +69,107 @@ performance for memory. | |
to resource congestion, but you can increase this parameter to | ||
pre-reserve space for more fragments. | ||
|
||
* ``btl_sm_backing_directory``: Directory to place backing files for | ||
shared memory communication. This directory should be on a local | ||
filesystem such as /tmp or /dev/shm (default: (linux) /dev/shm, | ||
(others) session directory) | ||
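
A minimal sketch of how this parameter might be set, assuming the usual Open MPI conventions for MCA parameters (an ``OMPI_MCA_<name>`` environment variable, or ``--mca`` on the ``mpirun`` command line). The directory ``/tmp`` and the application name ``./my_mpi_app`` are placeholders chosen for illustration only:

```shell
# Select a local directory for the sm BTL backing files (placeholder value).
backing_dir=/tmp

# Option 1: set the MCA parameter through the environment.
export OMPI_MCA_btl_sm_backing_directory="$backing_dir"

# Option 2 (equivalent, shown as a comment): pass it on the command line.
#   mpirun --mca btl_sm_backing_directory "$backing_dir" -n 2 ./my_mpi_app
```

The environment form is convenient in batch scripts; the command-line form keeps the setting visible next to the job launch.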
|
||
///////////////////////////////////////////////////////////////////////// | ||
|
||
Where is the shared memory mapped on the filesystem? | ||
Shared Memory Mechanisms | ||
------------------------ | ||
|
||
The ``sm`` BTL supports two modes of shared memory communication: | ||
|
||
#. **Two-copy:** Otherwise known as "copy-in / copy-out", this mode is | ||
where the sender copies data into shared memory and the receiver | ||
copies the data out. | ||
|
||
This mechanism is always available. | ||
|
||
#. **Single copy:** In this mode, the sender or receiver makes a | ||
single copy of the message data from the source buffer in one | ||
process to the destination buffer in another process. Open MPI | ||
supports three flavors of shared memory single-copy transfers: | ||
|
||
* `Linux KNEM <https://knem.gitlabpages.inria.fr/>`_. This is a | ||
standalone Linux kernel module, made specifically for HPC and MPI | ||
libraries to enable high-performance single-copy message | ||
transfers. | ||
|
||
Open MPI must be able to find the KNEM header files in order to | ||
build support for KNEM. | ||
|
||
* `Linux XPMEM <https://github.com/hjelmn/xpmem>`_. Similar to | ||
KNEM, this is a standalone Linux kernel module, made specifically | ||
for HPC and MPI libraries to enable high-performance single-copy | ||
message transfers. It is derived from the Cray XPMEM system. | ||
|
||
Open MPI must be able to find the XPMEM header files in order to | ||
build support for XPMEM. | ||
|
||
* Linux Cross-Memory Attach (CMA). This mechanism is built into | ||
modern versions of the Linux kernel. Although it performs better | ||
than the two-copy shared memory transfer mechanism, CMA is the | ||
lowest-performing of the single-copy mechanisms. However, CMA | ||
is likely the most widely available because it is enabled by | ||
default in several modern Linux distributions. | ||
|
||
Question: do we need to add in here something about the smsc/accelerator component that will be available in 6.0? |
I just moved the explanation about Shared Memory Mechanisms from |
||
Open MPI must be built on a Linux system with a recent enough | ||
Glibc and kernel version in order to build support for Linux CMA. | ||
|
||
Which mechanism is used at run time depends both on how Open MPI was | ||
built and how your system is configured. You can check to see which | ||
single-copy mechanisms Open MPI was built with via two mechanisms: | ||
|
||
#. At the end of running ``configure``, Open MPI emits a list of | ||
transports for which it found relevant header files and libraries | ||
such that it will be able to build support for them. You might see | ||
lines like this, for example: | ||
|
||
.. code-block:: text | ||
Shared memory/copy in+copy out: yes | ||
Shared memory/Linux CMA: yes | ||
Shared memory/Linux KNEM: no | ||
Shared memory/XPMEM: no | ||
The above output indicates that Open MPI will be built with 2-copy | ||
(as mentioned above, 2-copy is *always* available) and with Linux | ||
CMA support. KNEM and XPMEM support will *not* be built. | ||
|
||
#. After Open MPI is installed, the ``ompi_info`` command can show | ||
which ``smsc`` (shared memory single copy) components are | ||
available: | ||
|
||
.. code-block:: text | ||
shell$ ompi_info | grep smsc | ||
MCA smsc: cma (MCA v2.1.0, API v1.0.0, Component v5.1.0) | ||
This Open MPI installation only supports the Linux CMA single-copy | ||
mechanism. | ||
|
||
.. note:: As implied by the SMSC component names, none of them are | ||
supported on macOS. macOS users will use the two-copy mechanism. | ||
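
The ``ompi_info`` check can also be scripted. The following is a sketch only: the helper name ``list_smsc_components`` is hypothetical, and it assumes the output lines have the same ``MCA smsc: NAME (...)`` layout as the example above.

```shell
# Print the name of each smsc component found in ompi_info-style text on stdin.
list_smsc_components() {
    awk '$1 == "MCA" && $2 == "smsc:" { print $3 }'
}

# Only query the real installation if ompi_info is on the PATH.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info | list_smsc_components
fi
```

If the helper prints nothing, no single-copy components were reported by that installation.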
|
||
///////////////////////////////////////////////////////////////////////// | ||
|
||
.. error:: TODO Is this correct? | ||
Shared Memory Mapping on the Filesystem | ||
--------------------------------------- | ||
|
||
The file will be in the OMPI session directory, which is typically | ||
something like ``/tmp/openmpi-sessions-USERNAME@HOSTNAME/*``. | ||
The file itself will have the name | ||
``shared_mem_pool.HOSTNAME``. For example, the full path could be | ||
``/tmp/openmpi-sessions-johndoe@node0_0/1543/1/shared_mem_pool.node0``. | ||
The default location of the file is the ``/dev/shm`` directory. If ``/dev/shm`` | ||
does not exist on the system, the default location will be the OMPI session | ||
directory. The path is typically something like: | ||
``/dev/shm/sm_segment.nodename.user_id.job_id.my_node_rank``. | ||
For example, the full path could be: ``/dev/shm/sm_segment.x.1000.23c70000.0``. | ||
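
The naming rule described here can be illustrated with a small shell sketch. It is not Open MPI's implementation: the helper names ``pick_backing_dir`` and ``sm_segment_path`` are hypothetical, and the field values are the ones from the example path.

```shell
# Pick the preferred directory if it exists, otherwise the fallback.
pick_backing_dir() {
    preferred=$1
    fallback=$2
    if [ -d "$preferred" ]; then
        printf '%s\n' "$preferred"
    else
        printf '%s\n' "$fallback"
    fi
}

# Assemble a path following the pattern
# DIR/sm_segment.nodename.user_id.job_id.my_node_rank
sm_segment_path() {
    printf '%s/sm_segment.%s.%s.%s.%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Prints /dev/shm/sm_segment.x.1000.23c70000.0
sm_segment_path /dev/shm x 1000 23c70000 0
```

A call such as ``pick_backing_dir /dev/shm "$session_dir"`` mirrors the fallback rule, where ``session_dir`` stands in for the session directory on a given system.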
|
||
.. error:: TODO The filename above will certainly be wrong. | ||
You can use the MCA parameter ``btl_sm_backing_directory`` to place the shared | ||
memory backing file in a non-default location. | ||
|
||
To place the session directory in a non-default location, use the MCA parameter | ||
``orte_tmpdir_base``. | ||
.. note:: The session directory is defined in PMIx. You can | ||
use ``--pmixmca orte_tmpdir_base "/path/to/somewhere"`` to place the session | ||
Is this the right MCA param name? I didn't think we had any |
The correct name is "prte_tmpdir_base". In the OMPI schizo component, you do check each generic MCA name for "orte" and then translate that to "prte", so it would work if you said Note, however, that it is not a PMIx param - passing There are no aliases for that param. |
Sorry, I made a mistake. Thanks for pointing this out. |
No worries - you've done an excellent job! |
||
directory in a non-default location. | ||
|
||
.. error:: TODO The MCA param name above is definitely wrong. | ||
.. note:: Even when using single-copy methods like CMA, a shared memory file is still | ||
created for managing connection metadata. |