Updated the documentation about shared memory. #13218

Open: wants to merge 1 commit into main

78 changes: 0 additions & 78 deletions docs/launching-apps/localhost.rst
@@ -35,81 +35,3 @@ peers:
and destination MPI processes can share memory (e.g., via SYSV or
POSIX shared memory mechanisms).

Shared memory MPI communication
-------------------------------

.. error:: TODO This should really be moved to the networking section.

The ``sm`` BTL supports two modes of shared memory communication:

#. **Two-copy:** Otherwise known as "copy-in / copy-out", this mode is
where the sender copies data into shared memory and the receiver
copies the data out.

This mechanism is always available.

#. **Single copy:** In this mode, the sender or receiver makes a
single copy of the message data from the source buffer in one
process to the destination buffer in another process. Open MPI
supports three flavors of shared memory single-copy transfers:

* `Linux KNEM <https://knem.gitlabpages.inria.fr/>`_. This is a
standalone Linux kernel module, made specifically for HPC and MPI
libraries to enable high-performance single-copy message
transfers.

Open MPI must be able to find the KNEM header files in order to
build support for KNEM.

* `Linux XPMEM <https://github.com/hjelmn/xpmem>`_. Similar to
KNEM, this is a standalone Linux kernel module, made specifically
for HPC and MPI libraries to enable high-performance single-copy
message transfers. It is derived from the Cray XPMEM system.

Open MPI must be able to find the XPMEM header files in order to
build support for XPMEM.

* Linux Cross-Memory Attach (CMA). This mechanism is built-in to
modern versions of the Linux kernel. Although more performance
than the two-copy shared memory transfer mechanism, CMA is the
lowest performance of the single-copy mechanisms. However, CMA
is likely the most widely available because it is enabled by
default in several modern Linux distributions.

Open MPI must be built on a Linux system with a recent enough
Glibc and kernel version in order to build support for Linux CMA.

Which mechanism is used at run time depends both on how Open MPI was
built and how your system is configured. You can check to see which
single-copy mechanisms Open MPI was built with via two mechanisms:

#. At the end of running ``configure``, Open MPI emits a list of
transports for which it found relevant header files and libraries
such that it will be able to build support for them. You might see
lines like this, for example:

.. code-block:: text

Shared memory/copy in+copy out: yes
Shared memory/Linux CMA: yes
Shared memory/Linux KNEM: no
Shared memory/XPMEM: no

The above output indicates that Open MPI will be built with 2-copy
(as mentioned above, 2-copy is *always* available) and with Linux
CMA support. KNEM and XPMEM support will *not* be built.

#. After Open MPI is installed, the ``ompi_info`` command can show
which ``smsc`` (shared memory single copy) components are
available:

.. code-block:: text

shell$ ompi_info | grep smsc
MCA smsc: cma (MCA v2.1.0, API v1.0.0, Component v5.1.0)

This Open MPI installation only supports the Linux CMA single-copy
mechanism.

.. note:: As implied by the SMSC component names, none of them are
supported on macOS. macOS users will use the two-copy mechanism.
122 changes: 103 additions & 19 deletions docs/tuning-apps/networking/shared-memory.rst
@@ -1,11 +1,9 @@
Shared Memory
=============

.. error:: TODO This section needs to be converted from FAQ Q&A style
to regular documentation style.

What is the sm BTL?
-------------------
The sm BTL
----------

The ``sm`` BTL is a low-latency, high-bandwidth mechanism for
transferring data between two processes via shared memory. This BTL
@@ -26,8 +24,8 @@ can only be used between processes executing on the same node.

/////////////////////////////////////////////////////////////////////////

How do I specify use of sm for MPI messages?
--------------------------------------------
Specifying the Use of sm for MPI Messages
-----------------------------------------

Typically, it is unnecessary to do so; OMPI will use the best BTL available
for each communication.
@@ -44,8 +42,8 @@ communications. For example:
/////////////////////////////////////////////////////////////////////////

How can I tune these parameters to improve performance?
-------------------------------------------------------
Tuning Parameters to Improve Performance
----------------------------------------

Mostly, the default values of the MCA parameters have already
been chosen to give good performance. To improve performance further
@@ -71,21 +69,107 @@ performance for memory.
to resource congestion, but you can increase this parameter to
pre-reserve space for more fragments.

* ``btl_sm_backing_directory``: Directory to place backing files for
shared memory communication. This directory should be on a local
filesystem such as ``/tmp`` or ``/dev/shm`` (default: (linux) ``/dev/shm``,
(others) session directory)

Member

Suggested change
filesystem such as /tmp or /dev/shm (default: (linux) /dev/shm,
filesystem such as ``/tmp`` or ``/dev/shm`` (default: (linux) ``/dev/shm``,
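
As a minimal sketch, assuming Open MPI's usual conventions that an MCA parameter can be passed to ``mpirun`` with ``--mca <name> <value>`` or exported as an ``OMPI_MCA_<name>`` environment variable, the following Python helpers build either form for ``btl_sm_backing_directory``. The helper names ``mca_argv`` and ``mca_env`` and the program ``./my_app`` are hypothetical.

```python
"""Sketch: two ways to set btl_sm_backing_directory for an Open MPI run.

Assumptions: Open MPI reads MCA parameters from the mpirun "--mca"
option and from OMPI_MCA_<name> environment variables. The helper
names and "./my_app" are hypothetical placeholders.
"""
from __future__ import annotations

import os
from typing import Mapping


def mca_argv(params: Mapping[str, str]) -> list[str]:
    """Build "--mca <name> <value>" triples for an mpirun command line."""
    argv: list[str] = []
    for name, value in params.items():
        argv += ["--mca", name, value]
    return argv


def mca_env(params: Mapping[str, str],
            base: Mapping[str, str] | None = None) -> dict[str, str]:
    """Return a copy of `base` (default: os.environ) with OMPI_MCA_* entries added."""
    env = dict(os.environ if base is None else base)
    for name, value in params.items():
        env[f"OMPI_MCA_{name}"] = value
    return env


if __name__ == "__main__":
    params = {"btl_sm_backing_directory": "/tmp"}
    # Either form could be handed to subprocess.run(); "./my_app" is a placeholder.
    print(["mpirun", *mca_argv(params), "-n", "2", "./my_app"])
    print(mca_env(params, base={}))
```

The command-line form only affects that one ``mpirun`` invocation, while the environment form also applies to any other Open MPI launch that inherits the environment.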

/////////////////////////////////////////////////////////////////////////

Where is the shared memory mapped on the filesystem?
Shared Memory Mechanisms
------------------------

The ``sm`` BTL supports two modes of shared memory communication:

#. **Two-copy:** Also known as "copy-in / copy-out". In this mode, the
sender copies data into shared memory and the receiver copies the
data out.

This mechanism is always available.

#. **Single-copy:** In this mode, the sender or receiver makes a
single copy of the message data from the source buffer in one
process to the destination buffer in another process. Open MPI
supports three flavors of shared memory single-copy transfers:

* `Linux KNEM <https://knem.gitlabpages.inria.fr/>`_. This is a
standalone Linux kernel module, made specifically for HPC and MPI
libraries to enable high-performance single-copy message
transfers.

Open MPI must be able to find the KNEM header files in order to
build support for KNEM.

* `Linux XPMEM <https://github.com/hjelmn/xpmem>`_. Similar to
KNEM, this is a standalone Linux kernel module, made specifically
for HPC and MPI libraries to enable high-performance single-copy
message transfers. It is derived from the Cray XPMEM system.

Open MPI must be able to find the XPMEM header files in order to
build support for XPMEM.

* Linux Cross-Memory Attach (CMA). This mechanism is built into
modern versions of the Linux kernel. Although it performs better
than the two-copy shared memory transfer mechanism, CMA is the
lowest-performing of the single-copy mechanisms. However, CMA
is likely the most widely available, because it is enabled by
default in several modern Linux distributions.

Member

Question: do we need to add in here something about the smsc/accelerator component that will be available in 6.0?

Author

I just moved the explanation about Shared Memory Mechanisms from docs/launching-apps/localhost.rst. I don't understand the smsc/accelerator component implementation. Sorry about this.

Open MPI must be built on a Linux system with a recent enough
Glibc and kernel version in order to build support for Linux CMA.
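
The modes above differ in how many times the message data is copied. The following Python sketch illustrates only the two-copy idea: a child process (the sender) copies a message into a shared segment, and the parent (the receiver) copies it back out. It assumes Python's ``multiprocessing.shared_memory`` module and the ``fork`` start method (Linux or macOS); it is not how the ``sm`` BTL is implemented, and the function names are hypothetical.

```python
"""Minimal sketch of two-copy ("copy-in / copy-out") shared memory transfer.

Assumptions: multiprocessing.shared_memory stands in for the shared
segment and the "fork" start method is available (Linux/macOS). This
illustrates the idea only; it is not the sm BTL implementation.
"""
from multiprocessing import get_context, shared_memory


def _sender(segment_name: str, message: bytes) -> None:
    # Copy 1 ("copy-in"): sender's private buffer -> shared segment.
    shm = shared_memory.SharedMemory(name=segment_name)
    try:
        shm.buf[: len(message)] = message
    finally:
        shm.close()


def _receive(segment_name: str, length: int) -> bytes:
    # Copy 2 ("copy-out"): shared segment -> receiver's private buffer.
    shm = shared_memory.SharedMemory(name=segment_name)
    try:
        with shm.buf[:length] as view:  # release the slice before close()
            return bytes(view)
    finally:
        shm.close()


def two_copy_transfer(message: bytes) -> bytes:
    """Send `message` from a forked child to this process via shared memory."""
    # SharedMemory rejects size 0, so reserve at least one byte.
    shm = shared_memory.SharedMemory(create=True, size=max(len(message), 1))
    try:
        proc = get_context("fork").Process(target=_sender, args=(shm.name, message))
        proc.start()
        proc.join()
        if proc.exitcode != 0:
            raise RuntimeError(f"sender exited with code {proc.exitcode}")
        return _receive(shm.name, len(message))
    finally:
        shm.close()
        shm.unlink()


if __name__ == "__main__":
    print(two_copy_transfer(b"hello from the sender"))
```

A single-copy mechanism such as CMA would instead move the data directly from the sender's buffer to the receiver's buffer, with no intermediate shared segment.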

Which mechanism is used at run time depends both on how Open MPI was
built and on how your system is configured. You can check which
single-copy mechanisms Open MPI was built with in two ways:

#. At the end of running ``configure``, Open MPI emits a list of
transports for which it found relevant header files and libraries
such that it will be able to build support for them. You might see
lines like this, for example:

.. code-block:: text

Shared memory/copy in+copy out: yes
Shared memory/Linux CMA: yes
Shared memory/Linux KNEM: no
Shared memory/XPMEM: no

The above output indicates that Open MPI will be built with two-copy
support (as mentioned above, two-copy is *always* available) and with
Linux CMA support. KNEM and XPMEM support will *not* be built.

#. After Open MPI is installed, the ``ompi_info`` command can show
which ``smsc`` (shared memory single copy) components are
available:

.. code-block:: text

shell$ ompi_info | grep smsc
MCA smsc: cma (MCA v2.1.0, API v1.0.0, Component v5.1.0)

This output indicates that this Open MPI installation only supports
the Linux CMA single-copy mechanism.
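
If you need to check these outputs from a script, a small parser can help. This is a hypothetical Python sketch that assumes the line formats shown in the two examples above; all other lines are ignored, and real output may vary between Open MPI versions.

```python
"""Sketch: summarize shared memory support from configure / ompi_info text.

Assumption: line formats are taken from the examples in this section;
other lines are ignored. These helpers are hypothetical, not Open MPI APIs.
"""
from __future__ import annotations

import re

# e.g. "Shared memory/Linux CMA: yes"
_CONFIGURE_LINE = re.compile(r"^\s*Shared memory/(?P<name>.+?):\s*(?P<answer>yes|no)\s*$")
# e.g. "MCA smsc: cma (MCA v2.1.0, API v1.0.0, Component v5.1.0)"
_OMPI_INFO_LINE = re.compile(r"^\s*MCA smsc:\s*(?P<component>\w+)\s*\(")


def configured_transports(configure_output: str) -> dict[str, bool]:
    """Map each "Shared memory/..." transport to whether it will be built."""
    result: dict[str, bool] = {}
    for line in configure_output.splitlines():
        match = _CONFIGURE_LINE.match(line)
        if match:
            result[match.group("name")] = match.group("answer") == "yes"
    return result


def installed_smsc_components(ompi_info_output: str) -> list[str]:
    """Return the smsc component names listed by ompi_info."""
    components: list[str] = []
    for line in ompi_info_output.splitlines():
        match = _OMPI_INFO_LINE.match(line)
        if match:
            components.append(match.group("component"))
    return components
```

An empty list from ``installed_smsc_components`` would suggest that no single-copy component is installed, so only the two-copy mechanism is available.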

.. note:: As implied by the SMSC component names, none of them are
supported on macOS. macOS users will use the two-copy mechanism.

/////////////////////////////////////////////////////////////////////////

.. error:: TODO Is this correct?
Shared Memory Mapping on the Filesystem
---------------------------------------

The file will be in the OMPI session directory, which is typically
something like ``/tmp/openmpi-sessions-USERNAME@HOSTNAME/*``.
The file itself will have the name
``shared_mem_pool.HOSTNAME``. For example, the full path could be
``/tmp/openmpi-sessions-johndoe@node0_0/1543/1/shared_mem_pool.node0``.
The default location of the file is the ``/dev/shm`` directory. If ``/dev/shm``
does not exist on the system, the default location is the OMPI session
directory. The path is typically something like
``/dev/shm/sm_segment.nodename.user_id.job_id.my_node_rank``.
For example, the full path could be ``/dev/shm/sm_segment.x.1000.23c70000.0``.
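
To make the naming pattern above concrete, the following hypothetical Python helper assembles the expected path from its parts. It assumes the ``sm_segment.nodename.user_id.job_id.my_node_rank`` pattern described here, treats the job ID as an opaque string (its formatting is not specified in this section), and takes the session directory and any ``btl_sm_backing_directory`` setting as inputs rather than discovering them.

```python
"""Sketch: predict the sm BTL backing file path described above.

Assumptions: the file name follows
sm_segment.<nodename>.<user_id>.<job_id>.<node_rank>, and the caller
supplies the session directory, the job ID string, and any
btl_sm_backing_directory value. Hypothetical helper, not an Open MPI API.
"""
from __future__ import annotations

import os


def sm_backing_file(nodename: str, user_id: int, job_id: str, node_rank: int,
                    session_dir: str, backing_directory: str | None = None,
                    shm_dir: str = "/dev/shm") -> str:
    """Return the expected backing file path for one local process."""
    if backing_directory is not None:
        base = backing_directory  # explicit btl_sm_backing_directory setting
    elif os.path.isdir(shm_dir):
        base = shm_dir  # default location
    else:
        base = session_dir  # fallback when /dev/shm does not exist
    name = f"sm_segment.{nodename}.{user_id}.{job_id}.{node_rank}"
    return os.path.join(base, name)
```

On a system where ``/dev/shm`` exists, ``sm_backing_file("x", 1000, "23c70000", 0, session_dir)`` reproduces the example path given above.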

.. error:: TODO The filename above will certainly be wrong.
You can use the MCA parameter ``btl_sm_backing_directory`` to place the shared
memory backing file in a non-default location.

To place the session directory in a non-default location, use the MCA parameter
``orte_tmpdir_base``.
.. note:: The session directory location is controlled by the PRRTE
parameter ``prte_tmpdir_base``, not by a PMIx parameter. You can use
``--prtemca prte_tmpdir_base "/path/to/somewhere"`` to place the session
directory in a non-default location.

Member

Is this the right MCA param name? I didn't think we had any orte_ primary names any more -- is that an alias?

Contributor

The correct name is "prte_tmpdir_base". In the OMPI schizo component, you do check each generic MCA name for "orte" and then translate that to "prte", so it would work if you said --mca orte_tmpdir_base foo.

Note, however, that it is not a PMIx param - passing --pmixmca orte_tmpdir_base foo or --pmixmca prte_tmpdir_base foo will be ignored. It is a PRRTE param, and so the correct entry would be --prtemca prte_tmpdir_base foo

There are no aliases for that param.

Author

Sorry, I made a mistake. Thanks for pointing this out.

Contributor

No worries - you've done an excellent job!
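
Following the correction in the review thread on this note (``prte_tmpdir_base`` is a PRRTE parameter passed with ``--prtemca``, and ``--pmixmca`` would ignore it), a command line that relocates the session directory could be assembled as below. This is a hypothetical Python sketch; the program name, process count, and directory are placeholders.

```python
"""Sketch: an mpirun command line that relocates the PRRTE session directory.

Assumption (from the review thread): prte_tmpdir_base is a PRRTE
parameter set with --prtemca. mpirun_argv is a hypothetical helper;
the program, process count, and directory are placeholders.
"""
from __future__ import annotations


def mpirun_argv(program: list[str], nprocs: int,
                session_base: str | None = None) -> list[str]:
    """Build an mpirun argument list, optionally relocating the session directory."""
    argv = ["mpirun", "-n", str(nprocs)]
    if session_base is not None:
        # PRRTE-level parameter: pass it with --prtemca (--pmixmca ignores it).
        argv += ["--prtemca", "prte_tmpdir_base", session_base]
    return argv + list(program)


if __name__ == "__main__":
    # The list could be handed to subprocess.run(..., check=True).
    print(mpirun_argv(["./my_app"], 2, "/path/to/somewhere"))
```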

.. error:: TODO The MCA param name above is definitely wrong.
.. note:: Even when using single-copy methods like CMA, a shared memory file is still
created for managing connection metadata.