Add documentation for SequentialHostInit #607

Merged
merged 5 commits into from
Dec 2, 2024
4 changes: 4 additions & 0 deletions docs/source/API/alphabetical.rst
Original file line number Diff line number Diff line change
Expand Up @@ -200,6 +200,8 @@ Core
+--------------------------------------------------------------------------------------+---------------------------+------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+
| `Serial <core/execution_spaces.html#kokkos-serial>`_ | `Core <core-index.html>`_ | `Spaces <core/Spaces.html>`_ | Execution space using serial execution on the CPU. |
+--------------------------------------------------------------------------------------+---------------------------+------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+
| `SequentialHostInit <core/view/view_alloc.html>`_ | `Core <core-index.html>`_ | `View and related <core/View.html>`_ | An option used with `view_alloc <core/view/view_alloc.html>`_ |
+--------------------------------------------------------------------------------------+---------------------------+------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+
| `ScopeGuard <core/initialize_finalize/ScopeGuard.html>`_ | `Core <core-index.html>`_ | `Initialization and Finalization <core/Initialize-and-Finalize.html>`_ | class to aggregate initializing and finalizing Kokkos |
+--------------------------------------------------------------------------------------+---------------------------+------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+
| `SpaceAccessibility <core/SpaceAccessibility.html>`_ | `Core <core-index.html>`_ | `Spaces <core/Spaces.html>`_ | Facility to query accessibility rules between execution and memory spaces. |
Expand Down Expand Up @@ -232,3 +234,5 @@ Core
+--------------------------------------------------------------------------------------+---------------------------+------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+
| `View-like Type Concept <core/view/view_like.html>`_ | `Core <core-index.html>`_ | `View and related <core/View.html>`_ | A set of class templates that act like a View |
+--------------------------------------------------------------------------------------+---------------------------+------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+
| `WithoutInitializing <core/view/view_alloc.html>`_ | `Core <core-index.html>`_ | `View and related <core/View.html>`_ | An option used with `view_alloc <core/view/view_alloc.html>`_ |
+--------------------------------------------------------------------------------------+---------------------------+------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+
30 changes: 29 additions & 1 deletion docs/source/API/core/view/view_alloc.rst
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,9 @@ Create View allocation parameter bundle from argument list. Valid argument list

* execution space instance able to access ``View::memory_space``

* ``Kokkos::WithoutInitializing`` to bypass initialization
* ``Kokkos::WithoutInitializing`` to bypass element initialization and destruction

* ``Kokkos::SequentialHostInit`` to perform element initialization and destruction serially on host (since 4.4.01)

* ``Kokkos::AllowPadding`` to allow allocation to pad dimensions for memory alignment

Expand All @@ -44,8 +46,34 @@ Description

``args`` : Can only be a pointer to memory.


.. cppkokkos:type:: ALLOC_PROP

:cppkokkos:type:`ALLOC_PROP` is a special, unspellable implementation-defined type that is returned by :cppkokkos:func:`view_alloc`
and :cppkokkos:func:`view_wrap`. It represents a bundle of allocator parameters, including the View label, the memory space instance,
the execution space instance, whether to initialize the memory, whether to allow padding, and the raw pointer value (for wrapped unmanaged views).

.. cppkokkos:type:: WithoutInitializing

:cppkokkos:type:`WithoutInitializing` is intended for situations where default construction of `View` elements in the
associated execution space is not needed or not viable. Default construction may not be viable, for example, when constructing objects with virtual functions,
or for a `View` of elements without a default constructor. In such situations, this option is often combined with manual in-place `new`
construction and manual destruction of the elements.
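
The in-place construction and manual destruction pattern mentioned above can be sketched in standard C++ without Kokkos. This is only an illustration under stated assumptions: `Particle` and `build_and_destroy` are hypothetical names, and the raw buffer from `std::allocator` stands in for the storage of a `View` allocated with `WithoutInitializing`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <string>
#include <utility>

// Hypothetical element type: it has no default constructor, so it cannot be
// default-constructed in bulk by a container.
struct Particle {
  std::string name;
  explicit Particle(std::string n) : name(std::move(n)) {}
};

// Constructs `count` elements in raw storage with placement new, reads them
// back, then destroys them by hand. Returns the concatenated names.
std::string build_and_destroy(std::size_t count) {
  assert(count > 0);

  // Raw, uninitialized storage (assumption: this plays the role of a View
  // allocated with WithoutInitializing).
  std::allocator<Particle> alloc;
  Particle* data = alloc.allocate(count);

  // Manual in-place construction, one element at a time.
  for (std::size_t i = 0; i < count; ++i) {
    new (&data[i]) Particle("p" + std::to_string(i));
  }

  std::string names;
  for (std::size_t i = 0; i < count; ++i) {
    names += data[i].name;
  }

  // Manual destruction: nothing else will run these destructors.
  for (std::size_t i = 0; i < count; ++i) {
    data[i].~Particle();
  }
  alloc.deallocate(data, count);
  return names;
}
```

Calling `build_and_destroy(3)` constructs three elements, reads their names, and destroys them again. The point of the sketch is that every element constructed by hand must also be destroyed by hand before the storage is released.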

.. cppkokkos:type:: SequentialHostInit

:cppkokkos:type:`SequentialHostInit` is intended to be used to initialize elements that do not have a default constructor or destructor that
can be called inside a Kokkos parallel region. In particular, this includes constructors and destructors that:

* allocate or deallocate memory
* create or destroy managed `Kokkos::View` objects
* call Kokkos parallel operations

When this allocation option is used, the `View` constructor and destructor create and destroy the elements in a serial loop on the host.

.. warning::

`SequentialHostInit` can only be used when creating a host-accessible `View`, such as a `View` with `HostSpace`, `SharedSpace`,
or `SharedHostPinnedSpace` as its memory space.
Comment on lines +76 to +77
Contributor

Should the DualView case be mentioned here too?

Member Author

I think that is part of a larger paragraph including realloc and resize I would want to do that in a second pass.


.. versionadded:: 4.4.01
74 changes: 59 additions & 15 deletions docs/source/ProgrammingGuide/View.rst
Original file line number Diff line number Diff line change
Expand Up @@ -145,7 +145,57 @@ Another issue is that View construction in a Kokkos parallel region does not upd

Here is how to create a View of Views, where each inner View has a separate owning allocation:

1. The outer View must have a memory space that is both host and device accessible, such as `CudaUVMSpace`.
1. The outer View must have a memory space that is both host and device accessible, such as :cppkokkos:type:`SharedSpace`.
Contributor

Should the possibility of using a DualView for the outer view be mentioned as an alternative?

2. Create the outer View using the :cppkokkos:type:`SequentialHostInit` property.
3. Create inner Views in a sequential host loop. (Prefer creating the inner Views uninitialized. Creating the inner Views initialized launches one device kernel per inner View. This is likely much slower than just initializing them all yourself from a single kernel over the outer View.)
4. At this point, you may access the outer and inner Views on device.
5. Get rid of the outer View as you normally would.

Here is an example:

.. code-block:: c++

    using Kokkos::SharedSpace;
    using Kokkos::View;
    using Kokkos::view_alloc;
    using Kokkos::SequentialHostInit;
    using Kokkos::WithoutInitializing;

    using inner_view_type = View<double*>;
    using outer_view_type = View<inner_view_type*, SharedSpace>;

    const int numOuter = 5;
    const int numInner = 4;
    outer_view_type outer (view_alloc (std::string ("Outer"), SequentialHostInit), numOuter);

    // Create inner Views on host, outside of a parallel region, uninitialized
    for (int k = 0; k < numOuter; ++k) {
      const std::string label = std::string ("Inner ") + std::to_string (k);
      outer(k) = inner_view_type (view_alloc (label, WithoutInitializing), numInner);
    }

    // Outer and inner views are now ready for use on device

    Kokkos::RangePolicy<> range (0, numOuter);
    Kokkos::parallel_for ("my kernel label", range,
      KOKKOS_LAMBDA (const int i) {
        for (int j = 0; j < numInner; ++j) {
          outer(i)(j) = 10.0 * double (i) + double (j);
        }
      });
    // Fence so that the kernel is done before the Views are destroyed on the host
    Kokkos::fence();

    // Destroy the View of Views - this will call destructors sequentially on the host!
    outer = outer_view_type ();

Another approach is to create the inner Views as nonowning, from a single pool of memory. This makes it unnecessary to invoke their destructors.
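
The pooled, nonowning layout can be sketched in standard C++ without Kokkos. This is a rough model under stated assumptions: `Slice`, `carve`, and `fill_example` are hypothetical names, `std::vector<double>` stands in for the single pool allocation, and `Slice` stands in for a nonowning inner View (in Kokkos, that would typically be an unmanaged View wrapping a pointer into the pool).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical nonowning handle: a pointer and a length. It owns nothing,
// so there is no destructor that needs to run for it.
struct Slice {
  double* ptr;
  std::size_t size;
  double& operator()(std::size_t j) const { return ptr[j]; }
};

// Carves `num_outer` slices of `num_inner` entries each out of one pool.
std::vector<Slice> carve(std::vector<double>& pool,
                         std::size_t num_outer, std::size_t num_inner) {
  assert(pool.size() >= num_outer * num_inner);
  std::vector<Slice> slices;
  slices.reserve(num_outer);
  for (std::size_t i = 0; i < num_outer; ++i) {
    slices.push_back(Slice{pool.data() + i * num_inner, num_inner});
  }
  return slices;
}

// Fills every slice with the same values as the example above
// (10 * i + j) and returns the pool that backs the slices.
std::vector<double> fill_example(std::size_t num_outer, std::size_t num_inner) {
  std::vector<double> pool(num_outer * num_inner);
  std::vector<Slice> outer = carve(pool, num_outer, num_inner);
  for (std::size_t i = 0; i < num_outer; ++i) {
    for (std::size_t j = 0; j < outer[i].size; ++j) {
      outer[i](j) = 10.0 * double(i) + double(j);
    }
  }
  return pool;
}
```

Only the pool owns memory here, so releasing the pool is the only cleanup required. The trade-off is that all inner slices share the lifetime of the pool and must not be used after it is released.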

.. warning::

`SequentialHostInit` was added in version 4.4.01. Prior to that, the process was more involved.

1. The outer View must have a memory space that is both host and device accessible, such as `SharedSpace`.
2. Create the outer View without initializing it.
3. Create inner Views using placement new, in a sequential host loop. (Prefer creating the inner Views uninitialized. Creating the inner Views initialized launches one device kernel per inner View. This is likely much slower than just initializing them all yourself from a single kernel over the outer View.)
4. At this point, you may access the outer and inner Views on device.
Expand All @@ -157,15 +207,13 @@ Here is an example:

.. code-block:: c++

using Kokkos::Cuda;
using Kokkos::CudaSpace;
using Kokkos::CudaUVMSpace;
using Kokkos::SharedSpace;
using Kokkos::View;
using Kokkos::view_alloc;
using Kokkos::WithoutInitializing;

using inner_view_type = View<double*, CudaSpace>;
using outer_view_type = View<inner_view_type*, CudaUVMSpace>;
using inner_view_type = View<double*>;
using outer_view_type = View<inner_view_type*, SharedSpace>;

const int numOuter = 5;
const int numInner = 4;
Expand All @@ -174,36 +222,32 @@ Here is an example:
// Create inner Views on host, outside of a parallel region, uninitialized
for (int k = 0; k < numOuter; ++k) {
const std::string label = std::string ("Inner ") + std::to_string (k);
new (&outer[k]) inner_view_type (view_alloc (label, WithoutInitializing), numInner);
new (&outer(k)) inner_view_type (view_alloc (label, WithoutInitializing), numInner);
}

// Outer and inner views are now ready for use on device

Kokkos::RangePolicy<Cuda, int> range (0, numOuter);
Kokkos::RangePolicy<> range (0, numOuter);
Kokkos::parallel_for ("my kernel label", range,
KOKKOS_LAMBDA (const int i) {
for (int j = 0; j < numInner; ++j) {
device_outer[i][j] = 10.0 * double (i) + double (j);
outer(i)(j) = 10.0 * double (i) + double (j);
}
});

// Fence before deallocation on host, to make sure
// that the device kernel is done first.
// Note the new fence syntax that requires an instance.
// This will work with other CUDA streams, etc.
Cuda ().fence ();
Kokkos::fence ();

// Destroy inner Views, again on host, outside of a parallel region.
for (int k = 0; k < numOuter; ++k) {
outer[k].~inner_view_type ();
outer(k).~inner_view_type ();
}

// You're better off disposing of outer immediately.
outer = outer_view_type ();

Another approach is to create the inner Views as nonowning, from a single pool of memory. This makes it unnecessary to invoke their destructors.

6.2.4 Const Views
~~~~~~~~~~~~~~~~~
