diff --git a/docs/source/API/alphabetical.rst b/docs/source/API/alphabetical.rst index adf87b068..b335e8d9f 100644 --- a/docs/source/API/alphabetical.rst +++ b/docs/source/API/alphabetical.rst @@ -200,6 +200,8 @@ Core +--------------------------------------------------------------------------------------+---------------------------+------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+ | `Serial `_ | `Core `_ | `Spaces `_ | Execution space using serial execution the CPU. | +--------------------------------------------------------------------------------------+---------------------------+------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+ +| `SequentialHostInit `_ | `Core `_ | `View and related `_ | An option used with `view_alloc `_ | ++--------------------------------------------------------------------------------------+---------------------------+------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+ | `ScopeGuard `_ | `Core `_ | `Initialization and Finalization `_ | class to aggregate initializing and finalizing Kokkos | +--------------------------------------------------------------------------------------+---------------------------+------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+ | `SpaceAccessibility `_ | `Core `_ | `Spaces `_ | Facility to query accessibility rules between execution and memory spaces. 
| @@ -232,3 +234,5 @@ Core +--------------------------------------------------------------------------------------+---------------------------+------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+ | `View-like Type Concept `_ | `Core `_ | `View and related `_ | A set of class templates that act like a View | +--------------------------------------------------------------------------------------+---------------------------+------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+ +| `WithoutInitializing `_ | `Core `_ | `View and related `_ | An option used with `view_alloc `_ | ++--------------------------------------------------------------------------------------+---------------------------+------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+ diff --git a/docs/source/API/core/view/view_alloc.rst b/docs/source/API/core/view/view_alloc.rst index 8fe66138c..c99fabe87 100644 --- a/docs/source/API/core/view/view_alloc.rst +++ b/docs/source/API/core/view/view_alloc.rst @@ -22,7 +22,9 @@ Create View allocation parameter bundle from argument list. 
Valid argument list * execution space instance able to access ``View::memory_space`` -* ``Kokkos::WithoutInitializing`` to bypass initialization +* ``Kokkos::WithoutInitializing`` to bypass element initialization and destruction + +* ``Kokkos::SequentialHostInit`` to perform element initialization and destruction serially on host (since 4.4.01) * ``Kokkos::AllowPadding`` to allow allocation to pad dimensions for memory alignment @@ -44,8 +46,34 @@ Description ``args`` : Can only be a pointer to memory. + .. cppkokkos:type:: ALLOC_PROP :cppkokkos:type:`ALLOC_PROP` is a special, unspellable implementation-defined type that is returned by :cppkokkos:func:`view_alloc` and :cppkokkos:func:`view_wrap`. It represents a bundle of allocator parameters, including the View label, the memory space instance, the execution space instance, whether to initialize the memory, whether to allow padding, and the raw pointer value (for wrapped unmanaged views). + +.. cppkokkos:type:: WithoutInitializing + + :cppkokkos:type:`WithoutInitializing` is intended to be used in situations where default construction of `View` elements in its + associated execution space is not needed or not viable. In particular, it may not be viable in situations such as the construction of objects with virtual functions, + or for `Views` of elements without a default constructor. In such situations, this option is often used in conjunction with manual in-place `new` + construction of objects and manual destruction of elements. + +.. cppkokkos:type:: SequentialHostInit + + :cppkokkos:type:`SequentialHostInit` is intended to be used to initialize elements that do not have a default constructor or destructor that + can be called inside a Kokkos parallel region. 
In particular, this includes constructors and destructors which: + + * allocate or deallocate memory + * create or destroy managed `Kokkos::View` objects + * call Kokkos parallel operations + + When using this allocation option the `View` constructor/destructor will create/destroy elements in a serial loop on the Host. + + .. warning:: + + `SequentialHostInit` can only be used when creating host accessible `View`s, such as `View`s with `HostSpace`, `SharedSpace`, + or `SharedHostPinnedSpace` as memory space. + + .. versionadded:: 4.4.01 diff --git a/docs/source/ProgrammingGuide/View.rst b/docs/source/ProgrammingGuide/View.rst index ae6fa9661..9d5b7192e 100644 --- a/docs/source/ProgrammingGuide/View.rst +++ b/docs/source/ProgrammingGuide/View.rst @@ -145,7 +145,57 @@ Another issue is that View construction in a Kokkos parallel region does not upd Here is how to create a View of Views, where each inner View has a separate owning allocation: -1. The outer View must have a memory space that is both host and device accessible, such as `CudaUVMSpace`. +1. The outer View must have a memory space that is both host and device accessible, such as :cppkokkos:type:`SharedSpace`. +2. Create the outer View using the :cppkokkos:type:`SequentialHostInit` property. +3. Create inner Views in a sequential host loop. (Prefer creating the inner Views uninitialized. Creating the inner Views initialized launches one device kernel per inner View. This is likely much slower than just initializing them all yourself from a single kernel over the outer View.) +4. At this point, you may access the outer and inner Views on device. +5. Get rid of the outer View as you normally would. + +Here is an example: + +.. 
code-block:: c++ + + using Kokkos::SharedSpace; + using Kokkos::View; + using Kokkos::view_alloc; + using Kokkos::SequentialHostInit; + using Kokkos::WithoutInitializing; + + using inner_view_type = View<double*, SharedSpace>; + using outer_view_type = View<inner_view_type*, SharedSpace>; + + const int numOuter = 5; + const int numInner = 4; + outer_view_type outer (view_alloc (std::string ("Outer"), SequentialHostInit), numOuter); + + // Create inner Views on host, outside of a parallel region, uninitialized + for (int k = 0; k < numOuter; ++k) { + const std::string label = std::string ("Inner ") + std::to_string (k); + outer(k) = inner_view_type (view_alloc (label, WithoutInitializing), numInner); + } + + // Outer and inner views are now ready for use on device + + Kokkos::RangePolicy<> range (0, numOuter); + Kokkos::parallel_for ("my kernel label", range, + KOKKOS_LAMBDA (const int i) { + for (int j = 0; j < numInner; ++j) { + outer(i)(j) = 10.0 * double (i) + double (j); + } + }); + Kokkos::fence(); + + // Destroy the View of Views - this will call destructors sequentially on the host! + outer = outer_view_type (); + +Another approach is to create the inner Views as nonowning, from a single pool of memory. This makes it unnecessary to invoke their destructors. + +.. warning:: + + `SequentialHostInit` was added in version 4.4.01. Prior to that, the process was more involved. + +1. The outer View must have a memory space that is both host and device accessible, such as `SharedSpace`. 2. Create the outer View without initializing it. 3. Create inner Views using placement new, in a sequential host loop. (Prefer creating the inner Views uninitialized. Creating the inner Views initialized launches one device kernel per inner View. This is likely much slower than just initializing them all yourself from a single kernel over the outer View.) 4. At this point, you may access the outer and inner Views on device. @@ -157,15 +207,13 @@ Here is an example: .. 
code-block:: c++ - using Kokkos::Cuda; - using Kokkos::CudaSpace; - using Kokkos::CudaUVMSpace; + using Kokkos::SharedSpace; using Kokkos::View; using Kokkos::view_alloc; using Kokkos::WithoutInitializing; - using inner_view_type = View; - using outer_view_type = View; + using inner_view_type = View<double*, SharedSpace>; + using outer_view_type = View<inner_view_type*, SharedSpace>; const int numOuter = 5; const int numInner = 4; @@ -174,36 +222,32 @@ Here is an example: // Create inner Views on host, outside of a parallel region, uninitialized for (int k = 0; k < numOuter; ++k) { const std::string label = std::string ("Inner ") + std::to_string (k); - new (&outer[k]) inner_view_type (view_alloc (label, WithoutInitializing), numInner); + new (&outer(k)) inner_view_type (view_alloc (label, WithoutInitializing), numInner); } // Outer and inner views are now ready for use on device - Kokkos::RangePolicy range (0, numOuter); + Kokkos::RangePolicy<> range (0, numOuter); Kokkos::parallel_for ("my kernel label", range, KOKKOS_LAMBDA (const int i) { for (int j = 0; j < numInner; ++j) { - device_outer[i][j] = 10.0 * double (i) + double (j); + outer(i)(j) = 10.0 * double (i) + double (j); } } }); // Fence before deallocation on host, to make sure // that the device kernel is done first. - // Note the new fence syntax that requires an instance. - // This will work with other CUDA streams, etc. - Cuda ().fence (); + Kokkos::fence (); // Destroy inner Views, again on host, outside of a parallel region. for (int k = 0; k < 5; ++k) { - outer[k].~inner_view_type (); + outer(k).~inner_view_type (); } // You're better off disposing of outer immediately. outer = outer_view_type (); -Another approach is to create the inner Views as nonowning, from a single pool of memory. This makes it unnecessary to invoke their destructors. - 6.2.4 Const Views ~~~~~~~~~~~~~~~~~