Allow device policy execution for IntersectionShaper, DistributedClosestPoint #1392

bmhan12 · 2024-08-06T16:05:26Z

This PR:

Contains work related to issue #1339, "Address outstanding tests and examples that rely on unified memory"
Refactors IntersectionShaper and DistributedClosestPoint so they can execute with a device policy (previously, running on device required the unified policy)
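
The shape of this refactor can be sketched in plain C++: a container receives its memory space as a constructor argument instead of relying on a process-wide default. Everything below (`MemorySpace`, `sketch::Array`, `g_default_space`, `make_buffer`) is a hypothetical stand-in for illustration, not the Axom API, and the "device" space is only a tag here, with no GPU memory involved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only; this is NOT the Axom API.
enum class MemorySpace { Host, Device, Unified };

namespace sketch
{
// A process-wide default, analogous in spirit to a global default allocator.
MemorySpace g_default_space = MemorySpace::Host;

template <typename T>
class Array
{
public:
  // Falls back to whatever the global default happens to be at call time.
  explicit Array(std::size_t n) : m_data(n), m_space(g_default_space) { }

  // Records the space chosen by the caller; the global is not consulted.
  Array(std::size_t n, MemorySpace space) : m_data(n), m_space(space) { }

  std::size_t size() const { return m_data.size(); }
  MemorySpace space() const { return m_space; }

private:
  std::vector<T> m_data;  // host storage; the space is only a tag in this sketch
  MemorySpace m_space;
};
}  // namespace sketch

// Builds a buffer in an explicitly requested space.
sketch::Array<double> make_buffer(std::size_t num_elems, MemorySpace device_space)
{
  const MemorySpace before = sketch::g_default_space;
  sketch::Array<double> buf(num_elems, device_space);
  // The explicit argument leaves the process-wide default untouched.
  assert(sketch::g_default_space == before);
  return buf;
}
```

Passing the space explicitly keeps the choice local to the call site, so unrelated code that still reads the default is unaffected.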

kennyweiss

Thanks for following up on this @bmhan12 !

Did you notice any performance improvements in this branch w.r.t the unified memory usage in develop?

kennyweiss · 2024-08-07T20:37:50Z

src/axom/quest/detail/DistributedClosestPointImpl.hpp

+        axom::Array<double> sqDistThresh_host(
+          1,
+          1,
+          axom::execution_space<axom::SEQ_EXEC>::allocatorID());


kennyweiss · 2024-08-07T20:38:44Z

src/axom/quest/IntersectionShaper.hpp

-
-    // Determine new allocator (for CUDA/HIP policy, set to Unified)
-    // Set new default to device
-    axom::setDefaultAllocator(::getUmpireDeviceId<ExecSpace>());


👍
+1 for reducing our usage of the global default allocator!

white238 · 2024-08-08T23:51:25Z

src/axom/quest/IntersectionShaper.hpp

@@ -705,19 +693,20 @@ class IntersectionShaper : public Shaper
    this->getDC()->RegisterField(volFracName, volFrac);

    // Initialize hexahedral elements
-    m_hexes = axom::Array<HexahedronType>(NE, NE);
-    axom::ArrayView<HexahedronType> hexes_view = m_hexes.view();
+    m_hexes = axom::Array<HexahedronType>(NE, NE, kernel_allocator);


Should we be toggling between kernel and device in our naming?

Good catch! Changed kernel_allocator to device_allocator to keep naming standard.

bmhan12 · 2024-08-13T18:08:37Z

> Did you notice any performance improvements in this branch w.r.t the unified memory usage in develop?

I didn't do any major benchmarking, but at least for the quest_intersection_shaper unit tests, I was seeing:

On CUDA, about the same overall runtime (maybe it would be different if I disabled blueos's ats?)
On HIP, the average unit test runtime drops from about 11.5 seconds to about 9 seconds.

bmhan12 force-pushed the feature/han12/unified_outstanding branch from 4fe8f0f to 0d90639 Compare August 6, 2024 22:56

bmhan12 requested review from BradWhitlock, kennyweiss, rhornung67, publixsubfan and gunney1 August 7, 2024 16:12

kennyweiss approved these changes Aug 7, 2024

white238 reviewed Aug 8, 2024

bmhan12 added 8 commits August 12, 2024 16:19

Make quest distributed closest point example device by default

7ceb993

Make quest spin_implicit_grid_test device by default - involves workaround for getCandidates(point)

90140cd

debugging CUDA race condition with IntersectionShaper refactor

046c768

debugging proe intersection shaping

a1ff233

Workaround for Pro/E cuda intersection shaper test case

c310ee0

Revert "Make quest spin_implicit_grid_test device by default - involves workaround for getCandidates(point)" - implementation fails when ats is disabled for cuda, BitSet::size() seemingly cannot be found on either host or device" This reverts commit a081b7e.

9c8fca4

Add missing Pro/E documentation to IntersectionShaper.hpp doxygen

c0ef52e

Consistent naming kernel --> device

0dc759c

bmhan12 force-pushed the feature/han12/unified_outstanding branch from 2c8f83f to 0dc759c Compare August 13, 2024 14:37

bmhan12 added 2 commits August 19, 2024 13:02

Merge branch 'develop' into feature/han12/unified_outstanding

c8a1b4f

Merge branch 'develop' into feature/han12/unified_outstanding

4a4c4ec

rhornung67 approved these changes Aug 26, 2024

bmhan12 merged commit 9fcbb54 into develop Aug 26, 2024
13 checks passed

bmhan12 deleted the feature/han12/unified_outstanding branch August 26, 2024 20:04
