Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow device policy execution for IntersectionShaper, DistributedClosestPoint #1392

Merged
merged 10 commits into from
Aug 26, 2024

Conversation

bmhan12
Copy link
Contributor

@bmhan12 bmhan12 commented Aug 6, 2024

This PR:

@bmhan12 bmhan12 force-pushed the feature/han12/unified_outstanding branch from 4fe8f0f to 0d90639 Compare August 6, 2024 22:56
Copy link
Member

@kennyweiss kennyweiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for following up on this @bmhan12 !

Did you notice any performance improvements in this branch w.r.t the unified memory usage in develop?

Comment on lines +1065 to +1068
axom::Array<double> sqDistThresh_host(
1,
1,
axom::execution_space<axom::SEQ_EXEC>::allocatorID());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


// Determine new allocator (for CUDA/HIP policy, set to Unified)
// Set new default to device
axom::setDefaultAllocator(::getUmpireDeviceId<ExecSpace>());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍
+1 for reducing our usage of the global default allocator!

@@ -705,19 +693,20 @@ class IntersectionShaper : public Shaper
this->getDC()->RegisterField(volFracName, volFrac);

// Initialize hexahedral elements
m_hexes = axom::Array<HexahedronType>(NE, NE);
axom::ArrayView<HexahedronType> hexes_view = m_hexes.view();
m_hexes = axom::Array<HexahedronType>(NE, NE, kernel_allocator);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be toggling between kernel and device in our naming?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Changed kernel_allocator to device_allocator to keep naming standard.

@bmhan12 bmhan12 force-pushed the feature/han12/unified_outstanding branch from 2c8f83f to 0dc759c Compare August 13, 2024 14:37
@bmhan12
Copy link
Contributor Author

bmhan12 commented Aug 13, 2024

Did you notice any performance improvements in this branch w.r.t the unified memory usage in develop?

I didn't do any major benchmarking, but at least for the quest_intersection_shaper unit tests, I was seeing:

  • On CUDA, about the same overall runtime (maybe it would be different if I disabled blueos's ats?)
  • On HIP, the unit test completion time goes from about an average runtime of 11.5 seconds to 9 seconds.

@bmhan12 bmhan12 merged commit 9fcbb54 into develop Aug 26, 2024
13 checks passed
@bmhan12 bmhan12 deleted the feature/han12/unified_outstanding branch August 26, 2024 20:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants