@jiannanWang
Contributor

This pull request introduces the following changes:

  1. Adds load_inline functionality and enables support for CUDA kernel registration.
  2. Introduces a script for creating simple CUDA kernels.
  3. Adds tests for CUDA kernel registration.

@meta-cla bot added the CLA Signed label Oct 28, 2025
@jiannanWang marked this pull request as ready for review October 28, 2025 21:17
@jiannanWang marked this pull request as draft October 28, 2025 21:26
@jiannanWang marked this pull request as ready for review October 30, 2025 06:21
cpp_sources=cpp_source,
cuda_sources=cuda_source,
functions=[folder_name],
verbose=True,
Member

Check the no_implicit_headers mode; otherwise this function will take 90s per call.

Contributor Author

I set no_implicit_headers to True and added the header to the CUDA files. As a result, the running time for TestDirectoryBackendCUDA decreased from approximately 50 seconds to around 10 seconds.
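
For readers unfamiliar with this option, here is a minimal sketch of what such a call might look like. It assumes a PyTorch version whose torch.utils.cpp_extension.load_inline accepts no_implicit_headers; the kernel name add_one, the source strings, and the helper functions are hypothetical and are not taken from this PR. With implicit headers disabled, each source has to include what it needs itself.

```python
import importlib.util

# Hypothetical kernel name, used only for illustration (not from the PR).
KERNEL_NAME = "add_one"

# C++ side: only a declaration. load_inline generates the pybind11 bindings
# for `functions`, so this source still needs a header that provides them.
CPP_SOURCE = r"""
#include <torch/extension.h>
torch::Tensor add_one(torch::Tensor x);
"""

# CUDA side: with no_implicit_headers=True nothing is prepended,
# so the headers are written out explicitly.
CUDA_SOURCE = r"""
#include <torch/types.h>
#include <cuda.h>
#include <cuda_runtime.h>

__global__ void add_one_kernel(const float* x, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = x[i] + 1.0f;
}

// Assumes a contiguous float32 CUDA tensor.
torch::Tensor add_one(torch::Tensor x) {
    auto out = torch::empty_like(x);
    int n = static_cast<int>(x.numel());
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    add_one_kernel<<<blocks, threads>>>(
        x.data_ptr<float>(), out.data_ptr<float>(), n);
    return out;
}
"""


def load_inline_kwargs():
    """Build the keyword arguments for load_inline (no compilation here)."""
    return dict(
        name=f"{KERNEL_NAME}_ext",
        cpp_sources=CPP_SOURCE,
        cuda_sources=CUDA_SOURCE,
        functions=[KERNEL_NAME],
        no_implicit_headers=True,  # skip the automatically added headers
        verbose=True,
    )


def compile_if_possible():
    """Compile the extension only when torch and a CUDA toolkit are present."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch
    from torch.utils.cpp_extension import CUDA_HOME, load_inline

    if CUDA_HOME is None or not torch.cuda.is_available():
        return None
    return load_inline(**load_inline_kwargs())
```

Calling compile_if_possible() returns the compiled module on a machine with a CUDA toolkit and None otherwise. The timings quoted in this thread come from the PR's own test run, not from this sketch.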

@jiannanWang
Contributor Author

Update:

  • Set no_implicit_headers to True; the running time for TestDirectoryBackendCUDA dropped from about 50s to about 10s.
  • The CI environment doesn't have CUDA_HOME and thus cannot run TestDirectoryBackendCUDA, so the test is skipped in CI.
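
The skip condition used in the PR is not shown in this thread. One way to skip a CUDA test class when no toolkit is found could look like the sketch below. It assumes unittest and a simple environment check; cuda_toolkit_available and TestCudaKernelSketch are hypothetical names, not the PR's code.

```python
import os
import shutil
import unittest


def cuda_toolkit_available() -> bool:
    """Rough check: CUDA_HOME is set, or nvcc can be found on PATH."""
    return bool(os.environ.get("CUDA_HOME")) or shutil.which("nvcc") is not None


@unittest.skipUnless(
    cuda_toolkit_available(),
    "CUDA toolkit not found (CUDA_HOME unset and nvcc not on PATH)",
)
class TestCudaKernelSketch(unittest.TestCase):
    """Hypothetical stand-in for a CUDA-dependent test class."""

    def test_toolkit_present(self):
        # Reached only when the class-level skip condition passed.
        self.assertTrue(cuda_toolkit_available())
```

On a machine without a CUDA toolkit the whole class is reported as skipped rather than failed, which keeps CI green while still running the tests locally where CUDA is installed.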
