-
Notifications
You must be signed in to change notification settings - Fork 15
CUDA kernel registration supports #198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
BackendBench/backends/directory.py
Outdated
| cpp_sources=cpp_source, | ||
| cuda_sources=cuda_source, | ||
| functions=[folder_name], | ||
| verbose=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check the no implicit headers mode, otherwise this function will take 90s per call
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I set no_implicit headers to Ture and added the header to CUDA files. As a result, the running time for TestDirectoryBackendCUDA decreased from approximately 50 seconds to around 10 seconds.
|
Update:
|
This pull request introduces the following changes: