-
Notifications
You must be signed in to change notification settings - Fork 70
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KernelIntrinsics #562
base: main
Are you sure you want to change the base?
KernelIntrinsics #562
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #562 +/- ##
=====================================
Coverage 0.00% 0.00%
=====================================
Files 21 21
Lines 1575 1581 +6
=====================================
- Misses 1575 1581 +6 ☔ View full report in Codecov by Sentry. |
|
||
Returns the unique local work-item ID. | ||
""" | ||
function get_local_id end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So IIUC, backends should implement these like below, right?
function get_local_id()
return (threadIdx().x, threadIdx().y, threadIdx().z)
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah basically, and my goal is to replace the old internal functions the people had to override with definitions based on these functions.
This stack of pull requests is managed by Graphite. Learn more about stacking. |
2d12b29
to
1298a56
Compare
1298a56
to
1831242
Compare
1831242
to
63244eb
Compare
63244eb
to
af5c7db
Compare
af5c7db
to
ad4c968
Compare
Benchmark Results
Benchmark PlotsA plot of the benchmark results have been uploaded as an artifact to the workflow run for this PR. |
c51072f
to
a0e822e
Compare
ad4c968
to
1c8b7e9
Compare
1c8b7e9
to
55dde0c
Compare
55dde0c
to
f1e98cc
Compare
function get_global_size end | ||
|
||
""" | ||
get_global_id()::@NamedTuple{x::Int32, y::Int32, z::Int32} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be Int32 or Int64?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OpenCL defines these as Csize_t
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any way to allow the platform to define the specific <:Integer
index type? Metal uses uint3 by default, which are three UInt32 values. I liked that CUDA Thrust allowed the indices to be templated, so I could use Int64 only when dealing with billions of datapoints.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GPU dev call: We should make these Int
/size_t
So the idea is to decouple the back-ends from KA.jl, instead implementing KernelIntrinsics.jl? What's the advantage; do you envision packages other than KA.jl to build their kernel DSL on top of KernelIntrinsics.jl? |
f1e98cc
to
6841207
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some suggestions could not be made:
- src/pocl/nanoOpenCL.jl
- lines 670-674
87af609
to
aec6c0a
Compare
aec6c0a
to
e91b33b
Compare
Will KA/KI still be a greatest common denominator of the GPU backends, or are you looking to introduce optional intrinsics? How will the groupreduce API do in terms of portability? |
The intrinsics proposed here are the greatest common denominator. I could see us adding some more intrinsics for reductions, but that is TBD. |
48e3752
to
e565304
Compare
e91b33b
to
7a5e159
Compare
7a5e159
to
0a8301c
Compare
777c099
to
3bb80ac
Compare
0d72b34
to
1f3b249
Compare
1f3b249
to
a0fbc48
Compare
a0fbc48
to
7f02b2e
Compare
7f02b2e
to
bb0c213
Compare
bb0c213
to
16524fd
Compare
ed2ee63
to
9741962
Compare
16524fd
to
8ea6ad2
Compare
The goal is to allow for kernels to be written without relying on KernelAbstractions macros
cc: @maleadt @pxl-th