Different result for personal argmax on CPU and GPU if array size is large enough #476
Comments
Can you share the output of
?
I was able to reproduce this on macOS 14 but not 15.2 (developer beta).
Thank you for trying this. Wow, that's stiff. I guess I have to wait for the transition, unless something obvious in my kernel can be fixed.
Interestingly, the threshold below which it still works seems to be 4 GiB:
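To check on which side of that size a given `field` falls, the byte count can be computed from the dimensions alone, without allocating anything. This is only a sketch: the helper name `field_bytes` and the dimensions are hypothetical and not taken from the MWE. Note that 4 GiB is 2^32 bytes, so it is also the size at which a byte offset stops fitting in an unsigned 32-bit integer. Whether that is related to the discrepancy is not established in this thread.

```julia
# Hypothetical helper: size in bytes of a dense array with element type T and the given dims.
field_bytes(::Type{T}, dims) where {T} = sizeof(T) * prod(dims)

# Hypothetical dimensions, chosen only for illustration.
dims = (32, 32, 32, 40_000)

nbytes = field_bytes(Float32, dims)
four_gib = 4 * 2^30   # 4 GiB = 2^32 bytes

println("field size in GiB: ", nbytes / 2^30)
println("larger than 4 GiB: ", nbytes > four_gib)
println("byte count fits in UInt32: ", nbytes <= typemax(UInt32))
```

Running the MWE with dimensions just below and just above the threshold would help narrow down where the results start to differ.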
Did you check this using an M2 or an M3?
Works for me on M1 and macOS 15.1:
julia> norm(res_g[2,:] - res_a[2,:], Inf)
# returns 232.0f0
0.0f0
julia>
julia> Metal.versioninfo()
macOS 15.1.0, Darwin 24.1.0
Toolchain:
- Julia: 1.11.1
- LLVM: 16.0.6
Julia packages:
- Metal.jl: 1.4.0
- GPUArrays: 11.1.0
- GPUCompiler: 1.0.1
- KernelAbstractions: 0.9.29
- ObjectiveC: 3.1.0
- LLVM: 9.1.3
- LLVMDowngrader_jll: 0.3.0+2
1 device:
- Apple M1 Max (384.000 KiB allocated)
M2 Max like you. If anyone reading this has an M3 and is still on macOS 14, could you please try running the MWE and report back whether it is broken or not?
Hi,

I tried coding an argmax using KernelAbstractions, which I need for a particle simulation. Sadly, the results from Metal and the CPU differ.

Basically, I have a field::Array{Float32, 4} and I want to compute argmax(field[x1,x2,x3,:]) for many (basically Nnmc) vectors (x1,x2,x3) in parallel. In the code below, this vector is fixed: x1,x2,x3 = (1, 1, 1).

I found that the argmax differs depending on whether the code is run on the CPU or on Metal, and only if field is large enough. This is the bulk of the issue.

If field is smaller, the discrepancy seems to disappear: