I think we should run all tests with API validation enabled and allow them to fail, instead of only running a subset, so that we are aware of the failures. Once we've dealt with the failures, we can make the validation run mandatory.
Currently failing tests:
gpuarrays/linalg/mul!/vector-matrix
mps/linalg (heisenbug, see local example)
mps/copy
Local example:
(Metal) pkg> test
Testing Metal
...
Testing Running tests...
2024-10-18 16:29:58.279 julia[36961:444153] Metal API Validation Enabled
2024-10-18 16:29:58.279 julia[36961:444153] Metal GPU Validation Enabled
┌ Info: System information:
│ macOS 15.0.1, Darwin 24.0.0
│
│ Toolchain:
│ - Julia: 1.11.1
│ - LLVM: 16.0.6
│
│ Julia packages:
│ - Metal.jl: 1.4.0
│ - GPUArrays: 11.0.0
│ - GPUCompiler: 1.0.0
│ - KernelAbstractions: 0.9.28
│ - ObjectiveC: 3.1.0
│ - LLVM: 9.1.2
│ - LLVMDowngrader_jll: 0.3.0+1
│
│ Environment:
│ - MTL_SHADER_VALIDATION: 1
│ - MTL_DEBUG_LAYER: 1
│
│ 1 device:
└ - Apple M2 Max (192.000 KiB allocated)
[ Info: Running 8 tests in parallel. If this is too many, specify the `--jobs` argument to the tests, or set the JULIA_CPU_THREADS environment variable.
From worker 7: 2024-10-18 16:30:06.803 julia[36969:444278] Metal API Validation Enabled
From worker 7: 2024-10-18 16:30:06.803 julia[36969:444278] Metal GPU Validation Enabled
From worker 4: 2024-10-18 16:30:06.814 julia[36966:444275] Metal API Validation Enabled
From worker 4: 2024-10-18 16:30:06.814 julia[36966:444275] Metal GPU Validation Enabled
From worker 9: 2024-10-18 16:30:06.817 julia[36971:444280] Metal API Validation Enabled
From worker 9: 2024-10-18 16:30:06.817 julia[36971:444280] Metal GPU Validation Enabled
From worker 3: 2024-10-18 16:30:06.828 julia[36965:444274] Metal API Validation Enabled
From worker 3: 2024-10-18 16:30:06.829 julia[36965:444274] Metal GPU Validation Enabled
From worker 8: 2024-10-18 16:30:06.831 julia[36970:444279] Metal API Validation Enabled
From worker 8: 2024-10-18 16:30:06.831 julia[36970:444279] Metal GPU Validation Enabled
From worker 6: 2024-10-18 16:30:06.842 julia[36968:444277] Metal API Validation Enabled
From worker 6: 2024-10-18 16:30:06.843 julia[36968:444277] Metal GPU Validation Enabled
From worker 2: 2024-10-18 16:30:06.843 julia[36964:444270] Metal API Validation Enabled
From worker 2: 2024-10-18 16:30:06.843 julia[36964:444270] Metal GPU Validation Enabled
From worker 5: 2024-10-18 16:30:06.863 julia[36967:444276] Metal API Validation Enabled
From worker 5: 2024-10-18 16:30:06.864 julia[36967:444276] Metal GPU Validation Enabled
| | ---------------- CPU ---------------- |
Test (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB) |
metallib (8) | 0.68 | 0.01 | 1.9 | 209.75 | 573.78 |
pool (9) | 1.12 | 0.03 | 2.5 | 320.76 | 596.08 |
From worker 10: 2024-10-18 16:30:14.310 julia[36977:444462] Metal API Validation Enabled
From worker 10: 2024-10-18 16:30:14.311 julia[36977:444462] Metal GPU Validation Enabled
From worker 8: Starting recording with the Blank template and GPU, Time Profiler, Metal Application, Metal GPU Counters, Metal Resource Events, os_signpost Instruments. Attaching to: julia (36970).
From worker 8: Ctrl-C to stop the recording
From worker 8: Stopping recording...
metal (7) | 4.18 | 0.12 | 2.8 | 528.20 | 712.36 |
From worker 7: ┌ Warning: Skipping script tests
From worker 7: └ @ Main ~/.julia/dev/Metal/test/scripts.jl:9
scripts (7) | 0.86 | 0.00 | 0.0 | 76.59 | 716.12 |
From worker 8: Recording completed. Saving output file...
From worker 8: Output file saved as: julia_1.trace
From worker 8: [ Info: System trace saved to /private/var/folders/4g/lnkpkf3s4rxd_wbl8vwnqs4r0000gn/T/jl_6ZIMtu/julia_1.trace; open the resulting trace in Instruments
profiling (8) | 6.73 | 0.00 | 0.0 | 99.23 | 593.14 |
From worker 10: ┌ Warning: Skipping capturing tests; capturing is not supported with Metal Shader Validation enabled
From worker 10: └ @ Main ~/.julia/dev/Metal/test/capturing.jl:4
capturing (10) | 0.82 | 0.00 | 0.0 | 85.42 | 560.77 |
From worker 11: 2024-10-18 16:30:24.248 julia[37027:445348] Metal API Validation Enabled
From worker 11: 2024-10-18 16:30:24.248 julia[37027:445348] Metal GPU Validation Enabled
execution (5) | 16.66 | 0.25 | 1.5 | 1773.28 | 793.55 |
mps/matrix (5) | 0.37 | 0.00 | 0.0 | 52.49 | 798.92 |
mps/size (5) | 0.04 | 0.00 | 0.0 | 1.41 | 799.62 |
mps/vector (5) | 0.14 | 0.00 | 0.0 | 19.17 | 800.42 |
examples (4) | 25.74 | 0.64 | 2.5 | 2717.08 | 2026.69 |
gpuarrays/indexing scalar (5) | 9.58 | 0.11 | 1.2 | 1401.12 | 881.42 |
kernelabstractions (6) | 30.00 | 0.56 | 1.9 | 3955.16 | 1033.52 |
random (9) | 31.51 | 0.49 | 1.5 | 3735.46 | 990.28 |
device/intrinsics (7) | 36.95 | 0.47 | 1.3 | 4235.62 | 1026.02 |
From worker 11:
From worker 11: [37027] signal 10 (1): Bus error: 10
From worker 11: in expression starting at /Users/christian/.julia/dev/Metal/test/mps/linalg.jl:3
From worker 11: objc_msgSend at /usr/lib/libobjc.A.dylib (unknown line)
From worker 11: _ZN24resolvedSharedPacketDataI23GPUDebugBadAccessPacketEC2ERKS0_15MTLFunctionTypeP24MTLGPUDebugCommandBufferP17MTLGPUDebugGPULog at /System/Library/PrivateFrameworks/MetalTools.framework/Versions/A/MetalTools (unknown line)
From worker 11: Allocations: 85721828 (Pool: 85719313; Big: 2515); GC: 47
mps/linalg (11) | failed at 2024-10-18T16:30:59.210
Worker 11 terminated.
Unhandled Task ERROR: EOFError: read end of file
Stacktrace:
[1] (::Base.var"#wait_locked#832")(s::Sockets.TCPSocket, buf::IOBuffer, nb::Int64)
@ Base ./stream.jl:970
[2] unsafe_read(s::Sockets.TCPSocket, p::Ptr{UInt8}, nb::UInt64)
@ Base ./stream.jl:978
[3] unsafe_read
@ ./io.jl:891 [inlined]
[4] unsafe_read(s::Sockets.TCPSocket, p::Base.RefValue{NTuple{4, Int64}}, n::Int64)
@ Base ./io.jl:890
[5] read!
@ ./io.jl:895 [inlined]
[6] deserialize_hdr_raw
@ ~/.julia/juliaup/julia-1.11.1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/Distributed/src/messages.jl:167 [inlined]
[7] message_handler_loop(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool)
@ Distributed ~/.julia/juliaup/julia-1.11.1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/Distributed/src/process_messages.jl:172
[8] process_tcp_streams(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool)
@ Distributed ~/.julia/juliaup/julia-1.11.1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/Distributed/src/process_messages.jl:133
[9] (::Distributed.var"#103#104"{Sockets.TCPSocket, Sockets.TCPSocket, Bool})()
@ Distributed ~/.julia/juliaup/julia-1.11.1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/Distributed/src/process_messages.jl:121
From worker 12: 2024-10-18 16:31:03.011 julia[37573:446632] Metal API Validation Enabled
From worker 12: 2024-10-18 16:31:03.011 julia[37573:446632] Metal GPU Validation Enabled
gpuarrays/math/power (6) | 26.93 | 0.53 | 2.0 | 4850.07 | 1274.00 |
array (2) | 64.57 | 1.23 | 1.9 | 8220.88 | 1769.12 |
gpuarrays/indexing find (7) | 23.17 | 0.57 | 2.4 | 5480.38 | 1208.73 |
gpuarrays/linalg/mul!/vector-matrix (9) | failed at 2024-10-18T16:31:20.021
gpuarrays/reductions/any all count (6) | 11.26 | 0.13 | 1.1 | 1729.80 | 1403.30 |
From worker 13: 2024-10-18 16:31:24.098 julia[37969:447437] Metal API Validation Enabled
From worker 13: 2024-10-18 16:31:24.098 julia[37969:447437] Metal GPU Validation Enabled
gpuarrays/uniformscaling (7) | 7.15 | 0.04 | 0.6 | 635.54 | 1348.91 |
gpuarrays/math/intrinsics (7) | 3.70 | 0.03 | 0.7 | 374.92 | 1410.30 |
mps/copy (8) | failed at 2024-10-18T16:31:34.072
From worker 14: 2024-10-18 16:31:37.928 julia[38219:447992] Metal API Validation Enabled
From worker 14: 2024-10-18 16:31:37.928 julia[38219:447992] Metal GPU Validation Enabled
gpuarrays/indexing multidimensional (12) | 52.60 | 0.74 | 1.4 | 6553.91 | 1061.39 |
gpuarrays/reductions/reducedim! (4) | 85.07 | 1.29 | 1.5 | 11756.37 | 2308.91 |
gpuarrays/linalg/norm (7) | 38.33 | 0.49 | 1.3 | 5759.17 | 1591.80 |
gpuarrays/vectors (7) | 0.17 | 0.00 | 0.0 | 22.95 | 1593.03 |
gpuarrays/linalg/mul!/matrix-matrix (6) | 56.91 | 0.43 | 0.8 | 5222.27 | 1547.66 |
gpuarrays/random (7) | 12.45 | 0.08 | 0.6 | 1200.88 | 1678.14 |
gpuarrays/linalg (5) | 104.86 | 1.66 | 1.6 | 14532.96 | 1559.11 |
gpuarrays/reductions/mapreducedim!_large (13) | 57.07 | 1.34 | 2.3 | 8654.05 | 1452.14 |
gpuarrays/constructors (4) | 22.09 | 0.19 | 0.9 | 2061.44 | 2376.53 |
gpuarrays/statistics (14) | 48.62 | 0.70 | 1.4 | 5975.28 | 955.05 |
gpuarrays/base (6) | 25.21 | 0.58 | 2.3 | 4725.99 | 1839.33 |
gpuarrays/reductions/== isequal (7) | 43.42 | 0.54 | 1.2 | 6198.30 | 2041.39 |
gpuarrays/reductions/reduce (4) | 61.15 | 1.22 | 2.0 | 11113.19 | 2376.53 |
gpuarrays/reductions/minimum maximum extrema (2) | 140.04 | 2.39 | 1.7 | 21722.48 | 2168.75 |
gpuarrays/reductions/mapreduce (12) | 114.96 | 1.89 | 1.6 | 17942.78 | 1959.66 |
gpuarrays/reductions/mapreducedim! (13) | 104.12 | 1.57 | 1.5 | 14456.61 | 2160.31 |
gpuarrays/reductions/sum prod (14) | 109.30 | 1.71 | 1.6 | 16192.86 | 2012.47 |
gpuarrays/broadcasting (5) | 152.22 | 2.05 | 1.3 | 19808.61 | 2611.33 |
Testing finished in 4 minutes, 47 seconds, 973 milliseconds
mps/linalg: Error During Test at none:1
Got exception outside of a @test
ProcessExitedException(11)
Worker 9 failed running test gpuarrays/linalg/mul!/vector-matrix:
Some tests did not pass: 139 passed, 1 failed, 0 errored, 0 broken.
gpuarrays/linalg/mul!/vector-matrix: Test Failed at /Users/christian/.julia/dev/GPUArrays/test/testsuite/linalg.jl:315
Expression: compare(*, AT, f(A), x)
Stacktrace:
[1] backtrace()
@ Base ./error.jl:114
[2] record(ts::Test.DefaultTestSet, t::Union{Test.Error, Test.Fail}; print_result::Bool)
@ Test ~/.julia/juliaup/julia-1.11.1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/Test/src/Test.jl:1107
[3] record(ts::Test.DefaultTestSet, t::Union{Test.Error, Test.Fail})
@ Test ~/.julia/juliaup/julia-1.11.1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/Test/src/Test.jl:1100
[4] top-level scope
@ ~/.julia/dev/Metal/test/runtests.jl:379
[5] include(fname::String)
@ Main ./sysimg.jl:38
[6] top-level scope
@ none:6
[7] eval
@ ./boot.jl:430 [inlined]
[8] exec_options(opts::Base.JLOptions)
@ Base ./client.jl:296
[9] _start()
@ Base ./client.jl:531
Worker 8 failed running test mps/copy:
Some tests did not pass: 143 passed, 1 failed, 0 errored, 64 broken.
mps/copy: Test Failed at /Users/christian/.julia/dev/Metal/test/mps/copy.jl:46
Expression: dstMat == srcMat
Evaluated: Int8[-7 -37 … -28 -9; -23 -38 … -89 -106; … ; 77 12 … 71 116; -92 -6 … -103 -51] == Int8[-7 -37 … -28 -9; -23 -38 … -89 -106; … ; 77 12 … 71 116; -92 -6 … -103 -51]
Stacktrace:
[1] record(ts::Test.DefaultTestSet, t::Union{Test.Error, Test.Fail}; print_result::Bool)
@ Test ~/.julia/juliaup/julia-1.11.1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/Test/src/Test.jl:1107
[2] record(ts::Test.DefaultTestSet, t::Union{Test.Error, Test.Fail})
@ Test ~/.julia/juliaup/julia-1.11.1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/Test/src/Test.jl:1100
[3] top-level scope
@ ~/.julia/dev/Metal/test/runtests.jl:379
[4] include(fname::String)
@ Main ./sysimg.jl:38
[5] top-level scope
@ none:6
[6] eval
@ ./boot.jl:430 [inlined]
[7] exec_options(opts::Base.JLOptions)
@ Base ./client.jl:296
[8] _start()
@ Base ./client.jl:531
Test Summary: | Pass Fail Error Broken Total Time
Overall | 9688 2 1 104 9795
metallib | 25 25
pool | 5 5
metal | 128 128
scripts | 0
profiling | 1 1
capturing | 0
execution | 37 37
mps/matrix | 76 76
mps/size | 9 9
mps/vector | 34 34
examples | 4 4
gpuarrays/indexing scalar | 399 399
kernelabstractions | 2179 8 2187
random | 818 818
device/intrinsics | 129 129
mps/linalg | 1 1
gpuarrays/math/power | 60 60
array | 409 32 441
gpuarrays/indexing find | 45 45
gpuarrays/linalg/mul!/vector-matrix | 139 1 140
gpuarrays/reductions/any all count | 101 101
gpuarrays/uniformscaling | 56 56
gpuarrays/math/intrinsics | 10 10
mps/copy | 143 1 64 208
gpuarrays/indexing multidimensional | 89 89
gpuarrays/reductions/reducedim! | 160 160
gpuarrays/linalg/norm | 264 264
gpuarrays/vectors | 10 10
gpuarrays/linalg/mul!/matrix-matrix | 360 360
gpuarrays/random | 52 52
gpuarrays/linalg | 397 397
gpuarrays/reductions/mapreducedim!_large | 40 40
gpuarrays/constructors | 832 832
gpuarrays/statistics | 52 52
gpuarrays/base | 95 95
gpuarrays/reductions/== isequal | 230 230
gpuarrays/reductions/reduce | 220 220
gpuarrays/reductions/minimum maximum extrema | 555 555
gpuarrays/reductions/mapreduce | 330 330
gpuarrays/reductions/mapreducedim! | 260 260
gpuarrays/reductions/sum prod | 636 636
gpuarrays/broadcasting | 299 299
FAILURE
Error in testset mps/linalg:
Error During Test at none:1
Got exception outside of a @test
ProcessExitedException(11)
Error in testset gpuarrays/linalg/mul!/vector-matrix:
Test Failed at /Users/christian/.julia/dev/GPUArrays/test/testsuite/linalg.jl:315
Expression: compare(*, AT, f(A), x)
Error in testset mps/copy:
Test Failed at /Users/christian/.julia/dev/Metal/test/mps/copy.jl:46
Expression: dstMat == srcMat
Evaluated: Int8[-7 -37 … -28 -9; -23 -38 … -89 -106; … ; 77 12 … 71 116; -92 -6 … -103 -51] == Int8[-7 -37 … -28 -9; -23 -38 … -89 -106; … ; 77 12 … 71 116; -92 -6 … -103 -51]
ERROR: LoadError: Test run finished with errors
in expression starting at /Users/christian/.julia/dev/Metal/test/runtests.jl:410
ERROR: Package Metal errored during testing
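The run above had `MTL_DEBUG_LAYER=1` and `MTL_SHADER_VALIDATION=1` set (see the Environment block in the log). A minimal sketch of the "run everything, but allow it to fail" idea, assuming a plain shell step: `run_allow_failure` is a hypothetical helper name, and the `julia --project` invocation in the comment is an assumed way to start the suite, not the actual CI configuration.

```shell
#!/bin/sh
# Hypothetical helper: run a command with Metal API and shader validation
# enabled, report its outcome, and always return 0 so the step never
# blocks the build.
run_allow_failure() {
    if MTL_DEBUG_LAYER=1 MTL_SHADER_VALIDATION=1 "$@"; then
        echo "validation run passed"
    else
        status=$?
        echo "validation run failed with exit code $status (allowed to fail)"
    fi
    return 0
}

# Assumed invocation from the package directory:
#   run_allow_failure julia --project -e 'using Pkg; Pkg.test()'
```

Once the failing tests are fixed, the helper could be dropped so that a non-zero exit code fails the step again.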