Fix loading on unsupported platforms #459

Merged
merged 2 commits into from
Oct 16, 2024

christiangnrd (Contributor) commented Oct 15, 2024

Closes #457
Closes #458

Backport to 1.4?

christiangnrd (Contributor Author) commented Oct 15, 2024

@omlins This makes Metal precompile on Windows for me. Can you let me know if it fixes #457 for you?

github-actions bot (Contributor) left a comment

Metal Benchmarks

| Benchmark suite | Current: a3ac9d1 | Previous: ddca5c4 | Ratio |
|---|---|---|---|
| private array/construct | 26878.5 ns | 26722.25 ns | 1.01 |
| private array/broadcast | 463166 ns | 467500 ns | 0.99 |
| private array/random/randn/Float32 | 770375 ns | 751250 ns | 1.03 |
| private array/random/randn!/Float32 | 698833 ns | 636125 ns | 1.10 |
| private array/random/rand!/Int64 | 568458 ns | 578875 ns | 0.98 |
| private array/random/rand!/Float32 | 602000 ns | 589417 ns | 1.02 |
| private array/random/rand/Int64 | 771708 ns | 813250 ns | 0.95 |
| private array/random/rand/Float32 | 621375 ns | 542500 ns | 1.15 |
| private array/copyto!/gpu_to_gpu | 658895.5 ns | 515812.5 ns | 1.28 |
| private array/copyto!/cpu_to_gpu | 822875.5 ns | 727666 ns | 1.13 |
| private array/copyto!/gpu_to_cpu | 693916 ns | 554625 ns | 1.25 |
| private array/accumulate/1d | 1330854.5 ns | 1444334 ns | 0.92 |
| private array/accumulate/2d | 1408458 ns | 1469333 ns | 0.96 |
| private array/iteration/findall/int | 2077833 ns | 2243208.5 ns | 0.93 |
| private array/iteration/findall/bool | 1856416 ns | 2115125 ns | 0.88 |
| private array/iteration/findfirst/int | 1662875 ns | 1690562.5 ns | 0.98 |
| private array/iteration/findfirst/bool | 1658666 ns | 1674250 ns | 0.99 |
| private array/iteration/scalar | 3934708 ns | 2412750 ns | 1.63 |
| private array/iteration/logical | 3231791.5 ns | 3443541 ns | 0.94 |
| private array/iteration/findmin/1d | 1569958 ns | 1771104 ns | 0.89 |
| private array/iteration/findmin/2d | 1322791.5 ns | 1362584 ns | 0.97 |
| private array/reductions/reduce/1d | 1051104 ns | 793250 ns | 1.33 |
| private array/reductions/reduce/2d | 692917 ns | 713896 ns | 0.97 |
| private array/reductions/mapreduce/1d | 1064875 ns | 864750 ns | 1.23 |
| private array/reductions/mapreduce/2d | 692125 ns | 712500 ns | 0.97 |
| private array/permutedims/4d | 841916 ns | 912916 ns | 0.92 |
| private array/permutedims/2d | 933333 ns | 931604.5 ns | 1.00 |
| private array/permutedims/3d | 914749.5 ns | 1000937.5 ns | 0.91 |
| private array/copy | 512583 ns | 509083 ns | 1.01 |
| latency/precompile | 4389223125 ns | 4397344834 ns | 1.00 |
| latency/ttfp | 6674100542 ns | 6908319354 ns | 0.97 |
| latency/import | 720130042 ns | 719607250 ns | 1.00 |
| integration/metaldevrt | 739167 ns | 756958 ns | 0.98 |
| integration/byval/slices=1 | 1598250 ns | 1632792 ns | 0.98 |
| integration/byval/slices=3 | 8447604 ns | 8919542 ns | 0.95 |
| integration/byval/reference | 1628791 ns | 1659354.5 ns | 0.98 |
| integration/byval/slices=2 | 2604958.5 ns | 2684500 ns | 0.97 |
| kernel/indexing | 453145.5 ns | 478708 ns | 0.95 |
| kernel/indexing_checked | 460354.5 ns | 476750 ns | 0.97 |
| kernel/launch | 8500 ns | 12084 ns | 0.70 |
| metal/synchronization/stream | 14750 ns | 19916 ns | 0.74 |
| metal/synchronization/context | 14916 ns | 20209 ns | 0.74 |
| shared array/construct | 25145.833333333332 ns | 24899.25 ns | 1.01 |
| shared array/broadcast | 464229.5 ns | 470042 ns | 0.99 |
| shared array/random/randn/Float32 | 866125 ns | 754416 ns | 1.15 |
| shared array/random/randn!/Float32 | 668562.5 ns | 650708 ns | 1.03 |
| shared array/random/rand!/Int64 | 567458 ns | 586083.5 ns | 0.97 |
| shared array/random/rand!/Float32 | 594250 ns | 595750 ns | 1.00 |
| shared array/random/rand/Int64 | 798458 ns | 809708.5 ns | 0.99 |
| shared array/random/rand/Float32 | 616562.5 ns | 589562.5 ns | 1.05 |
| shared array/copyto!/gpu_to_gpu | 87625 ns | 93459 ns | 0.94 |
| shared array/copyto!/cpu_to_gpu | 87125 ns | 95291 ns | 0.91 |
| shared array/copyto!/gpu_to_cpu | 83625 ns | 84042 ns | 1.00 |
| shared array/accumulate/1d | 1351500 ns | 1440417 ns | 0.94 |
| shared array/accumulate/2d | 1403583 ns | 1482250 ns | 0.95 |
| shared array/iteration/findall/int | 1838084 ns | 1940666.5 ns | 0.95 |
| shared array/iteration/findall/bool | 1611895.5 ns | 1770833 ns | 0.91 |
| shared array/iteration/findfirst/int | 1377875 ns | 1395458 ns | 0.99 |
| shared array/iteration/findfirst/bool | 1362541 ns | 1371084 ns | 0.99 |
| shared array/iteration/scalar | 156000 ns | 195250 ns | 0.80 |
| shared array/iteration/logical | 3048604 ns | 3219395.5 ns | 0.95 |
| shared array/iteration/findmin/1d | 1258541 ns | 1497458 ns | 0.84 |
| shared array/iteration/findmin/2d | 1326292 ns | 1371541.5 ns | 0.97 |
| shared array/reductions/reduce/1d | 737291.5 ns | 638333.5 ns | 1.16 |
| shared array/reductions/reduce/2d | 689833 ns | 718750 ns | 0.96 |
| shared array/reductions/mapreduce/1d | 751312.5 ns | 708500 ns | 1.06 |
| shared array/reductions/mapreduce/2d | 700083 ns | 707208.5 ns | 0.99 |
| shared array/permutedims/4d | 848958 ns | 952146 ns | 0.89 |
| shared array/permutedims/2d | 854625 ns | 929292 ns | 0.92 |
| shared array/permutedims/3d | 912542 ns | 1039791 ns | 0.88 |
| shared array/copy | 249375 ns | 288396 ns | 0.86 |

This comment was automatically generated by workflow using github-action-benchmark.
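
For reading the Ratio column: the numbers in the benchmark comment above are consistent with Ratio = Current / Previous, rounded to two decimals, so values below 1 mean the current commit was faster in that run. A minimal Julia sketch of the arithmetic, where `benchmark_ratio` is a hypothetical helper name and not part of any tool:

```julia
# Hypothetical helper: reproduces the Ratio column as current / previous,
# rounded to two decimals. Timings are in nanoseconds.
benchmark_ratio(current, previous) = round(current / previous; digits = 2)

# Values copied from the kernel/launch row of the benchmark comment.
println(benchmark_ratio(8500, 12084))        # prints 0.7

# Values copied from the private array/iteration/scalar row.
println(benchmark_ratio(3934708, 2412750))   # prints 1.63
```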

christiangnrd changed the title from "Gate getpagesize call behind function" to "Fix loading on unsupported platforms" on Oct 15, 2024
omlins commented Oct 16, 2024

> @omlins This makes Metal precompile on Windows for me. Can you let me know if it fixes #457 for you?

All the following runs should pass when the two issues #457 and #458 are resolved, merged and released:
https://github.com/omlins/CellArrays.jl/actions/runs/11330323307

I will certainly test with your branch (once you have finished your discussions with @maleadt on the approach to take).

christiangnrd (Contributor Author)

@omlins You can go ahead and try out this branch. Once you confirm that it works, I'll merge this and backport to 1.4.

omlins commented Oct 16, 2024

> @omlins You can go ahead and try out this branch. Once you confirm that it works, I'll merge this and backport to 1.4

I'm sorry, I just realized that you would need me to test it on Windows. :) Unfortunately, I don't have a Windows machine. Given that it works for you, I assume you can be confident enough to release it. I can then confirm that it works in CI before you do the backport.

christiangnrd merged commit 100f831 into main on Oct 16, 2024 (2 checks passed)
christiangnrd deleted the windowsfix branch on October 16, 2024 18:16
christiangnrd added a commit that referenced this pull request on Oct 16, 2024:

* Gate getpagesize call behind function
* Make `functional` require a fully-supported GPU
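
The first commit title suggests moving a `getpagesize` call out of code that runs while the package loads. The sketch below illustrates that general pattern in Julia; it is not the actual Metal.jl source. The name `pagesize` and the `Sys.isunix()` guard are assumptions made for illustration.

```julia
# Hypothetical sketch, not the actual Metal.jl code.
#
# An eager top-level definition such as
#     const PAGESIZE = ccall(:getpagesize, Cint, ())
# executes the ccall while the module is loaded, so loading fails on any
# platform whose C library does not provide `getpagesize`.

# Deferred variant: the ccall only executes when `pagesize()` is called,
# so merely loading this code never touches the symbol.
function pagesize()
    # Assumption for this sketch: give a clear error instead of a failed
    # symbol lookup on platforms without `getpagesize`.
    Sys.isunix() || error("pagesize() is only available on Unix-like systems in this sketch")
    return Int(ccall(:getpagesize, Cint, ()))
end
```

On a Unix-like system, `pagesize()` returns the page size in bytes. Elsewhere the function can still be defined and loaded, and it only fails if it is actually called.
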
christiangnrd (Contributor Author)

@omlins You can test directly on the release-1.4 branch if you prefer.
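
One way to try a branch from a downstream project is to have Julia's package manager track that Git revision. A minimal sketch, assuming the package is the registered `Metal` package and the branch name is the one mentioned above; `add_metal_rev` is a hypothetical helper:

```julia
using Pkg

# Hypothetical helper: make the active environment track a specific Git
# revision (branch, tag, or commit) of the registered Metal package.
function add_metal_rev(rev::AbstractString)
    Pkg.add(name = "Metal", rev = String(rev))
end

# Usage (needs network access, so it is left commented out here):
# add_metal_rev("release-1.4")   # track the release-1.4 branch
# Pkg.free("Metal")              # later: go back to registered releases
```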

omlins commented Oct 30, 2024

@christiangnrd: CI has now passed, as you have probably seen, so I can confirm from my side that the fix worked.
(The CI job with Julia 1.9 still fails because Metal v1.2 is used there:
https://github.com/omlins/CellArrays.jl/actions/runs/11380953251/job/32236749235?pr=42#step:6:53
but this is not a real problem.)
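
For downstream test suites that also run on machines without a supported GPU, a guard around the GPU tests is one option. The sketch below assumes a Metal.jl release that includes this fix, so that `using Metal` loads on such machines. The test contents are illustrative only and are not taken from CellArrays.jl.

```julia
using Metal   # assumes a Metal.jl release that includes this fix
using Test

@testset "Metal backend (illustrative)" begin
    if Metal.functional()
        # Runs only where Metal reports a usable GPU.
        a = MtlArray(Float32[1, 2, 3])
        @test Array(a .+ 1f0) == Float32[2, 3, 4]
    else
        # Elsewhere the package should still load; GPU tests are skipped.
        @info "Metal is not functional on this machine; skipping GPU tests"
    end
end
```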
