Conversation

@jhavukainen (Contributor)

No description provided.

@meta-cla bot added the `CLA Signed` label on Sep 18, 2025.
- Fix CPU kernel generation ([\#158350](https://github.com/pytorch/pytorch/pull/158350))
- Improve tabbing in cpp generation ([\#158351](https://github.com/pytorch/pytorch/pull/158351))
- Enable more tests ([\#158703](https://github.com/pytorch/pytorch/pull/158703))
- Enable DLPack integration ([\#158888](https://github.com/pytorch/pytorch/pull/158888))
@malfet (Contributor):

This is a user-facing feature: it enables DLPack for the MPS backend.
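
For context, a minimal sketch of what the DLPack interop looks like from the user's side (assuming an MPS-capable machine running this build; the tensor values and shapes are illustrative):

```python
import torch
from torch.utils.dlpack import to_dlpack, from_dlpack

# Round-trip an MPS tensor through a DLPack capsule.
x = torch.randn(4, device="mps")
capsule = to_dlpack(x)    # export as a DLPack capsule
y = from_dlpack(capsule)  # re-import; shares memory with x
assert y.device.type == "mps"
```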

- Improve tabbing in cpp generation ([\#158351](https://github.com/pytorch/pytorch/pull/158351))
- Enable more tests ([\#158703](https://github.com/pytorch/pytorch/pull/158703))
- Enable DLPack integration ([\#158888](https://github.com/pytorch/pytorch/pull/158888))
- Dynamic reductions ([\#159355](https://github.com/pytorch/pytorch/pull/159355))
@malfet (Contributor):

This is a user-facing feature: it adds support for dynamic shapes in the torch.compile MPS backend.
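
As a hedged illustration of what this enables (assuming an MPS device), a compiled function can now be reused across varying input sizes rather than specializing on one shape:

```python
import torch

@torch.compile(dynamic=True)  # request dynamic-shape compilation
def row_sums(t):
    return t.sum(dim=-1)

# Varying the input size should reuse the same compiled artifact
# instead of triggering a recompile per shape.
for n in (8, 32, 128):
    print(row_sums(torch.ones(n, n, device="mps")).shape)
```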

Comment on lines 29 to 30
- Build Metal kernels for macOS 14+ ([\#159733](https://github.com/pytorch/pytorch/pull/159733))
- Remove all pre-macOS-14 logic ([\#159912](https://github.com/pytorch/pytorch/pull/159912))
@malfet (Contributor):

Shouldn't we just combine those two and say that one needs macOS 14 or above to be able to use GPU acceleration on Apple Silicon?
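
Either way, the user-visible check stays the same; a minimal sketch of the guard a user would write (the CPU fallback is illustrative):

```python
import torch

# On builds that require macOS 14+, older systems simply report MPS
# as unavailable, so the usual availability guard keeps working.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
x = torch.ones(3, device=device)
```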


### improvements

- Add `shifted_chebyshev_polynomial_[tuvw]`, `simd_[arg][max|min]`, `igamma`/`igammac`, `grid_sampler_3d`, `native_dropout`/`native_dropout_backward` ([\#157488](https://github.com/pytorch/pytorch/pull/157488), [\#158990](https://github.com/pytorch/pytorch/pull/158990), [\#161927](https://github.com/pytorch/pytorch/pull/161927), [\#160541](https://github.com/pytorch/pytorch/pull/160541), [\#162108](https://github.com/pytorch/pytorch/pull/162108))
@malfet (Contributor):

`simd_argmin|max` are not user-facing, I would say.

### improvements

- Add `shifted_chebyshev_polynomial_[tuvw]`, `simd_[arg][max|min]`, `igamma`/`igammac`, `grid_sampler_3d`, `native_dropout`/`native_dropout_backward` ([\#157488](https://github.com/pytorch/pytorch/pull/157488), [\#158990](https://github.com/pytorch/pytorch/pull/158990), [\#161927](https://github.com/pytorch/pytorch/pull/161927), [\#160541](https://github.com/pytorch/pytorch/pull/160541), [\#162108](https://github.com/pytorch/pytorch/pull/162108))
- For sparse tensors: add `coalesce`, `indices`/`values`, `sgn`/`asinh`/`atanh`/`asin`/`atan`/`ceil`/`erf`/`expm1`/`floor`/`frac`/`isnan`/`nan_to_num`/`log1p`/`rad2deg`/`deg2rad`/`neg`/`round`/`relu`/`sin`/`sinh`/`sqrt`/`tan`/`tanh`/`sign`/`signbit`/`isinf`/`isposinf`/`isneginf`, `cat` ([\#159729](https://github.com/pytorch/pytorch/pull/159729)/[\#160254](https://github.com/pytorch/pytorch/pull/160254), [\#160223](https://github.com/pytorch/pytorch/pull/160223), [\#161846](https://github.com/pytorch/pytorch/pull/161846), [\#162007](https://github.com/pytorch/pytorch/pull/162007))
@malfet (Contributor):

Should we move this one to new features and call it something like "[Beta] Partial sparse support for the MPS backend"?
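
For reference, a small sketch of the kind of sparse usage these PRs enable on MPS (values and shapes are illustrative):

```python
import torch

# Build a sparse COO tensor on MPS, coalesce duplicate indices, and
# apply one of the newly supported element-wise ops.
indices = torch.tensor([[0, 1, 1], [2, 0, 0]], device="mps")
values = torch.tensor([3.0, 4.0, 5.0], device="mps")
s = torch.sparse_coo_tensor(indices, values, (2, 3)).coalesce()
print(s.indices(), s.values())  # duplicate (1, 0) entries summed to 9.0
print(torch.neg(s))             # element-wise op on the sparse tensor
```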

- Update `avg_pool3d` kernel to use `opmath_t` ([\#161071](https://github.com/pytorch/pytorch/pull/161071))
- Add slow version of `kthvalue` ([\#161817](https://github.com/pytorch/pytorch/pull/161817))
- Type-promote tensor-iterator common dtype ([\#160334](https://github.com/pytorch/pytorch/pull/160334))
- Add `fused_rms` and `sdpa_mps` fallback ops for AOTInductor ([\#156844](https://github.com/pytorch/pytorch/pull/156844))
@malfet (Contributor):

This is not user-facing, IMO.

- Add `fused_rms` and `sdpa_mps` fallback ops for AOTInductor ([\#156844](https://github.com/pytorch/pytorch/pull/156844))
- Implement `logcumsumexp` Metal kernel ([\#156858](https://github.com/pytorch/pytorch/pull/156858))
- Migrate `round` unary op to Metal ([\#161712](https://github.com/pytorch/pytorch/pull/161712))
- Move `max_pool2d` to Metal for `stride != 1` ([\#157876](https://github.com/pytorch/pytorch/pull/157876))
@malfet (Contributor):

This should go into performance improvements

- Type-promote tensor-iterator common dtype ([\#160334](https://github.com/pytorch/pytorch/pull/160334))
- Add `fused_rms` and `sdpa_mps` fallback ops for AOTInductor ([\#156844](https://github.com/pytorch/pytorch/pull/156844))
- Implement `logcumsumexp` Metal kernel ([\#156858](https://github.com/pytorch/pytorch/pull/156858))
- Migrate `round` unary op to Metal ([\#161712](https://github.com/pytorch/pytorch/pull/161712))
@malfet (Contributor):

That should go into bugfixes, because it fixes the rounding behavior for the X.5/-X.5 cases.
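
A quick sketch of the half-way cases referred to here; `torch.round` follows round-half-to-even, and the migrated Metal kernel should match this on MPS:

```python
import torch

# torch.round uses round-half-to-even, so X.5 goes to the nearest
# even integer rather than always rounding away from zero.
x = torch.tensor([0.5, -0.5, 1.5, 2.5], device="mps")
print(torch.round(x))  # expected: [0., -0., 2., 2.]
```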

- Ensure that tensors are contiguous before using MPS linear kernel ([\#161641](https://github.com/pytorch/pytorch/pull/161641))
- Address NaNs if SDPA is called with all values masked from query ([\#157727](https://github.com/pytorch/pytorch/pull/157727))
- Fix invalid formatting ([\#158436](https://github.com/pytorch/pytorch/pull/158436))
- Update `avg_pool2d` to use Metal kernel when `ceil_mode=True` ([\#161011](https://github.com/pytorch/pytorch/pull/161011))
@malfet (Contributor):

I would've moved it to improvements, as before this change it simply errored out.
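
For reference, a minimal sketch of the `ceil_mode=True` case that previously errored out on MPS (shapes are illustrative):

```python
import torch
import torch.nn.functional as F

# With ceil_mode=True the output size is ceil((5 - 2) / 2) + 1 = 3,
# so a partial window at the edge is included in the average.
x = torch.randn(1, 1, 5, 5, device="mps")
y = F.avg_pool2d(x, kernel_size=2, stride=2, ceil_mode=True)
print(y.shape)  # torch.Size([1, 1, 3, 3])
```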

@jhavukainen (Contributor, Author):

Thanks for the comments @malfet! All of them seemed good to me, updated the PR accordingly.

- Add API to query GPU core count ([\#160414](https://github.com/pytorch/pytorch/pull/160414))
- Update `avg_pool3d` kernel to use `opmath_t` ([\#161071](https://github.com/pytorch/pytorch/pull/161071))
- Add slow version of `kthvalue` ([\#161817](https://github.com/pytorch/pytorch/pull/161817))
- Type-promote tensor-iterator common dtype ([\#160334](https://github.com/pytorch/pytorch/pull/160334))
@malfet (Contributor):

This should go to bugfixes, but it is probably fine where it is.

- Extend `addmm` to integral types ([\#160270](https://github.com/pytorch/pytorch/pull/160270))
- Add support for unsigned types ([\#159094](https://github.com/pytorch/pytorch/pull/159094))
- Add API to query GPU core count ([\#160414](https://github.com/pytorch/pytorch/pull/160414))
- Update `avg_pool3d` kernel to use `opmath_t` ([\#161071](https://github.com/pytorch/pytorch/pull/161071))
@malfet (Contributor):

Not user-facing.

@liangel-02 merged commit ae4c3bc into meta-pytorch:main on Sep 22, 2025. 1 check passed.