Conversation

@jhavukainen (Contributor)

No description provided.

@meta-cla bot added the `CLA Signed` label on Sep 18, 2025.
- Fix CPU kernel generation ([\#158350](https://github.com/pytorch/pytorch/pull/158350))
- Improve tabbing in cpp generation ([\#158351](https://github.com/pytorch/pytorch/pull/158351))
- Enable more tests ([\#158703](https://github.com/pytorch/pytorch/pull/158703))
- Enable DLPack integration ([\#158888](https://github.com/pytorch/pytorch/pull/158888))
@malfet (Contributor):

This is a user-facing feature: it enables DLPack for the MPS backend.
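
For context, a minimal sketch of what the DLPack interop looks like from the user's side (assuming an MPS-capable machine running this build; the tensor values and shapes are illustrative):

```python
import torch
from torch.utils.dlpack import to_dlpack, from_dlpack

# Round-trip an MPS tensor through a DLPack capsule.
x = torch.randn(4, device="mps")
capsule = to_dlpack(x)    # export as a DLPack capsule
y = from_dlpack(capsule)  # re-import; shares memory with x
assert y.device.type == "mps"
```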

- Improve tabbing in cpp generation ([\#158351](https://github.com/pytorch/pytorch/pull/158351))
- Enable more tests ([\#158703](https://github.com/pytorch/pytorch/pull/158703))
- Enable DLPack integration ([\#158888](https://github.com/pytorch/pytorch/pull/158888))
- Dynamic reductions ([\#159355](https://github.com/pytorch/pytorch/pull/159355))
@malfet (Contributor):

This is a user-facing feature: it adds support for dynamic shapes in the torch.compile MPS backend.
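
As a hedged illustration of what this enables (assuming an MPS device), a compiled function can now be reused across varying input sizes rather than specializing on one shape:

```python
import torch

@torch.compile(dynamic=True)  # request dynamic-shape compilation
def row_sums(t):
    return t.sum(dim=-1)

# Varying the input size should reuse the same compiled artifact
# instead of triggering a recompile per shape.
for n in (8, 32, 128):
    print(row_sums(torch.ones(n, n, device="mps")).shape)
```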

Comment on lines 29 to 30
- Build Metal kernels for macOS 14+ ([\#159733](https://github.com/pytorch/pytorch/pull/159733))
- Remove all pre-macOS-14 logic ([\#159912](https://github.com/pytorch/pytorch/pull/159912))
@malfet (Contributor):

Shouldn't we just combine those two and say that one needs macOS 14 or above to be able to use GPU acceleration on Apple Silicon?
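
Either way, the user-visible check stays the same; a minimal sketch of the guard a user would write (the CPU fallback is illustrative):

```python
import torch

# On builds that require macOS 14+, older systems simply report MPS
# as unavailable, so the usual availability guard keeps working.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
x = torch.ones(3, device=device)
```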


### improvements

- Add `shifted_chebyshev_polynomial_[tuvw]`, `simd_[arg][max|min]`, `igamma`/`igammac`, `grid_sampler_3d`, `native_dropout`/`native_dropout_backward` ([\#157488](https://github.com/pytorch/pytorch/pull/157488), [\#158990](https://github.com/pytorch/pytorch/pull/158990), [\#161927](https://github.com/pytorch/pytorch/pull/161927), [\#160541](https://github.com/pytorch/pytorch/pull/160541), [\#162108](https://github.com/pytorch/pytorch/pull/162108))
@malfet (Contributor):

`simd_argmin|max` are not user-facing, I would say.

### improvements

- Add `shifted_chebyshev_polynomial_[tuvw]`, `simd_[arg][max|min]`, `igamma`/`igammac`, `grid_sampler_3d`, `native_dropout`/`native_dropout_backward` ([\#157488](https://github.com/pytorch/pytorch/pull/157488), [\#158990](https://github.com/pytorch/pytorch/pull/158990), [\#161927](https://github.com/pytorch/pytorch/pull/161927), [\#160541](https://github.com/pytorch/pytorch/pull/160541), [\#162108](https://github.com/pytorch/pytorch/pull/162108))
- For sparse tensors: add `coalesce`, `indices`/`values`, `sgn`/`asinh`/`atanh`/`asin`/`atan`/`ceil`/`erf`/`expm1`/`floor`/`frac`/`isnan`/`nan_to_num`/`log1p`/`rad2deg`/`deg2rad`/`neg`/`round`/`relu`/`sin`/`sinh`/`sqrt`/`tan`/`tanh`/`sign`/`signbit`/`isinf`/`isposinf`/`isneginf`, `cat` ([\#159729](https://github.com/pytorch/pytorch/pull/159729)/[\#160254](https://github.com/pytorch/pytorch/pull/160254), [\#160223](https://github.com/pytorch/pytorch/pull/160223), [\#161846](https://github.com/pytorch/pytorch/pull/161846), [\#162007](https://github.com/pytorch/pytorch/pull/162007))
@malfet (Contributor):

Should we move this one to new features and call it something like "[Beta] Partial sparse support for the MPS backend"?
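
For reference, a small sketch of the kind of sparse usage these PRs enable on MPS (values and shapes are illustrative):

```python
import torch

# Build a sparse COO tensor on MPS, coalesce duplicate indices, and
# apply one of the newly supported element-wise ops.
indices = torch.tensor([[0, 1, 1], [2, 0, 0]], device="mps")
values = torch.tensor([3.0, 4.0, 5.0], device="mps")
s = torch.sparse_coo_tensor(indices, values, (2, 3)).coalesce()
print(s.indices(), s.values())  # duplicate (1, 0) entries summed to 9.0
print(torch.neg(s))             # element-wise op on the sparse tensor
```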

- Update `avg_pool3d` kernel to use `opmath_t` ([\#161071](https://github.com/pytorch/pytorch/pull/161071))
- Add slow version of `kthvalue` ([\#161817](https://github.com/pytorch/pytorch/pull/161817))
- Type-promote tensor-iterator common dtype ([\#160334](https://github.com/pytorch/pytorch/pull/160334))
- Add `fused_rms` and `sdpa_mps` fallback ops for AOTInductor ([\#156844](https://github.com/pytorch/pytorch/pull/156844))
@malfet (Contributor):

This is not user-facing, IMO.

- Add `fused_rms` and `sdpa_mps` fallback ops for AOTInductor ([\#156844](https://github.com/pytorch/pytorch/pull/156844))
- Implement `logcumsumexp` Metal kernel ([\#156858](https://github.com/pytorch/pytorch/pull/156858))
- Migrate `round` unary op to Metal ([\#161712](https://github.com/pytorch/pytorch/pull/161712))
- Move `max_pool2d` to Metal for `stride != 1` ([\#157876](https://github.com/pytorch/pytorch/pull/157876))
@malfet (Contributor):

This should go into performance improvements

- Type-promote tensor-iterator common dtype ([\#160334](https://github.com/pytorch/pytorch/pull/160334))
- Add `fused_rms` and `sdpa_mps` fallback ops for AOTInductor ([\#156844](https://github.com/pytorch/pytorch/pull/156844))
- Implement `logcumsumexp` Metal kernel ([\#156858](https://github.com/pytorch/pytorch/pull/156858))
- Migrate `round` unary op to Metal ([\#161712](https://github.com/pytorch/pytorch/pull/161712))
@malfet (Contributor):

That should go into bugfixes, because it fixes the rounding behavior for the X.5/-X.5 cases.
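
A quick sketch of the half-way cases referred to here; `torch.round` follows round-half-to-even, and the migrated Metal kernel should match this on MPS:

```python
import torch

# torch.round uses round-half-to-even, so X.5 goes to the nearest
# even integer rather than always rounding away from zero.
x = torch.tensor([0.5, -0.5, 1.5, 2.5], device="mps")
print(torch.round(x))  # expected: [0., -0., 2., 2.]
```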

- Ensure that tensors are contiguous before using MPS linear kernel ([\#161641](https://github.com/pytorch/pytorch/pull/161641))
- Address NaNs if SDPA is called with all values masked from query ([\#157727](https://github.com/pytorch/pytorch/pull/157727))
- Fix invalid formatting ([\#158436](https://github.com/pytorch/pytorch/pull/158436))
- Update `avg_pool2d` to use Metal kernel when `ceil_mode=True` ([\#161011](https://github.com/pytorch/pytorch/pull/161011))
@malfet (Contributor):

I would've moved it to improvements, as before this change it simply errored out.
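
For reference, a minimal sketch of the `ceil_mode=True` case that previously errored out on MPS (shapes are illustrative):

```python
import torch
import torch.nn.functional as F

# With ceil_mode=True the output size is ceil((5 - 2) / 2) + 1 = 3,
# so a partial window at the edge is included in the average.
x = torch.randn(1, 1, 5, 5, device="mps")
y = F.avg_pool2d(x, kernel_size=2, stride=2, ceil_mode=True)
print(y.shape)  # torch.Size([1, 1, 3, 3])
```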

@jhavukainen (Contributor, Author):

Thanks for the comments @malfet! All of them seemed good to me, updated the PR accordingly.

- Add API to query GPU core count ([\#160414](https://github.com/pytorch/pytorch/pull/160414))
- Update `avg_pool3d` kernel to use `opmath_t` ([\#161071](https://github.com/pytorch/pytorch/pull/161071))
- Add slow version of `kthvalue` ([\#161817](https://github.com/pytorch/pytorch/pull/161817))
- Type-promote tensor-iterator common dtype ([\#160334](https://github.com/pytorch/pytorch/pull/160334))
@malfet (Contributor):

This should go to bugfixes, but it is probably fine where it is.

- Extend `addmm` to integral types ([\#160270](https://github.com/pytorch/pytorch/pull/160270))
- Add support for unsigned types ([\#159094](https://github.com/pytorch/pytorch/pull/159094))
- Add API to query GPU core count ([\#160414](https://github.com/pytorch/pytorch/pull/160414))
- Update `avg_pool3d` kernel to use `opmath_t` ([\#161071](https://github.com/pytorch/pytorch/pull/161071))
@malfet (Contributor):

Not user-facing.

@liangel-02 merged commit ae4c3bc into meta-pytorch:main on Sep 22, 2025. 1 check passed.