
Updates for Pytorch 2.7 #8429

Open · wants to merge 20 commits into base: dev

Conversation

@ericspod (Member) commented Apr 25, 2025

Description

This will update MONAI to be compatible with PyTorch 2.7.1. There appear to be few code changes needed for this release, so hopefully this will simply be a matter of updating versions. The versions tested in the actions are now fixed to explicit version strings rather than including "latest" as a PyTorch version; this avoids newly released breaking versions rendering PRs unmergeable.

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

Summary by CodeRabbit

  • Bug Fixes

    • Implemented a platform-specific workaround to address issues with PyTorch's grid_sample on Windows with float64 CPU tensors, ensuring correct behavior in spatial transforms and resampling.
    • Suppressed unnecessary error messages during expected test failures to improve test output clarity.
    • Adjusted test tolerances on Windows to account for platform-specific numerical differences.
  • Chores

    • Updated dependency requirements to allow newer versions of PyTorch, with special handling for Windows.
    • Restricted the pytype version range for development environments.
    • Improved test configuration to include the latest PyTorch version in continuous integration.
    • Refined dependency version constraints to support PyTorch 2.7.1 and beyond.

Signed-off-by: Eric Kerfoot <[email protected]>
@ericspod requested review from Copilot, Nic-Ma and KumoLiu, April 25, 2025 11:38
@Copilot (Copilot AI) left a comment

Pull Request Overview

This PR updates MONAI’s torch dependency to a newer version aiming for improved compatibility with PyTorch. The changes include:

  • Increasing the minimum torch version in pyproject.toml from 2.3.0 to 2.4.1.
  • Updating the installation command in the GitHub Actions workflow (pythonapp.yml) accordingly.
  • Modifying the torch version matrix in the minimal workflow (pythonapp-min.yml).

Reviewed Changes

Copilot reviewed 4 out of 6 changed files in this pull request and generated 3 comments.

File | Description
pyproject.toml | Updated minimum torch dependency and Black target versions
.github/workflows/pythonapp.yml | Updated torch installation command to new dependency
.github/workflows/pythonapp-min.yml | Revised torch version matrix for testing
Files not reviewed (2)
  • docs/requirements.txt: Language not supported
  • setup.cfg: Language not supported

Signed-off-by: Eric Kerfoot <[email protected]>
Signed-off-by: Eric Kerfoot <[email protected]>
@ericspod (Member Author):

It's possible the CPU provided by the Windows runner is too old for PyTorch 2.7, which may now require instructions it doesn't support.

@ericspod (Member Author):

The issue with Windows appears to be related to float64 calculations, specifically with RandRotate in tests\integration\test_pad_collation.py. It doesn't appear to be pad-collation related and goes away if float32 is used as the dtype. I'm investigating further.

@ericspod (Member Author):

I think I've traced the issue to what appears to be a bug in grid_sample; I've raised an issue here on the PyTorch repo.
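For reference, a check of this kind can be run directly against grid_sample (a hypothetical minimal sketch, not the reproducer attached to the upstream issue), comparing the float64 CPU result against a float32 baseline:

import torch
import torch.nn.functional as F

torch.manual_seed(0)
img64 = torch.rand(1, 1, 16, 16, dtype=torch.float64)  # (N, C, H, W) on CPU
# ~30 degree rotation matrix for affine_grid, shape (N, 2, 3)
theta = torch.tensor([[[0.866, -0.5, 0.0], [0.5, 0.866, 0.0]]], dtype=torch.float64)
grid64 = F.affine_grid(theta, size=img64.shape, align_corners=False)

out64 = F.grid_sample(img64, grid64, mode="bilinear", align_corners=False)
out32 = F.grid_sample(img64.float(), grid64.float(), mode="bilinear", align_corners=False)

# On an unaffected platform this difference is tiny (around 1e-7); a large value
# would point at the float64 CPU path misbehaving.
print((out64 - out32.double()).abs().max().item())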

@KumoLiu (Contributor) commented Apr 29, 2025

> I think I've traced the issue to what appears to be a bug in grid_sample; I've raised an issue here on the PyTorch repo.

Thank you for looking into this! Instead of waiting for a fix from PyTorch, do you think it's possible to implement a workaround by altering the dtype for Windows operating systems?

@ericspod (Member Author):

> do you think it's possible to implement a workaround by altering the dtype for Windows operating systems?

I'm looking into that now and will hopefully have something soon. We may have to convert to float32 and back in places so we may have knock-on precision issues.
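A minimal sketch of that approach is below; the helper name and exact checks are assumptions for illustration, not the code that ends up in this PR:

import sys

import torch
import torch.nn.functional as F


def _grid_sample_win_safe(img: torch.Tensor, grid: torch.Tensor, **kwargs) -> torch.Tensor:
    # Work around the suspected grid_sample bug for float64 CPU tensors on Windows
    # by resampling in float32, then restoring the caller's dtype.
    if sys.platform == "win32" and img.device.type == "cpu" and img.dtype == torch.float64:
        out = F.grid_sample(img.to(torch.float32), grid.to(torch.float32), **kwargs)
        return out.to(torch.float64)  # the convert-and-back step is where precision could be lost
    return F.grid_sample(img, grid, **kwargs)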

@KumoLiu (Contributor) commented Apr 30, 2025

We may need to wait for a torch-tensorrt release that supports PyTorch 2.7.
https://pypi.org/project/torch-tensorrt/#history

@ericspod (Member Author) commented May 1, 2025

Hi @KumoLiu, this gets through the Windows tests now. I raised the issue with PyTorch, so hopefully version 2.7.1 will resolve it; in the meantime we can run the blossom tests and discuss whether to merge this.

Signed-off-by: Eric Kerfoot <[email protected]>
@KumoLiu (Contributor) commented May 2, 2025

/build

@KumoLiu (Contributor) commented May 2, 2025

Error log:

[2025-05-02T05:07:25.738Z]   Attempting uninstall: setuptools
[2025-05-02T05:07:25.738Z]     Found existing installation: setuptools 45.2.0
[2025-05-02T05:07:25.738Z]     Uninstalling setuptools-45.2.0:
[2025-05-02T05:07:25.738Z]       Successfully uninstalled setuptools-45.2.0
[2025-05-02T05:08:04.452Z] ERROR: Exception:
[2025-05-02T05:08:04.452Z] Traceback (most recent call last):
[2025-05-02T05:08:04.452Z]   File "/usr/lib/python3.9/py_compile.py", line 144, in compile
[2025-05-02T05:08:04.452Z]     code = loader.source_to_code(source_bytes, dfile or file,
[2025-05-02T05:08:04.452Z]   File "<frozen importlib._bootstrap_external>", line 918, in source_to_code
[2025-05-02T05:08:04.452Z]   File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
[2025-05-02T05:08:04.452Z]   File "/usr/local/lib/python3.9/dist-packages/pytype/tools/merge_pyi/test_data/parse_error.py", line 2
[2025-05-02T05:08:04.452Z]     def f(*): pass
[2025-05-02T05:08:04.452Z]            ^
[2025-05-02T05:08:04.452Z] SyntaxError: named arguments must follow bare *
[2025-05-02T05:08:04.452Z] 
[2025-05-02T05:08:04.452Z] During handling of the above exception, another exception occurred:
[2025-05-02T05:08:04.452Z] 
[2025-05-02T05:08:04.452Z] Traceback (most recent call last):
[2025-05-02T05:08:04.452Z]   File "/usr/lib/python3.9/compileall.py", line 238, in compile_file
[2025-05-02T05:08:04.452Z]     ok = py_compile.compile(fullname, cfile, dfile, True,
[2025-05-02T05:08:04.452Z]   File "/usr/lib/python3.9/py_compile.py", line 150, in compile
[2025-05-02T05:08:04.452Z]     raise py_exc
[2025-05-02T05:08:04.452Z] py_compile.PyCompileError:   File "/usr/local/lib/python3.9/dist-packages/pytype/tools/merge_pyi/test_data/parse_error.py", line 2
[2025-05-02T05:08:04.452Z]     def f(*): pass
[2025-05-02T05:08:04.452Z]            ^
[2025-05-02T05:08:04.452Z] SyntaxError: named arguments must follow bare *
[2025-05-02T05:08:04.452Z] 
[2025-05-02T05:08:04.452Z] 
[2025-05-02T05:08:04.452Z] During handling of the above exception, another exception occurred:
[2025-05-02T05:08:04.452Z] 
[2025-05-02T05:08:04.452Z] Traceback (most recent call last):
[2025-05-02T05:08:04.452Z]   File "/usr/local/lib/python3.9/dist-packages/pip/_internal/cli/base_command.py", line 105, in _run_wrapper
[2025-05-02T05:08:04.452Z]     status = _inner_run()
[2025-05-02T05:08:04.452Z]   File "/usr/local/lib/python3.9/dist-packages/pip/_internal/cli/base_command.py", line 96, in _inner_run
[2025-05-02T05:08:04.452Z]     return self.run(options, args)
[2025-05-02T05:08:04.452Z]   File "/usr/local/lib/python3.9/dist-packages/pip/_internal/cli/req_command.py", line 68, in wrapper
[2025-05-02T05:08:04.452Z]     return func(self, options, args)
[2025-05-02T05:08:04.452Z]   File "/usr/local/lib/python3.9/dist-packages/pip/_internal/commands/install.py", line 459, in run
[2025-05-02T05:08:04.452Z]     installed = install_given_reqs(
[2025-05-02T05:08:04.452Z]   File "/usr/local/lib/python3.9/dist-packages/pip/_internal/req/__init__.py", line 83, in install_given_reqs
[2025-05-02T05:08:04.452Z]     requirement.install(
[2025-05-02T05:08:04.452Z]   File "/usr/local/lib/python3.9/dist-packages/pip/_internal/req/req_install.py", line 867, in install
[2025-05-02T05:08:04.452Z]     install_wheel(
[2025-05-02T05:08:04.452Z]   File "/usr/local/lib/python3.9/dist-packages/pip/_internal/operations/install/wheel.py", line 728, in install_wheel
[2025-05-02T05:08:04.452Z]     _install_wheel(
[2025-05-02T05:08:04.452Z]   File "/usr/local/lib/python3.9/dist-packages/pip/_internal/operations/install/wheel.py", line 614, in _install_wheel
[2025-05-02T05:08:04.452Z]     success = compileall.compile_file(path, force=True, quiet=True)
[2025-05-02T05:08:04.452Z]   File "/usr/lib/python3.9/compileall.py", line 255, in compile_file
[2025-05-02T05:08:04.452Z]     msg = err.msg.encode(sys.stdout.encoding,
[2025-05-02T05:08:04.452Z] TypeError: encode() argument 'encoding' must be str, not None
[2025-05-02T05:08:04.452Z] 

@KumoLiu (Contributor) commented May 2, 2025

/build

Signed-off-by: Eric Kerfoot <[email protected]>
@ericspod (Member Author) commented May 2, 2025

The blossom issue is related to the current pytype version, so I've added <=2024.4.11 to the requirements-dev.txt file for it.

@KumoLiu (Contributor) commented May 2, 2025

/build

@KumoLiu (Contributor) commented May 2, 2025

> The blossom issue is related to the current pytype version, so I've added <=2024.4.11 to the requirements-dev.txt file for it.

This seems related to the new version of pip: https://pypi.org/project/pip/#history
I tried downgrading it to 25.0.1, and then it works.

Raised an issue here: google/pytype#1909

@coderabbitai (bot) commented Jul 18, 2025

Caution

Review failed

The head commit changed during the review from 52ba27d to f1824c4.

Walkthrough

The updates relax upper version constraints for the torch dependency across multiple requirement and configuration files, add PyTorch version 2.7.1 explicitly to the CI test matrix, and simplify the installation step in the workflow. Minor internal refactors optimize tensor handling in spatial transform functions. Tests are adjusted to suppress error output during expected failures.

Changes

File(s) | Change Summary
.github/workflows/pythonapp-min.yml | Added PyTorch 2.7.1 to test matrix; simplified installation step by removing conditional logic for 'latest' version.
docs/requirements.txt, pyproject.toml, setup.cfg | Relaxed torch version constraint by removing the upper bound restriction <2.7.0.
requirements.txt | Modified torch requirement: general lower bound >=2.4.1 without upper bound, with Windows-specific exclusion of version 2.7.0.
requirements-dev.txt | Added upper bound <=2024.4.11 to pytype version for non-Windows platforms.
monai/networks/layers/spatial_transforms.py | Refactored to assign contiguous tensor to local variable before calling grid_sample.
monai/transforms/spatial/array.py | Refactored to assign unsqueeze(0) tensor to local variable before calling grid_sample.
tests/integration/test_pad_collation.py | Suppressed PyTorch error messages during expected failure by redirecting stderr to null device.
tests/lazy_transforms_utils.py | Added a blank line after setting resampler.lazy = True (no logic change).

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant AffineTransform/Resample
    participant PyTorch grid_sample

    User->>AffineTransform/Resample: Call with input tensor
    AffineTransform/Resample->>AffineTransform/Resample: Prepare contiguous or unsqueezed tensor (cache in local variable)
    AffineTransform/Resample->>PyTorch grid_sample: Call grid_sample with prepared tensor
    PyTorch grid_sample-->>AffineTransform/Resample: Return output
    AffineTransform/Resample-->>User: Return output

Poem

🐇
The torch now burns without a leash,
No upper bound to slow its reach.
CI tests embrace version new,
Refactors tidy, errors few.
Quiet tests and cleaner code,
Help the rabbit’s happy road!
🐰✨


@coderabbitai (bot) left a comment

Actionable comments posted: 0

🔭 Outside diff range comments (1)
.github/workflows/pythonapp-min.yml (1)

154-158: pip install torch ignores the Windows exclusion and the CPU-only wheel source

In the “min-dep-pytorch” job we fall back to a bare pip install torch when the matrix value is latest.
Two problems:

  1. The wheel resolver may pick up a CUDA build, defeating the “CPU-only” objective used elsewhere (--index-url …/cpu).
  2. Although this job runs on Linux, the same pattern appears in the min-dep-os job for Windows (line 57) and will happily install 2.7.0, bypassing the !=2.7.0 guard you just added in requirements.txt.
-  python -m pip install torch
+  # CPU-only latest build, respecting platform pins
+  python -m pip install "torch!=2.7.0" --index-url https://download.pytorch.org/whl/cpu

Apply the same logic to the Windows path (line 57) to honour the exclusion.

♻️ Duplicate comments (2)
pyproject.toml (1)

5-5: Consider raising minimum version to align with PR objectives.

The upper bound removal enables PyTorch 2.7 compatibility as intended. However, the previous review comment raises a valid point about potentially updating the minimum version to 2.7.0 since this PR specifically targets PyTorch 2.7 updates.

Please confirm whether the minimum version should remain at 2.4.1 for backward compatibility or be raised to 2.7.0 to match the PR's focus on PyTorch 2.7 updates.

.github/workflows/pythonapp-min.yml (1)

127-127: Still missing explicit 2.7.0 in the test matrix

The PR title claims PyTorch 2.7 support, yet the matrix lists the pinned versions up to 2.6.0 plus latest. Relying on latest to pick up 2.7.0 is brittle (as soon as 2.7.1 lands, 2.7.0 will silently drop out of CI).

Add an explicit 2.7.0 entry and keep latest for forward-looking coverage.

🧹 Nitpick comments (1)
requirements.txt (1)

1-2: Two overlapping torch lines – consolidate for clarity

Having both a global torch>=2.4.1 and a Windows-specific exclusion works, but it is easy to mis-read and invites drift. Consider collapsing into a single, self-documenting marker:

-torch>=2.4.1
-torch>=2.4.1, !=2.7.0; platform_system == "Windows"
+torch>=2.4.1; platform_system != "Windows"
+torch>=2.4.1,!=2.7.0; platform_system == "Windows"

Same behaviour, clearer intent.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e499362 and e2ec553.

📒 Files selected for processing (10)
  • .github/workflows/pythonapp-min.yml (1 hunks)
  • docs/requirements.txt (1 hunks)
  • monai/networks/layers/spatial_transforms.py (3 hunks)
  • monai/transforms/spatial/array.py (2 hunks)
  • pyproject.toml (1 hunks)
  • requirements-dev.txt (1 hunks)
  • requirements.txt (1 hunks)
  • setup.cfg (1 hunks)
  • tests/integration/test_pad_collation.py (3 hunks)
  • tests/lazy_transforms_utils.py (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
tests/integration/test_pad_collation.py (1)
monai/utils/misc.py (1)
  • set_determinism (335-393)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
  • GitHub Check: min-dep-pytorch (2.5.1)
  • GitHub Check: min-dep-pytorch (2.4.1)
  • GitHub Check: min-dep-pytorch (latest)
  • GitHub Check: min-dep-os (windows-latest)
  • GitHub Check: min-dep-pytorch (2.6.0)
  • GitHub Check: min-dep-os (macOS-latest)
  • GitHub Check: min-dep-py3 (3.9)
  • GitHub Check: min-dep-os (ubuntu-latest)
  • GitHub Check: min-dep-py3 (3.10)
  • GitHub Check: min-dep-py3 (3.12)
  • GitHub Check: min-dep-py3 (3.11)
  • GitHub Check: flake8-py3 (pytype)
  • GitHub Check: quick-py3 (macOS-latest)
  • GitHub Check: build-docs
  • GitHub Check: packaging
  • GitHub Check: flake8-py3 (codeformat)
  • GitHub Check: quick-py3 (ubuntu-latest)
  • GitHub Check: flake8-py3 (mypy)
  • GitHub Check: quick-py3 (windows-latest)
🔇 Additional comments (12)
tests/integration/test_pad_collation.py (2)

14-14: LGTM: Import changes are well-justified.

The new imports (os, redirect_stderr, first) are all properly used in the test method to suppress error output during expected failures.

Also applies to: 17-17, 40-40


102-104: LGTM: Improved test clarity with proper error suppression.

The change to use first(loader_fail) with stderr suppression is a good improvement. It makes the test more explicit while preventing PyTorch error messages from cluttering the test output during expected failures.
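For illustration, the suppression pattern looks roughly like this (a generic sketch with an assumed helper name, not the test code verbatim):

import os
from contextlib import redirect_stderr


def run_expecting_failure(fn, exc_type=RuntimeError):
    # Run fn(), hiding anything it writes to Python-level stderr so an expected
    # failure (e.g. a collate error from a DataLoader) doesn't clutter test output.
    with open(os.devnull, "w") as devnull, redirect_stderr(devnull):
        try:
            fn()
        except exc_type:
            return  # the failure was expected
    raise AssertionError("expected failure did not occur")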

tests/lazy_transforms_utils.py (2)

14-14: LGTM: Import is properly used for platform detection.

The sys import is correctly used for Windows platform detection in the tolerance adjustment logic.


67-71: LGTM: Reasonable workaround for PyTorch Windows compatibility.

The platform-specific tolerance adjustment is a sensible temporary fix for the known PyTorch numerical precision issues on Windows with float64 CPU tensors. The increased tolerances (1e-4) are still reasonable for testing purposes.

Please monitor the referenced GitHub issue and remove this workaround once PyTorch fixes the underlying grid_sample issue.
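As a rough illustration of such a platform-dependent tolerance (only the 1e-4 figure comes from the note above; the other values and arrays are assumptions, and this is not the code in tests/lazy_transforms_utils.py):

import sys

import numpy as np

expected = np.linspace(0.0, 1.0, 8)
actual = expected + 1e-9  # stand-in for a resampled result with tiny numerical jitter

# Loosen the comparison only on Windows, where the float64 grid_sample path differs.
tol = {"rtol": 1e-4, "atol": 1e-4} if sys.platform == "win32" else {"rtol": 1e-5, "atol": 1e-8}
np.testing.assert_allclose(actual, expected, **tol)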

monai/transforms/spatial/array.py (2)

17-17: LGTM - Sys import added for platform detection

The sys import is correctly added to support the Windows-specific workaround implementation.


2111-2134: Well-implemented workaround for Windows PyTorch grid_sample bug

The implementation correctly addresses the known PyTorch issue with float64 tensors on Windows CPU by:

  1. Targeted detection: Only applies the workaround when all conditions are met (Windows platform, float64 dtype, CPU device)
  2. Minimal scope: Conversions are applied only to the specific tensors passed to grid_sample
  3. Proper restoration: Converts the output back to float64 to maintain expected data types
  4. Clean implementation: The workaround is clearly documented and doesn't affect other code paths

The logic follows the pattern described in the PR objectives and should resolve the Windows test failures without impacting other platforms or data types.

monai/networks/layers/spatial_transforms.py (3)

14-14: LGTM: Import added for platform detection.

The sys import is correctly placed and necessary for the Windows-specific workaround implementation.


530-535: LGTM: Well-implemented platform-specific workaround.

The workaround logic correctly identifies the specific conditions (Windows + float64 + CPU) that trigger the PyTorch grid_sample issue. The inline comment provides good context with a reference to the related PR.


591-603: LGTM: Proper workaround implementation with precision considerations.

The implementation correctly handles the dtype conversion to work around the PyTorch grid_sample issue on Windows. The conversion back to float64 maintains API consistency, though there may be minor precision implications as noted in the PR discussion.

This is a temporary workaround for PyTorch issue #152385. Consider monitoring for PyTorch 2.7.1 release to remove this workaround when the upstream issue is fixed.

requirements-dev.txt (1)

23-23: LGTM: Version constraint addresses build failures.

The upper bound constraint on pytype (<=2024.4.11) appropriately addresses the build failures mentioned in the PR discussion. The platform-specific exclusion for Windows is also correct.

docs/requirements.txt (1)

2-2: LGTM: Consistent torch version constraint relaxation.

The removal of the upper bound constraint aligns with the PR's objective to enable PyTorch 2.7 compatibility for documentation builds.

setup.cfg (1)

45-45: LGTM: Consistent torch version constraint relaxation.

The removal of the upper bound constraint (<2.7.0) enables PyTorch 2.7 compatibility and is consistent with similar changes across other configuration files.

@ericspod (Member Author):

Hi @KumoLiu, this should pass tests now, so we can run blossom and merge. Thanks!
