
Conversation


Copilot AI commented Feb 7, 2026

Fix CUDA version mismatch in PyTorch dependencies

Plan:

  • Analyze the issue and understand the problem
  • Research correct PyTorch version compatibility
  • Update pyproject.toml to pin exact versions for torchvision (sketched after this list)
    • Windows: torchvision==0.22.1+cu128
    • Linux: torchvision==0.25.0+cu128 and add CUDA specifiers to torch/torchaudio
  • Update requirements.txt to match pyproject.toml changes
  • Update nano-vllm dependencies to ensure consistency
  • Test the changes with uv sync (simulated)
  • Verify Flash Attention compatibility
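A minimal sketch of what the pyproject.toml changes could look like, assuming uv's explicit-index mechanism ([[tool.uv.index]] / [tool.uv.sources]) and the https://download.pytorch.org/whl/cu128 wheel index; the torchvision pins come from the plan above, and the index name and structure are illustrative, not the project's actual configuration:

```toml
# Sketch only -- adapt to the project's real pyproject.toml layout.
[project]
# ...existing project metadata stays unchanged...
dependencies = [
    # Platform-specific torchvision pins from the plan above
    "torchvision==0.22.1+cu128 ; sys_platform == 'win32'",
    "torchvision==0.25.0+cu128 ; sys_platform == 'linux'",
    # torch and torchaudio get their CUDA build selected via the index below
    "torch",
    "torchaudio",
]

# Route every Torch component through a single CUDA-specific index so uv
# cannot mix wheels built against different CUDA versions.
[[tool.uv.index]]
name = "pytorch-cu128"
url = "https://download.pytorch.org/whl/cu128"
explicit = true

[tool.uv.sources]
torch = { index = "pytorch-cu128" }
torchvision = { index = "pytorch-cu128" }
torchaudio = { index = "pytorch-cu128" }
```

With a layout like this, uv sync resolves torch, torchvision, and torchaudio only from the cu128 index, so the three packages cannot drift onto different CUDA builds.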
Original prompt

The issue describes a regression in the Windows installation process when using uv sync: a fresh clone set up with uv sync no longer produces a working environment, while previous installations worked fine. The cause is a mismatch in the CUDA versions of the PyTorch dependencies. Specifically, uv pulls wheels built against incompatible CUDA versions, which breaks Flash Attention and creates a significant dependency conflict.

Here's the analysis of the issue:

  • torchaudio resolves to version 2.10.0.dev built against CUDA 13.0.
  • torchvision resolves to a CUDA 12.4 build.
  • This mismatch causes Flash Attention to fail to import and run properly.

Steps to Reproduce:

  1. Run git pull or clone the repository fresh.
  2. Execute uv sync.
  3. Try to initialize the service. Flash Attention will fail to load.

Expected Behavior:
uv sync should respect the version requirements of the sub-modules (such as stable-audio-tools, which expects Torch ~2.1.0) and maintain a unified CUDA version across all Torch components.

Proposed Fix:

  • Review and adjust the dependency constraints in pyproject.toml (a sketch follows this list). For example:
    • Align the Torch components (torch, torchvision, and torchaudio) to a common compatible CUDA version (e.g., CUDA 12.4).
    • Verify the versions required by Flash Attention and other dependent components to ensure compatibility.
  • Test uv sync after making the adjustments to confirm that the regression is resolved and Flash Attention works as expected.
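As an illustration of the "common CUDA version" example above, the dependency pins might look like the following. The version numbers are hypothetical placeholders and must be replaced by a torch/torchvision/torchaudio triple that the cu124 index actually provides and that satisfies stable-audio-tools and Flash Attention; the index routing would mirror the cu128 sketch earlier, pointed at https://download.pytorch.org/whl/cu124 instead:

```toml
# Hypothetical pins -- illustrative only; choose versions that the cu124 index
# actually ships and that the sub-modules (e.g. stable-audio-tools) accept.
[project]
# ...existing project metadata stays unchanged...
dependencies = [
    "torch==2.4.1+cu124",
    "torchvision==0.19.1+cu124",
    "torchaudio==2.4.1+cu124",
]
```

The key property is that all three packages carry the same +cu124 local tag, so Flash Attention sees a single, consistent CUDA build of Torch.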

Please update the necessary dependency constraints and test to confirm the issue is resolved. Additional checks should verify a uniform CUDA version across all dependencies.

This pull request was created from Copilot chat.



ChuxiJ closed this Feb 7, 2026
Copilot AI requested a review from ChuxiJ February 7, 2026 08:09
Copilot stopped work on behalf of ChuxiJ due to an error February 7, 2026 08:09