Skip to content

macOS/M-Series Support for Qwen3-TTS#124

Open
evuori wants to merge 10 commits intoQwenLM:mainfrom
evuori:improve_macos_support
Open

macOS/M-Series Support for Qwen3-TTS#124
evuori wants to merge 10 commits intoQwenLM:mainfrom
evuori:improve_macos_support

Conversation

@evuori
Copy link

@evuori evuori commented Jan 28, 2026

Overview

This PR adds comprehensive macOS/Apple Silicon (M-series) support to Qwen3-TTS while maintaining full CUDA/GPU compatibility. The implementation includes intelligent device auto-detection, smart model path detection, and improved documentation.

Key Changes

1. Intelligent Device Auto-Detection

New File: qwen_tts/core/device_utils.py (256 lines)
get_optimal_device() - Auto-detects MPS > CUDA > CPU with intelligent fallback
get_attention_implementation() - Returns appropriate attention backend per device (auto-skips FlashAttention on non-CUDA)
device_synchronize() - Device-agnostic synchronization replacing torch.cuda.synchronize()
get_device_info() - Human-readable device descriptions
get_model_path() - Smart model path detection (local ./models/ first, then HuggingFace)

Benefits:

✅ macOS users can run examples without any code changes
✅ Automatic MPS detection on Apple Silicon
✅ FlashAttention gracefully skipped on non-CUDA devices
✅ Device-agnostic timing works everywhere

2. Updated Examples (All 4 example files)

examples/test_model_12hz_custom_voice.py
examples/test_model_12hz_voice_design.py
examples/test_model_12hz_base.py
examples/test_tokenizer_12hz.py

Changes:

✅ Use get_optimal_device() instead of hardcoded "cuda:0"
✅ Use get_attention_implementation() instead of hardcoded "flash_attention_2"
✅ Use device_synchronize() instead of torch.cuda.synchronize()
✅ Use get_model_path() for smart model detection
✅ Output audio files to ./example_output/ directory

3. Updated Scripts

qwen_tts/cli/demo.py - CLI demo with device auto-detection
finetuning/prepare_data.py - Fine-tuning prep with device auto-detection

4. Documentation Updates

README.md

New "Model Loading and Caching" section with smart path detection
New "macOS / Apple Silicon (M1/M2/M3/M4) Support" section
Updated model download instructions to use ./models/ directory
Added code examples and troubleshooting guide
CLAUDE.md

Updated "Model Loading Best Practices" with device utilities
Added "macOS / Apple Silicon Development" section with complete examples
Added directory structure diagram for model organization

5. Configuration Updates

.gitignore - Added example_output/ to prevent committing generated audio files

Commits

db29e79 feat: output example audio files to example_output directory
0f62671 docs: update model loading documentation for smart path detection
56b6c80 feat: add smart model path detection (local models or HuggingFace)
552709c fix: remove trailing slashes from model paths in examples
3514e0b feat: add intelligent device auto-detection for macOS/MPS support

Statistics

Files Modified: 11
New Files: 1 (device_utils.py)
Lines Added: ~600
Zero Breaking Changes: All existing code continues to work

How It Works

Device Auto-Detection

from qwen_tts.core.device_utils import get_optimal_device, get_attention_implementation

device = get_optimal_device() # MPS > CUDA > CPU
attn = get_attention_implementation(device) # Auto-skips FlashAttention on non-CUDA

Smart Model Path Detection

from qwen_tts.core.device_utils import get_model_path

Checks ./models/Qwen3-TTS-12Hz-1.7B-CustomVoice first
Falls back to HuggingFace if not found
model_path = get_model_path("Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice")

Backward Compatibility

✅ 100% Backward Compatible

Explicit device specs like device_map="cuda:0" still work
Explicit attention specs like attn_implementation="flash_attention_2" still work
Existing code paths unchanged
No breaking API changes

Testing

Users can test with:

Auto-detection (recommended)
python examples/test_model_12hz_custom_voice.py

CLI demo
qwen-tts-demo Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice

Fine-tuning prep with auto-detection
python finetuning/prepare_data.py --input_jsonl train.jsonl --output_jsonl train_with_codes.jsonl

Example Output

On macOS with MPS:

Using device: Apple Metal Performance Shaders (MPS) - Apple Silicon GPU
Found local model: ./models/Qwen3-TTS-12Hz-1.7B-CustomVoice
[CustomVoice Single] time: 2.456s

On NVIDIA GPU:

Using device: CUDA GPU: NVIDIA GeForce RTX 4090
Local model not found at ./models/Qwen3-TTS-12Hz-1.7B-CustomVoice, will download from HuggingFace...
[CustomVoice Single] time: 0.789s

evuori and others added 10 commits January 28, 2026 20:42
Implement cross-platform device detection (MPS > CUDA > CPU) while
maintaining full backward compatibility with existing CUDA workflows.

Changes:
- New module: qwen_tts/core/device_utils.py with device detection
- Auto-detect optimal device (MPS > CUDA > CPU)
- Auto-select attention implementation (skip FlashAttention on non-CUDA)
- Device-agnostic synchronization for accurate timing measurements
- Update all examples to use device auto-detection
- Update CLI demo to support device auto-detection
- Update fine-tuning prep script with device auto-detection
- Add comprehensive macOS/Apple Silicon documentation to README
- Update CLAUDE.md with macOS development guidelines

Benefits:
- macOS users can now run examples without code modifications
- Automatic MPS detection and usage on Apple Silicon
- FlashAttention gracefully skipped on non-CUDA devices
- 100% backward compatible - existing device specs still work
- Better support for diverse hardware environments

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Model paths should not have trailing slashes when passed to HuggingFace
model loaders. This fixes the HFValidationError when running examples.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Add get_model_path() utility function that automatically checks for
locally downloaded models in ./models/ directory and uses them if
available, otherwise falls back to HuggingFace model IDs for auto-download.

This allows users to:
1. Run examples with pre-downloaded models without network access
2. Automatically download models on first run
3. Avoid redownloading models if they're already cached locally

Updated all examples and scripts to use get_model_path():
- test_model_12hz_custom_voice.py
- test_model_12hz_voice_design.py
- test_model_12hz_base.py
- test_tokenizer_12hz.py
- qwen_tts/cli/demo.py
- finetuning/prepare_data.py

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Update README.md and CLAUDE.md to document the smart model path detection
feature that automatically checks for locally downloaded models in the
./models/ directory before downloading from HuggingFace.

Key updates:
- New 'Model Loading and Caching' section in README.md
- Updated model download instructions to use ./models/ directory
- Added directory structure example showing recommended layout
- Added code examples showing how smart path detection works
- Updated CLAUDE.md model loading best practices with get_model_path()
- Added complete example showing all device utilities together

This ensures users understand that examples will automatically work with
locally downloaded models while maintaining backward compatibility with
HuggingFace auto-download.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Update all example scripts to write generated audio files to a dedicated
./example_output/ directory instead of the current working directory.
This keeps outputs organized and separate from the source code.

Changes:
- test_model_12hz_custom_voice.py: writes to ./example_output/
- test_model_12hz_voice_design.py: writes to ./example_output/
- test_tokenizer_12hz.py: writes to ./example_output/
- Add example_output/ to .gitignore to prevent committing generated files

All example scripts now:
1. Import os module
2. Create output_dir = "example_output"
3. Use os.makedirs(output_dir, exist_ok=True) to ensure directory exists
4. Write all audio files to os.path.join(output_dir, filename)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
@yerffejytnac
Copy link

this is awesome 👏

@israelandrewbrown
Copy link

#abandoned-repository

@evuori
Copy link
Author

evuori commented Feb 20, 2026

#abandoned-repository

?

@israelandrewbrown
Copy link

@evuori

It's appears as though the developers have abandoned this repository.

Can't wait until they merge this, in the meantime we'll have to use "voicebox".

Great pr btw.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants