ComfyUI custom nodes for Qwen3‑TTS (12Hz): CustomVoice, VoiceDesign, and VoiceClone — with practical defaults for stability and speed on CUDA / Apple Silicon (MPS) / CPU.
If this repo saves you time, please ⭐ it — it helps more ComfyUI users discover a working Qwen3‑TTS setup.
- New workflow works with the newly released ComfyUI-QwenAR
Workflow
- Voice Clone supports reusable `VOICE` inputs from the Voices Library.
- New Tools: Create Voice, Load Voice, Whisper STT, and Voice Instruct presets (EN + CN).

- Advanced nodes expose attention selection: `auto` / `sage_attn` / `flash_attn` / `sdpa` / `eager`.
- README includes `extra_model_paths.yaml` guidance for custom model locations.
- Audio Duration node rewritten: cleaner logic, seconds-based outputs, optional frame calculation.
Option A — ComfyUI‑Manager (recommended)
- Open ComfyUI‑Manager → search ComfyUI‑QwenTTS → Install.
Option B — Git clone
cd ComfyUI/custom_nodes
git clone https://github.com/1038lab/ComfyUI-QwenTTS.git
Use ComfyUI’s embedded Python if you’re on Portable:
Windows Portable
cd <ComfyUI_root>
python_embeded\python.exe -m pip install -r ComfyUI\custom_nodes\ComfyUI-QwenTTS\requirements.txt --no-cache-dir
macOS/Linux (typical)
python3 -m pip install -r ComfyUI/custom_nodes/ComfyUI-QwenTTS/requirements.txt --no-cache-dir
- Import: example_workflows/QwenTTS_sample_workflow.json
- Run it once (first run is slower due to model download + warmup)
- Custom Voice (preset speakers): easy, high-quality TTS with 9 timbres.
- Voice Design: create voices using a natural-language description.
- Voice Clone: clone from reference audio + transcript, or reuse a saved `VOICE`.
- Multi‑Device: auto select CUDA → MPS → CPU.
- Local‑First models: prefer `ComfyUI/models/TTS/Qwen3-TTS/`.
- Tools bundle: Create/Load Voice, Whisper STT, Voice Instruct presets, Text Token Count.
- Advanced control nodes: sampling, max_new_tokens, attention backend, unload.
- Languages: Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian.
- Instruction control: Supports voice style control via natural-language instructions.
- Tokenizer: Uses Qwen3-TTS-Tokenizer-12Hz for speech encoding/decoding.
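The CUDA → MPS → CPU fallback listed above can be sketched roughly as follows. This is an illustration only, not this node's actual code: `pick_device` and `detect_device` are hypothetical names, and the probe assumes PyTorch may or may not be installed.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return the preferred device name in CUDA -> MPS -> CPU order."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe PyTorch (if installed) and apply the same priority order."""
    try:
        import torch
    except ImportError:
        # Assumption for this sketch: without PyTorch, report CPU.
        return "cpu"
    # Older PyTorch builds may not ship the MPS backend at all.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps_backend is not None and mps_backend.is_available())
    return pick_device(torch.cuda.is_available(), mps_ok)
```

Keeping the priority logic in a pure function makes it easy to check without a GPU; only `detect_device` touches PyTorch.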
| Model | Size | Features | Streaming | Instruction |
|---|---|---|---|---|
| CustomVoice | 1.7B | 9 premium timbres, style control | ✅ | ✅ |
| VoiceDesign | 1.7B | Voice design from descriptions | ✅ | ✅ |
| Base | 1.7B | 3s rapid voice clone, FT base | ✅ | - |
| CustomVoice | 0.6B | 9 premium timbres | ✅ | - |
| Base | 0.6B | 3s rapid voice clone | ✅ | - |
| Tokenizer | 12Hz | Speech encode/decode | - | - |
Models can be auto-downloaded to:
ComfyUI/models/TTS/Qwen3-TTS/<MODEL_NAME>/
Supported model IDs (Hugging Face):
- Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice
- Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice
- Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign
- Qwen/Qwen3-TTS-12Hz-1.7B-Base
- Qwen/Qwen3-TTS-12Hz-0.6B-Base
- Qwen/Qwen3-TTS-Tokenizer-12Hz
If a model is missing locally, it will be downloaded automatically on first use.
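To see which models would trigger a download, a small check along these lines may help. It is a sketch under assumptions: `local_model_dir` and `missing_models` are hypothetical helpers (not part of this node), the folder name is assumed to be the part of the model ID after the slash, and a model counts as present when its folder exists and is not empty.

```python
from pathlib import Path

SUPPORTED_MODEL_IDS = [
    "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
    "Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice",
    "Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign",
    "Qwen/Qwen3-TTS-12Hz-1.7B-Base",
    "Qwen/Qwen3-TTS-12Hz-0.6B-Base",
    "Qwen/Qwen3-TTS-Tokenizer-12Hz",
]


def local_model_dir(comfy_root, model_id: str) -> Path:
    """Map a Hugging Face model ID to ComfyUI/models/TTS/Qwen3-TTS/<MODEL_NAME>/."""
    # Assumption: <MODEL_NAME> is the repo name without the "Qwen/" prefix.
    model_name = model_id.split("/")[-1]
    return Path(comfy_root) / "models" / "TTS" / "Qwen3-TTS" / model_name


def missing_models(comfy_root, model_ids=SUPPORTED_MODEL_IDS) -> list:
    """Return the model IDs whose local folder is absent or empty."""
    missing = []
    for model_id in model_ids:
        folder = local_model_dir(comfy_root, model_id)
        if not (folder.is_dir() and any(folder.iterdir())):
            missing.append(model_id)
    return missing
```

For example, `missing_models("ComfyUI")` lists the IDs that are not yet on disk under the default layout.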
All Qwen3-TTS assets are stored in one consistent location:
ComfyUI/models/TTS/Qwen3-TTS/<MODEL_NAME>/
This node will not download or create model folders elsewhere.
If you store models outside the default ComfyUI path, configure ComfyUI’s
extra_model_paths.yaml in the ComfyUI root. This node relies on ComfyUI’s
standard model path system.
Supported location:
ComfyUI/extra_model_paths.yaml
Example (ComfyUI format):
comfyui:
  base_path: D:/AI/ComfyUI-Models
  tts: models/TTS/ # use lowercase `tts`
If your ComfyUI build does not expose a TTS key, keep the default layout
ComfyUI/models/TTS/Qwen3-TTS/ and skip this section.
How to place Qwen3-TTS models in a custom location:
- Set `base_path` to your shared models root.
- Put Qwen3‑TTS models under:
<base_path>/TTS/Qwen3-TTS/<MODEL_NAME>/
- Add that root to `extra_model_paths.yaml` (under `tts` as shown above).
- Restart ComfyUI.
Qwen3-TTS follows the standard Hugging Face model layout (config, tokenizer, weights, etc.). Multiple JSON/config files are required by Transformers at runtime, so they cannot be safely collapsed into a single file without breaking loading.
You can download models manually and place them into:
ComfyUI/models/TTS/Qwen3-TTS/<MODEL_NAME>/
Hugging Face CLI example:
pip install -U "huggingface_hub[cli]"
huggingface-cli download Qwen/Qwen3-TTS-Tokenizer-12Hz --local-dir ./Qwen3-TTS-Tokenizer-12Hz
huggingface-cli download Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice --local-dir ./Qwen3-TTS-12Hz-1.7B-CustomVoice
huggingface-cli download Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign --local-dir ./Qwen3-TTS-12Hz-1.7B-VoiceDesign
huggingface-cli download Qwen/Qwen3-TTS-12Hz-1.7B-Base --local-dir ./Qwen3-TTS-12Hz-1.7B-Base
huggingface-cli download Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice --local-dir ./Qwen3-TTS-12Hz-0.6B-CustomVoice
huggingface-cli download Qwen/Qwen3-TTS-12Hz-0.6B-Base --local-dir ./Qwen3-TTS-12Hz-0.6B-Base
Then move each downloaded folder into:
ComfyUI/models/TTS/Qwen3-TTS/
ModelScope example:
pip install -U modelscope
modelscope download --model Qwen/Qwen3-TTS-Tokenizer-12Hz --local_dir ./Qwen3-TTS-Tokenizer-12Hz
modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice --local_dir ./Qwen3-TTS-12Hz-1.7B-CustomVoice
modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign --local_dir ./Qwen3-TTS-12Hz-1.7B-VoiceDesign
modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-Base --local_dir ./Qwen3-TTS-12Hz-1.7B-Base
modelscope download --model Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice --local_dir ./Qwen3-TTS-12Hz-0.6B-CustomVoice
modelscope download --model Qwen/Qwen3-TTS-12Hz-0.6B-Base --local_dir ./Qwen3-TTS-12Hz-0.6B-Base
This node auto-downloads missing models to:
ComfyUI/models/TTS/Qwen3-TTS/<MODEL_NAME>/
- Faster defaults (typically `do_sample=False`)
- Minimal inputs
- Expose `max_new_tokens`, sampling knobs, attention backend selection (auto/sage_attn/flash_attn/sdpa/eager), unload_models, seed.
- Set a sensible `max_new_tokens` (too high can cause long humming / trailing noise).
- Prefer do_sample=False for stability.
- Use the speaker’s native language for best results.
pip install flash-attn --no-build-isolation
pip install sageattention
This is usually an incompatible transformers build (often 5.x dev/nightly).
Fix (recommended):
pip install -U "transformers==4.57.3" "tokenizers<0.20" --no-cache-dir
Then restart ComfyUI.
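To confirm which transformers build is active before reinstalling, a quick check like the one below can be run with the same Python that ComfyUI uses. It is a rough sketch: `is_likely_compatible` and `installed_transformers_version` are hypothetical helper names, and the rule "major version below 5" is an assumption drawn from the note above, not a guarantee from this node.

```python
from importlib import metadata


def major_version(version: str) -> int:
    """Extract the leading major number from a version like '4.57.3' or '5.0.0.dev0'."""
    return int(version.split(".")[0])


def is_likely_compatible(version: str) -> bool:
    """Assumption: 4.x builds work, while 5.x dev/nightly builds are the problem case."""
    return major_version(version) < 5


def installed_transformers_version():
    """Return the installed transformers version string, or None if it is not installed."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_transformers_version()
    if found is None:
        print("transformers is not installed in this environment")
    elif is_likely_compatible(found):
        print(f"transformers {found}: looks fine")
    else:
        print(f"transformers {found}: likely incompatible, consider pinning a 4.x release")
```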
Lower max_new_tokens (try 512–1024 for short text), and set do_sample=False.
Tip: use Text Token Count (QwenTTS) to pick a safe max_new_tokens and reduce long trailing noise.
Split long scripts into chunks, lower max_new_tokens, and use precision=bf16.
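One possible way to split a long script before sending it to the TTS nodes is to pack whole sentences into size-limited chunks. The helper below is a minimal sketch, not part of this node: `split_into_chunks` and its `max_chars` default are hypothetical, and a character budget is only a stand-in for whatever token budget suits your setup.

```python
import re


def split_into_chunks(text: str, max_chars: int = 200) -> list:
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept intact as its own chunk.
    """
    # Split after Latin punctuation followed by whitespace, or right after CJK punctuation.
    pieces = re.split(r"(?<=[.!?])\s+|(?<=[。！？])", text)
    sentences = [p.strip() for p in pieces if p.strip()]

    chunks = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be synthesized separately with a lower `max_new_tokens`, and the resulting audio clips joined afterwards.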
GPL‑3.0 (see LICENSE).
- Qwen3‑TTS by Alibaba Qwen Team
- ComfyUI community
