Having issues with ComfyUI API? We've got you covered!
- 📖 API Guide - Complete API documentation
- 🛠️ Debug Script - Automated API diagnostics
- 🎯 Example Workflows - Ready-to-use API templates
| Issue | Solution |
|---|---|
| "node not found" | Check QwenVL-Mod installation |
| "model not found" | Verify model files in /models/ |
| "invalid input" | Check parameter formats |
| "queue full" | Wait for current jobs |
# Run API diagnostics
python debug_api.py --url http://localhost:18188
# Check available nodes
curl http://localhost:18188/object_info
# Test simple workflow
curl -X POST http://localhost:18188/prompt -d @test_workflow.json
Ask the user for:
- Error message (exact text)
- Workflow JSON (sanitized)
- Debug output (from debug_api.py)
- ComfyUI logs (API section)
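For scripted diagnostics beyond curl, the same `/prompt` endpoint can be driven from Python. The sketch below only builds the request payload in the shape ComfyUI's `/prompt` API expects (`prompt` graph plus a `client_id`); the node ids and class types in the sample workflow are placeholders, and the actual HTTP call is left commented out since it needs a running server.

```python
import json
import uuid

def build_prompt_payload(workflow, client_id=None):
    """Wrap a workflow graph in the payload shape ComfyUI's /prompt endpoint expects."""
    return {
        "prompt": workflow,
        "client_id": client_id or uuid.uuid4().hex,
    }

# Hypothetical minimal workflow graph (node id and inputs are placeholders).
workflow = {
    "1": {"class_type": "QwenVL", "inputs": {"model_name": "Qwen3-VL-4B-Instruct"}},
}

payload = build_prompt_payload(workflow)
body = json.dumps(payload).encode("utf-8")

# To actually submit it (requires a running ComfyUI instance on port 18188):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:18188/prompt", data=body,
#     headers={"Content-Type": "application/json"},
# )
# urllib.request.urlopen(req)
```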
The ComfyUI-QwenVL custom node integrates the powerful Qwen-VL series of vision-language models (LVLMs) from Alibaba Cloud, including the latest Qwen3-VL and Qwen2.5-VL, plus GGUF backends and text-only Qwen3 support. This node brings seamless multimodal AI capabilities to your ComfyUI workflows: efficient text generation, image understanding, and video analysis.
- 2026/03/06: v2.2.4 🔧 Critical OOM Fix + Quantization Removal. [Update]
🚨 BitsAndBytes Disabled: Removed problematic quantization causing OOM on RTX 5090.
✅ FP16 Only: All HF nodes now use stable FP16 (~6GB VRAM).
🎯 Cleaner Interface: Removed quantization dropdown - use GGUF nodes for quantized models.
🔧 Both Nodes Fixed: Applied fixes to Standard and Advanced nodes with consistent parameters.
💡 User Guidance: HF nodes for quality, GGUF nodes for quantization - clear separation.
- 2026/02/27: v2.2.3 🔧 CUDA 13 Compatibility Fix + Redundancy Removal. [Update]
🔧 Removed unload_after_run: Eliminated redundant checkbox from all QwenVL nodes to prevent CUDA 13 conflicts.
🐛 Fixed Parameter Errors: Resolved "missing 1 required positional argument: unload_after_run" errors in all nodes.
🎯 Simplified Interface: Cleaner node interface without redundant parameters.
🧠 VRAM Cleanup Node: Maintained for manual cleanup when needed.
🏆 Community Credits: Thanks to user feedback that identified redundancy and parameter issues.
- 2026/02/27: v2.2.3 🚀 Critical T2V/I2V Fixes + ComfyUI Optimizations. [Update]
🚀 Batch Processing: Fixed critical T2V → GGUF issue with batch images from video generation.
🔄 Same Model Reuse: Resolved conflict when using same model between T2V and I2V nodes.
⚙️ Flash Attention 2: Added Flash Attention 2 support for performance boost on compatible hardware.
⚙️ ComfyUI Args: Optimized startup arguments with validated experimental features.
🔧 keep_model_loaded: Added missing parameter to PromptEnhancer for consistent memory management.
🐳 Final Docker Build: Optimized build with all fixes and maximum performance.
- 2026/02/18: v2.2.1 🔧 Critical GGUF VRAM Fix + Docker Optimized. [Update]
🔧 GGUF VRAM Fix: Resolved critical VRAM leak issue causing crashes after 2 executions.
🧹 Aggressive Cleanup: Implemented complete VRAM cleanup for all GGUF nodes (AILab_QwenVL_GGUF and PromptEnhancer).
🚀 Stable Performance: GGUF nodes now work reliably without VRAM accumulation.
🐳 Docker Enhanced: Updated Dockerfiles with RunPod-tested methods for Jupyter and FileBrowser.
🔄 ComfyUI Latest: Always latest stable version without manual updates.
📡 Complete SSH: Server + client SSH for full networking functionality.
🎯 Jupyter Terminal: Adopted RunPod method for working terminal.
- 2026/02/15: v2.2.0 🎬 WAN 2.2 Story Generation System. [Update]
🎬 Story Generation: Complete 4-segment video story generation with WAN 2.2
🔄 Auto-Split Node: Intelligent prompt splitting for continuous 20-second videos
📝 Show Text Node: Built-in text display node without external dependencies
🎯 Enhanced Prompts: Optimized WAN 2.2 NSFW Story prompts with better formatting
⚡ Performance: Optimized context settings for 8B models (65,536 tokens)
🐳 Docker Ready: Complete Story system integrated in Docker containers
🎨 Workflows: Ready-to-use WAN 2.2 Story and T2V workflows included
- 2026/02/14: v2.1.0 User-Friendly Keep Last Prompt Feature. [Update]
Note
- 2026/02/12: v2.0.9 Bypass Mode parameter for prompt persistence. [Update]
🎛️ Bypass Mode: New bypass_mode parameter allows maintaining previously generated prompts without regeneration.
🔄 Smart Cache: When bypass mode is enabled, nodes retrieve the most recent cached prompt for the current model.
🎯 Perfect Workflow: Generate prompts once, then enable bypass mode to preserve them while changing inputs.
⚡ Zero Resource Usage: Bypass mode consumes no computational resources - instant response.
📋 Universal Feature: Available across all nodes (HF, GGUF, PromptEnhancer, Advanced variants).
🎮 Simple Control: Just toggle the bypass_mode checkbox to enable/disable prompt persistence.
- 2026/02/06: v2.0.8 Bug fixes and stability improvements. [Update]
Note
🐛 Bug Fixes: Fixed JSON syntax error in system prompts and undefined variable issues.
🌐 Multilingual Support: Complete multilingual support added to all WAN 2.2 presets.
🎨 Style Detection: Enhanced visual style detection for anime, 3D, pixel art, and more.
🔧 Stability: Reverted problematic fixed seed improvements to maintain stable operation.
📝 Documentation: Updated README and changelog with today's improvements.
- 2026/02/04: v2.0.7 Smart prompt caching system with Fixed Seed Mode. [Update]
Note
🧠 Smart Caching: Automatic prompt caching prevents regeneration of identical prompts.
🔒 Fixed Seed Mode: Set any fixed seed value to maintain consistent prompts regardless of media variations.
⚡ Performance Boost: Instant response for cached prompts with zero model loading time.
🔧 Code Maintenance: Removed deprecated parameters across all download functions for future compatibility.
📈 GGUF Enhancement: Increased default context size from 8192 to 32768 for better model utilization.
🔄 Universal Caching: Fixed Seed Mode and smart caching now available across all nodes (HF, GGUF, PromptEnhancer).
🎯 Simplified Logic: Streamlined caching system that always includes seed for predictable behavior across all seed values.
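The seed-inclusive caching described above can be sketched as follows. This is an illustrative model of the behavior, not the node's actual internals: identical (model, prompt, seed, media) settings return the cached result instantly, while changing the seed forces regeneration.

```python
import hashlib
import json

_prompt_cache = {}

def cache_key(model_name, prompt, seed, media_digest):
    """Build a deterministic key; the seed is always included so identical
    settings hit the cache, while any seed change forces regeneration."""
    payload = json.dumps(
        {"model": model_name, "prompt": prompt, "seed": seed, "media": media_digest},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

def generate_with_cache(model_name, prompt, seed, media_digest, generate_fn):
    key = cache_key(model_name, prompt, seed, media_digest)
    if key not in _prompt_cache:          # cache miss: run the model
        _prompt_cache[key] = generate_fn()
    return _prompt_cache[key]             # cache hit: instant response, no model load

calls = []
out1 = generate_with_cache("Qwen3-VL-4B", "describe", 1, "abc",
                           lambda: calls.append(1) or "desc")
out2 = generate_with_cache("Qwen3-VL-4B", "describe", 1, "abc",
                           lambda: calls.append(1) or "desc")
```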
- 🔥 Stronger NSFW Instructions: Enhanced system prompts with explicit directives for uncensored content generation
- 🔦 Fixed Lighting Invention: Corrected AI tendency to add lights not present in source images
- 🎯 Improved Accuracy: Better content fidelity for both NSFW descriptions and lighting analysis
- 🔄 Updated GGUF Models: Replaced noctrex models with mradermacher v2/v3 variants for better performance
- 📦 Enhanced Model Selection: Added Josiefied-Qwen3-VL-4B-Instruct-abliterated-beta-v1 (HF only).
- 2026/02/03: v2.0.6 Professional cinematography enhancement for all WAN 2.2 presets. [Update]
Note
🎬 Professional Specs: All WAN 2.2 presets now include comprehensive cinematography specifications.
📹 Technical Details: Light sources, shot types, lens specs, camera movements, color tone requirements.
🎯 Consistent Branding: Updated preset names with WAN family branding for better organization.
- 2026/03/13: v2.2.4 🎬 Critical I2V Timeline Fixes & NSFW Presets Optimization. [Update]
🎬 I2V Timeline (20s) Critical Fixes:
- ✅ Style Coherence: Fixed AI changing anime→realism mid-sequence
- ✅ Character Stability: Fixed characters disappearing/appearing incorrectly
- ✅ Natural Lighting: Fixed AI adding artificial lights not in image
- ✅ Timeline Structure: Fixed continuous numbering (6,7,8...) instead of 0-5 restart
- ✅ Format Consistency: Fixed missing parentheses and unwanted labels
- 🔧 All 8 NSFW Presets: Complete specifications + emoji display restored
- 📋 Token Settings Guide: Comprehensive workflow note for optimal parameters
- 2026/02/01: v2.0.5 Extended Storyboard preset added for WAN 2.2 format continuity. [Update]
Note
🎬 Extended Storyboard: New preset for seamless storyboard-to-storyboard generation with timeline format.
🔄 Continuity Focus: Each paragraph repeats previous content for smooth transitions.
🎯 WAN 2.2 Compatible: Same timeline structure and NSFW support as I2V preset.
- 2026/02/01: v2.0.4 Stability update - removed SageAttention for better compatibility and model output reliability. [Update]
Note
🔧 Flash Attention 2: Still available for 2-3x speedup on compatible hardware.
🛡️ Enhanced Stability: Clean attention pipeline with SDPA as reliable fallback.
- 2026/02/01: v2.0.3 SageAttention compatibility fix for proper patching across transformer versions. [Update]
Note
🔧 Critical Fix: Resolved AttributeError preventing Flash Attention 2 from working with certain transformer versions.
⚡ Performance Restored: 2-5x speedup now works correctly with 8-bit quantization on compatible hardware.
- 2026/02/01: v2.0.2 Enhanced model accessibility, improved custom prompt logic, and expanded NSFW content generation. [Update]
Note
🚀 Free Abliterated Models: Added token-free uncensored models as defaults for better accessibility.
🔧 Custom Prompt Fix: Now combines with preset templates instead of replacing them across all nodes.
📝 Enhanced NSFW: Comprehensive descriptions for adult content generation with detailed act specifications.
🎬 WAN 2.2 Priority: Moved video generation preset to top position for faster workflow access.
- 2026/01/30: v2.0.1-enhanced Added Flash Attention 2 support and WAN 2.2 integration. [Update]
Note
🚀 Flash Attention 2: 2-5x performance boost with 8-bit quantized attention for RTX 30+ GPUs.
🎬 WAN 2.2 Integration: New specialized prompts for cinematic video generation - convert images/videos to 5-second timeline descriptions (I2V) or text to video (T2V) with professional scene direction.
- 2025/12/22: v2.0.0 Added GGUF supported nodes and Prompt Enhancer nodes. [Update]
Important
Install llama-cpp-python before running the GGUF nodes (see the installation instructions).
- 2025/11/10: v1.1.0 Runtime overhaul with attention-mode selector, flash-attn auto detection, smarter caching, and quantization/torch.compile controls in both nodes. [Update]
- 2025/10/31: v1.0.4 Custom Models Supported [Update]
- 2025/10/22: v1.0.3 Models list updated [Update]
- 2025/10/17: v1.0.0 Initial Release
- Support for Qwen3-VL and Qwen2.5-VL series models.
- Automatic model downloading from Hugging Face.
- On-the-fly quantization (4-bit, 8-bit, FP16).
- Preset and Custom Prompt system for flexible and easy use.
- Includes both a standard and an advanced node for users of all levels.
- Hardware-aware safeguards for FP8 model compatibility.
- Image and Video (frame sequence) input support.
- "Keep Model Loaded" option for improved performance on sequential runs.
- Seed parameter for reproducible generation.
- Standard & Advanced Nodes: Includes a simple QwenVL node for quick use and a QwenVL (Advanced) node with fine-grained control over generation.
- Prompt Enhancers: Dedicated text-only prompt enhancers for both HF and GGUF backends.
- Preset & Custom Prompts: Choose from a list of convenient preset prompts or write your own for full control. Custom prompts now combine with preset templates for enhanced flexibility.
- Smart Prompt Caching: Automatic caching system prevents regeneration of identical prompts, dramatically improving performance for repeated inputs. Cache persists across ComfyUI restarts.
- 🎛️ Bypass Mode: New bypass_mode parameter allows maintaining previously generated prompts without regeneration. Generate once, then enable bypass mode to preserve prompts while changing inputs. Zero resource usage in bypass mode.
- Fixed Seed Mode: Set seed = 1 to ignore image/video changes and maintain consistent prompts regardless of media variations. Perfect for stable workflow outputs.
- WAN 2.2 Integration: Specialized prompts for WAN 2.2 I2V (image-to-video) and T2V (text-to-video) generation with professional cinematography specifications and cinematic timeline structure. I2V preset prioritized for faster workflow access.
- Professional Cinematography: All WAN 2.2 presets include comprehensive technical specifications - light sources, shot types, lens specifications, camera movements, and color tone requirements for professional video generation.
- Extended Storyboard: New preset for seamless storyboard-to-storyboard generation with WAN 2.2 format compatibility, continuity focus, and professional cinematography details.
- WAN Family Branding: Consistent naming across all WAN 2.2 presets for better organization and workflow clarity.
- Free Abliterated Models: Default models include token-free uncensored options (Qwen3-4B-abliterated-TIES, Qwen3-8B-abliterated-TIES) for immediate accessibility.
- Multi-Model Support: Easily switch between various official Qwen-VL models with smart 4B-first ordering for VRAM efficiency.
- Automatic Model Download: Models are downloaded automatically on first use.
- Smart Quantization: Balance VRAM and performance with 4-bit, 8-bit, and FP16 options. 8-bit quantization enabled by default for optimal accessibility.
- Optimized Attention: Clean attention pipeline with Flash Attention 2 support and stable SDPA fallback. No complex patching that could interfere with model output.
- Hardware-Aware: Automatically detects GPU capabilities and prevents errors with incompatible models (e.g., FP8).
- Reproducible Generation: Use the seed parameter to get consistent outputs, with Fixed Seed Mode for ultimate stability.
- Memory Management: "Keep Model Loaded" option to retain the model in VRAM for faster processing.
- Image & Video Support: Accepts both single images and video frame sequences as input.
- Robust Error Handling: Provides clear error messages for hardware or memory issues.
- Clean Console Output: Minimal and informative console logs during operation.
1. Clone this repository into your ComfyUI/custom_nodes directory:

   cd ComfyUI/custom_nodes
   git clone https://github.com/huchukato/ComfyUI-QwenVL-Mod.git

2. Install the required dependencies:

   cd ComfyUI/custom_nodes/ComfyUI-QwenVL-Mod
   pip install -r requirements.txt

3. Restart ComfyUI.
For 2-3x performance boost with compatible GPUs:
# Install Flash Attention 2 (recommended)
pip install flash-attn --no-build-isolation
# Or compile from source
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
python setup.py install
Requirements for Flash Attention 2:
- NVIDIA GPU with compute capability >= 8.0 (RTX 30/40/50 series)
- CUDA >= 12.0
- PyTorch >= 2.3.0
See Flash Attention 2 section for details.
- QwenVL: Quick vision-language inference (image/video + preset/custom prompts).
- QwenVL (Advanced): Full control over sampling, device, and performance settings.
- QwenVL Prompt Enhancer: Text-only prompt enhancement (supports both Qwen3 text models and QwenVL models in text mode).
- QwenVL (GGUF): GGUF vision-language inference.
- QwenVL (GGUF Advanced): Extended GGUF controls (context, GPU layers, etc.).
- QwenVL Prompt Enhancer (GGUF): GGUF text-only prompt enhancement.
This repo includes GGUF nodes powered by llama-cpp-python (separate from the Transformers-based nodes).
- Nodes: QwenVL (GGUF), QwenVL (GGUF Advanced), QwenVL Prompt Enhancer (GGUF)
- Model folder (default): ComfyUI/models/llm/GGUF/ (configurable via gguf_models.json)
- Vision requirement: install a vision-capable llama-cpp-python wheel that provides Qwen3VLChatHandler / Qwen25VLChatHandler
See docs/LLAMA_CPP_PYTHON_VISION_INSTALL.md
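To make the GGUF setup concrete, the sketch below assembles the loader arguments a vision-capable llama-cpp-python build would need (model file plus the mmproj projector from the tables below). The helper name and defaults are illustrative; the commented lines show roughly how they map onto the llama-cpp-python API once a suitable wheel is installed.

```python
from pathlib import Path

def build_gguf_loader_kwargs(model_file, mmproj_file=None,
                             n_ctx=32768, n_gpu_layers=-1):
    """Assemble arguments for a vision-capable llama-cpp-python build.
    With a suitable wheel installed this maps onto something like:
        from llama_cpp import Llama
        from llama_cpp.llama_chat_format import Qwen25VLChatHandler
        handler = Qwen25VLChatHandler(clip_model_path=kwargs["clip_model_path"])
        llm = Llama(model_path=kwargs["model_path"], chat_handler=handler, n_ctx=n_ctx)
    """
    base = Path("ComfyUI/models/llm/GGUF")     # default GGUF model folder
    kwargs = {
        "model_path": str(base / model_file),
        "n_ctx": n_ctx,                        # default context raised to 32768 in v2.0.7
        "n_gpu_layers": n_gpu_layers,          # -1 = offload every layer to the GPU
    }
    if mmproj_file:                            # vision models need the mmproj projector
        kwargs["clip_model_path"] = str(base / mmproj_file)
    return kwargs

kw = build_gguf_loader_kwargs("Qwen3VL-4B-Instruct-Q4_K_M.gguf",
                              "mmproj-Qwen3VL-4B-Instruct-F16.gguf")
```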
- HF models: hf_models.json
  - hf_vl_models: vision-language models (used by the QwenVL nodes).
  - hf_text_models: text-only models (used by the Prompt Enhancer).
- GGUF models: gguf_models.json
- System prompts: AILab_System_Prompts.json (includes both VL prompts and prompt-enhancer styles).
The models will be automatically downloaded on first use. If you prefer to download them manually, place them in the ComfyUI/models/LLM/Qwen-VL/ directory.
| Model | Link |
|---|---|
| Qwen3-VL-2B-Instruct | Download |
| Qwen3-VL-2B-Thinking | Download |
| Qwen3-VL-2B-Instruct-FP8 | Download |
| Qwen3-VL-2B-Thinking-FP8 | Download |
| Qwen3-VL-4B-Instruct | Download |
| Qwen3-VL-4B-Thinking | Download |
| Qwen3-VL-4B-Instruct-FP8 | Download |
| Qwen3-VL-4B-Thinking-FP8 | Download |
| Qwen3-VL-8B-Instruct | Download |
| Qwen3-VL-8B-Thinking | Download |
| Qwen3-VL-8B-Instruct-FP8 | Download |
| Qwen3-VL-8B-Thinking-FP8 | Download |
| Qwen3-VL-32B-Instruct | Download |
| Qwen3-VL-32B-Thinking | Download |
| Qwen3-VL-32B-Instruct-FP8 | Download |
| Qwen3-VL-32B-Thinking-FP8 | Download |
| Qwen2.5-VL-3B-Instruct | Download |
| Qwen2.5-VL-7B-Instruct | Download |
| Model | Link |
|---|---|
| Qwen3-0.6B | Download |
| Qwen3-4B-Instruct-2507 | Download |
| qwen3-4b-Z-Image-Engineer | Download |
| Group | Model | Repo | Alt Repo | Model Files | MMProj |
|---|---|---|---|---|---|
| Qwen text (GGUF) | Qwen3-4B-GGUF | Qwen/Qwen3-4B-GGUF | | Qwen3-4B-Q4_K_M.gguf, Qwen3-4B-Q5_0.gguf, Qwen3-4B-Q5_K_M.gguf, Qwen3-4B-Q6_K.gguf, Qwen3-4B-Q8_0.gguf | |
| Qwen-VL (GGUF) | Qwen3-VL-4B-Instruct-GGUF | Qwen/Qwen3-VL-4B-Instruct-GGUF | | Qwen3VL-4B-Instruct-F16.gguf, Qwen3VL-4B-Instruct-Q4_K_M.gguf, Qwen3VL-4B-Instruct-Q8_0.gguf | mmproj-Qwen3VL-4B-Instruct-F16.gguf |
| Qwen-VL (GGUF) | Qwen3-VL-8B-Instruct-GGUF | Qwen/Qwen3-VL-8B-Instruct-GGUF | | Qwen3VL-8B-Instruct-F16.gguf, Qwen3VL-8B-Instruct-Q4_K_M.gguf, Qwen3VL-8B-Instruct-Q8_0.gguf | mmproj-Qwen3VL-8B-Instruct-F16.gguf |
| Qwen-VL (GGUF) | Qwen3-VL-4B-Thinking-GGUF | Qwen/Qwen3-VL-4B-Thinking-GGUF | | Qwen3VL-4B-Thinking-F16.gguf, Qwen3VL-4B-Thinking-Q4_K_M.gguf, Qwen3VL-4B-Thinking-Q8_0.gguf | mmproj-Qwen3VL-4B-Thinking-F16.gguf |
| Qwen-VL (GGUF) | Qwen3-VL-8B-Thinking-GGUF | Qwen/Qwen3-VL-8B-Thinking-GGUF | | Qwen3VL-8B-Thinking-F16.gguf, Qwen3VL-8B-Thinking-Q4_K_M.gguf, Qwen3VL-8B-Thinking-Q8_0.gguf | mmproj-Qwen3VL-8B-Thinking-F16.gguf |
- Add the "QwenVL" node from the 🧪AILab/QwenVL category.
- Select the model_name you wish to use.
- Connect an image or video (image sequence) source to the node.
- Write your prompt using the preset or custom field.
- Run the workflow.
For more control, use the "QwenVL (Advanced)" node. This gives you access to detailed generation parameters like temperature, top_p, beam search, and device selection.
| Parameter | Description | Default | Range | Node(s) |
|---|---|---|---|---|
| model_name | The Qwen-VL model to use. | Qwen3-VL-4B-Instruct | - | Standard & Advanced |
| quantization | On-the-fly quantization. Ignored for pre-quantized models (e.g., FP8). | 8-bit (Balanced) | 4-bit, 8-bit, None | Standard & Advanced |
| preset_prompt | A selection of pre-defined prompts for common tasks. | "Describe this..." | - | Standard & Advanced |
| custom_prompt | Combined with the preset prompt if provided. | - | Any text | Standard & Advanced |
| max_tokens | Maximum number of new tokens to generate. | 1024 | 64-2048 | Standard & Advanced |
| keep_model_loaded | Keep the model in VRAM for faster subsequent runs. | True | True/False | Standard & Advanced |
| seed | A seed for reproducible results. | 1 | 1 - 2^64-1 | Standard & Advanced |
| temperature | Controls randomness. Higher values = more creative. (Used when num_beams is 1). | 0.6 | 0.1-1.0 | Advanced Only |
| top_p | Nucleus sampling threshold. (Used when num_beams is 1). | 0.9 | 0.0-1.0 | Advanced Only |
| num_beams | Number of beams for beam search. > 1 disables temperature/top_p sampling. | 1 | 1-10 | Advanced Only |
| repetition_penalty | Discourages repeating tokens. | 1.2 | 0.0-2.0 | Advanced Only |
| frame_count | Number of frames to sample from the video input. | 16 | 1-64 | Advanced Only |
| device | Override automatic device selection. | auto | auto, cuda, cpu | Advanced Only |
| attention_mode | Attention backend for performance optimization. | auto | auto, flash_attention_2, sdpa | Standard & Advanced |
| Mode | Precision | Memory Usage | Speed | Quality | Recommended For |
|---|---|---|---|---|---|
| None (FP16) | 16-bit Float | High | Fastest | Best | High VRAM GPUs (16GB+) |
| 8-bit (Balanced) | 8-bit Integer | Medium | Fast | Very Good | Balanced performance (8GB+) |
| 4-bit (VRAM-friendly) | 4-bit Integer | Low | Slower* | Good | Low VRAM GPUs (<8GB) |
* Note on 4-bit Speed: 4-bit quantization significantly reduces VRAM usage but may result in slower performance on some systems due to the computational overhead of real-time dequantization.
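The quantization dropdown above effectively translates into model-loading options. A minimal sketch of that mapping is below; the function name is illustrative, and the dtype values are plain strings for readability (a real loader would pass torch dtypes via a transformers BitsAndBytesConfig).

```python
def quantization_kwargs(mode):
    """Translate the quantization dropdown into model-loading options.
    Illustrative sketch: real code would build a transformers
    BitsAndBytesConfig and pass torch dtypes instead of strings."""
    if mode == "4-bit (VRAM-friendly)":
        # 4-bit weights, FP16 compute: lowest VRAM, dequantization overhead
        return {"load_in_4bit": True, "compute_dtype": "float16"}
    if mode == "8-bit (Balanced)":
        # 8-bit weights: the default balance of VRAM and quality
        return {"load_in_8bit": True}
    # "None (FP16)": full half precision, fastest on high-VRAM GPUs
    return {"dtype": "float16"}

kw4 = quantization_kwargs("4-bit (VRAM-friendly)")
kw8 = quantization_kwargs("8-bit (Balanced)")
kw16 = quantization_kwargs("None (FP16)")
```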
| Mode | Description | Speed | Memory | Requirements |
|---|---|---|---|---|
| auto | Automatically selects Flash Attention 2 if available, falls back to SDPA | Fast | Medium | flash-attn package |
| flash_attention_2 | Uses Flash Attention v2 for optimal performance | Fastest | Low | flash-attn + CUDA GPU |
| sdpa | PyTorch native Scaled Dot Product Attention | Medium | Medium | PyTorch 2.0+ |
Flash Attention 2 Requirements:
- NVIDIA GPU with compute capability >= 8.0 (RTX 30/40/50 series)
- CUDA >= 12.0
- PyTorch >= 2.3.0
- flash-attn package installed
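The fallback behavior in the table above can be sketched as a small resolver. This is an illustrative model of the selection logic, not the node's actual code: it probes for the flash-attn package and returns the string you would pass as attn_implementation to a transformers from_pretrained call.

```python
import importlib.util

def pick_attention_backend(requested="auto"):
    """Resolve the attention_mode dropdown to a transformers attn_implementation.
    "auto" prefers flash_attention_2 when the flash-attn package is importable,
    otherwise falls back to PyTorch's native SDPA."""
    flash_available = importlib.util.find_spec("flash_attn") is not None
    if requested == "flash_attention_2" and not flash_available:
        return "sdpa"                      # graceful fallback instead of erroring
    if requested == "auto":
        return "flash_attention_2" if flash_available else "sdpa"
    return requested

# The result is what you'd pass as:
#   model = AutoModel.from_pretrained(name, attn_implementation=pick_attention_backend())
```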
| Setting | Recommendation |
|---|---|
| Model Choice | For most users, Qwen3-VL-4B-Instruct is a great starting point. If you have a 40-series GPU, try the -FP8 version for better performance. |
| Memory Mode | Keep keep_model_loaded enabled (True) for the best performance if you plan to run the node multiple times. Disable it only if you are running out of VRAM for other nodes. |
| Quantization | Start with the default 8-bit. If you have plenty of VRAM (>16GB), switch to None (FP16) for the best speed and quality. If you are low on VRAM, use 4-bit. |
| Performance | The first time a model is loaded with a specific quantization, it may be slow. Subsequent runs (with keep_model_loaded enabled) will be much faster. |
| Attention Mode | Use "flash_attention_2" for 2-3x speedup if you have compatible GPU. Otherwise use "auto" for automatic selection. |
This node utilizes the Qwen-VL series of models, developed by the Qwen Team at Alibaba Cloud. These are powerful, open-source large vision-language models (LVLMs) designed to understand and process both visual and textual information, making them ideal for tasks like detailed image and video description.
This integration includes support for Flash Attention 2, a cutting-edge attention implementation that provides significant performance improvements:
| Model | Flash Attention 2 Support | Speedup |
|---|---|---|
| Qwen2.5-VL-3B | ✅ Full | 200-300% |
| Qwen3-VL-4B | ✅ Full | 150-250% |
- Install Flash Attention 2 (see Installation)
- Select "flash_attention_2" in the attention_mode parameter
- Run your workflow - the system automatically applies the optimization
- Implementation: Uses optimized attention kernels for better memory efficiency
- Compatibility: Works with all quantization modes (4-bit, 8-bit, FP16)
- Integration: Seamlessly integrates with existing workflows
- Fallback: Automatically falls back to SDPA if Flash Attention 2 is not available
- flash-attn package installed
- Sufficient VRAM for your chosen model
- Compatible GPU (RTX 30 series or newer)
Flash Attention 2 not working?
# Check installation
python -c "import flash_attn; print('Flash Attention 2 available')"
# Check GPU capability
python -c "import torch; print(f'GPU capability: {torch.cuda.get_device_capability()}')"
Common Problems:
- "Flash Attention 2 not available": Install the package and check GPU compatibility
- "CUDA not available": Ensure you have installed PyTorch compatible CUDA
- "GPU capability insufficient": Flash Attention 2 requires RTX 20 series or newer
This enhanced version includes specialized prompts for WAN 2.2 video generation, supporting both I2V (image-to-video) and T2V (text-to-video) workflows.
| Prompt Type | Use Case | Input | Output | Location |
|---|---|---|---|---|
| 🍿 Wan 2.2 I2V | Image-to-Video | Image + Text | 5-second cinematic timeline | QwenVL nodes |
| 🍿 Wan 2.2 T2V | Text-to-Video | Text only | 5-second cinematic timeline | Prompt Enhancer nodes |
- Cinematic Timeline Structure: 5-second videos with second-by-second descriptions
- Multilingual Support: Italian/English input → English optimized output
- Professional Scene Description: Film-style direction including lighting, camera, composition
- NSFW Handling: Appropriate content filtering and description
- WAN 2.2 Optimization: Specifically formatted for best video generation results
(At 0 seconds: A young woman stands facing a rack of clothes...)
(At 1 second: The blouse falls to the floor around her feet...)
(At 2 seconds: She reaches out with her right hand...)
(At 3 seconds: She turns her body slightly towards the mirror...)
(At 4 seconds: Lifting the hanger, she holds the dark fabric...)
(At 5 seconds: A subtle, thoughtful expression crosses her face...)
- For I2V: Use "🍿 Wan 2.2 I2V" preset in QwenVL nodes with image input
- For T2V: Use "🍿 Wan 2.2 T2V" style in Prompt Enhancer nodes with text only
- For Storyboard: Use "🍿 Wan Extended Storyboard" for seamless scene continuity
- For General Video: Use "🎥 Wan Cinematic Video" for professional single-scene descriptions
- Provide clear, descriptive input for better scene interpretation
- Use specific camera and lighting directions when possible
- Include mood and atmosphere details for cinematic results
- Leverage professional cinematography specs for optimal video quality
- The system automatically handles timeline optimization for WAN 2.2 presets
- ✅ Support for Qwen3-VL and Qwen2.5-VL models.
- ✅ GGUF backend support for faster inference.
- ✅ Prompt Enhancer nodes for text-only workflows.
- ✅ Flash Attention 2 integration for 2-3x performance boost.
- ✅ WAN 2.2 I2V and T2V video generation prompts.
- ✅ Extended Storyboard preset for scene continuity.
- ✅ Professional cinematography specifications for all WAN 2.2 presets.
- ✅ WAN family branding and consistent naming.
- ✅ Extended Storyboard preset for seamless continuity generation.
- ✅ Free abliterated models without token requirements.
- ✅ Enhanced custom prompt logic across all nodes.
- ✅ Comprehensive NSFW content generation support.
- ✅ Optimized model ordering and quantization defaults.
- ✅ Clean attention pipeline with SDPA stability.
- ✅ Removed complexity for better model output reliability.
- ✅ Smart prompt caching system for performance optimization.
- ✅ Fixed Seed Mode for stable outputs regardless of media variations.
- ✅ Persistent cache across ComfyUI restarts.
- ✅ Code maintenance updates for future compatibility.
- Qwen Team: Alibaba Cloud - For developing and open-sourcing the powerful Qwen-VL models.
- ComfyUI: comfyanonymous - For the incredible and extensible ComfyUI platform.
- llama-cpp-python: JamePeng/llama-cpp-python - GGUF backend with vision support used by GGUF nodes.
- GenorTG: GenorTG/ComfyUI-Genor-QwenVL-Mod - For innovative memory management improvements, including the unload_after_run parameter and prompt cache optimizations that prevent OOM errors in multi-node workflows.
- ComfyUI Integration: 1038lab - Developer of this custom node.
- huchukato
- 🐙 GitHub
- 🐦 X (Twitter)
- 🎨 Civitai - Check out my AI art models!
This repository code is released under GPL-3.0 License.


