QwenVL-Mod for ComfyUI

🚀 API Support & Troubleshooting

Having issues with ComfyUI API? We've got you covered!

📋 Quick API Help

📖 API Guide - Complete API documentation
🛠️ Debug Script - Automated API diagnostics
🎯 Example Workflows - Ready-to-use API templates

🔧 Common API Issues

Issue	Solution
"node not found"	Check QwenVL-Mod installation
"model not found"	Verify model files in `/models/`
"invalid input"	Check parameter formats
"queue full"	Wait for current jobs

🧪 Quick Debug

# Run API diagnostics
python debug_api.py --url http://localhost:18188

# Check available nodes
curl http://localhost:18188/object_info

# Test simple workflow
curl -X POST http://localhost:18188/prompt -d @test_workflow.json

📞 Need More Help?

Ask user for:

Error message (exact text)
Workflow JSON (sanitized)
Debug output (from debug_api.py)
ComfyUI logs (API section)

The ComfyUI-QwenVL custom node integrates powerful Qwen-VL series of vision-language models (LVLMs) from Alibaba Cloud, including latest Qwen3-VL and Qwen2.5-VL, plus GGUF backends and text-only Qwen3 support. This advanced node enables seamless multimodal AI capabilities within your ComfyUI workflows, allowing for efficient text generation, image understanding, and video analysis.

📰 News & Updates

2026/03/06: v2.2.4 🔧 Critical OOM Fix + Quantization Removal. [Update]

🚨 BitsAndBytes Disabled: Removed problematic quantization causing OOM on RTX 5090.
✅ FP16 Only: All HF nodes now use stable FP16 (~6GB VRAM).
🎯 Cleaner Interface: Removed quantization dropdown - use GGUF nodes for quantized models.
🔧 Both Nodes Fixed: Applied fixes to Standard and Advanced nodes with consistent parameters.
💡 User Guidance: HF nodes for quality, GGUF nodes for quantization - clear separation.

2026/02/27: v2.2.3 🔧 CUDA 13 Compatibility Fix + Redundancy Removal. [Update]

🔧 Removed unload_after_run: Eliminated redundant checkbox from all QwenVL nodes to prevent CUDA 13 conflicts.
🐛 Fixed Parameter Errors: Resolved "missing 1 required positional argument: unload_after_run" errors in all nodes.
🎯 Simplified Interface: Cleaner node interface without redundant parameters.
🧠 VRAM Cleanup Node: Maintained for manual cleanup when needed.
🏆 Community Credits: Thanks to user feedback that identified redundancy and parameter issues.

2026/02/27: v2.2.3 🚀 Critical T2V/I2V Fixes + ComfyUI Optimizations. [Update]

🚀 Batch Processing: Fixed critical T2V → GGUF issue with batch images from video generation.
🔄 Same Model Reuse: Resolved conflict when using same model between T2V and I2V nodes.
⚙️ Flash Attention 2: Added Flash Attention 2 support for performance boost on compatible hardware.
⚙️ ComfyUI Args: Optimized startup arguments with validated experimental features.
🔧 keep_model_loaded: Added missing parameter to PromptEnhancer for consistent memory management.
🐳 Final Docker Build: Optimized build with all fixes and maximum performance.

2026/02/18: v2.2.1 🔧 Critical GGUF VRAM Fix + Docker Optimized. [Update]

🔧 GGUF VRAM Fix: Resolved critical VRAM leak issue causing crashes after 2 executions.
🧹 Aggressive Cleanup: Implemented complete VRAM cleanup for all GGUF nodes (AILab_QwenVL_GGUF and PromptEnhancer).
🚀 Stable Performance: GGUF nodes now work reliably without VRAM accumulation.
🐳 Docker Enhanced: Updated Dockerfiles with RunPod-tested methods for Jupyter and FileBrowser.
🔄 ComfyUI Latest: Always latest stable version without manual updates.
📡 Complete SSH: Server + client SSH for full networking functionality.
🎯 Jupyter Terminal: Adopted RunPod method for working terminal.

2026/02/15: v2.2.0 🎬 WAN 2.2 Story Generation System. [Update]

🎬 Story Generation: Complete 4-segment video story generation with WAN 2.2
🔄 Auto-Split Node: Intelligent prompt splitting for continuous 20-second videos
📝 Show Text Node: Built-in text display node without external dependencies
🎯 Enhanced Prompts: Optimized WAN 2.2 NSFW Story prompts with better formatting
⚡ Performance: Optimized context settings for 8B models (65,536 tokens)
🐳 Docker Ready: Complete Story system integrated in Docker containers
🎨 Workflows: Ready-to-use WAN 2.2 Story and T2V workflows included

2026/02/14: v2.1.0 User-Friendly Keep Last Prompt Feature. [Update]

[!NOTE]

2026/02/12: v2.0.9 Bypass Mode parameter for prompt persistence. [Update]

🎛️ Bypass Mode: New bypass_mode parameter allows maintaining previously generated prompts without regeneration.
🔄 Smart Cache: When bypass mode is enabled, nodes retrieve the most recent cached prompt for the current model.
🎯 Perfect Workflow: Generate prompts once, then enable bypass mode to preserve them while changing inputs.
⚡ Zero Resource Usage: Bypass mode consumes no computational resources - instant response.
📋 Universal Feature: Available across all nodes (HF, GGUF, PromptEnhancer, Advanced variants).
🎮 Simple Control: Just toggle the bypass_mode checkbox to enable/disable prompt persistence.

2026/02/06: v2.0.8 Bug fixes and stability improvements. [Update]

Note

🐛 Bug Fixes: Fixed JSON syntax error in system prompts and undefined variable issues.
🌐 Multilingual Support: Complete multilingual support added to all WAN 2.2 presets.
🎨 Style Detection: Enhanced visual style detection for anime, 3D, pixel art, and more.
🔧 Stability: Reverted problematic fixed seed improvements to maintain stable operation.
📝 Documentation: Updated README and changelog with today's improvements.

2026/02/04: v2.0.7 Smart prompt caching system with Fixed Seed Mode. [Update]

Note

🧠 Smart Caching: Automatic prompt caching prevents regeneration of identical prompts.
🔒 Fixed Seed Mode: Set any fixed seed value to maintain consistent prompts regardless of media variations.
⚡ Performance Boost: Instant response for cached prompts with zero model loading time.
🔧 Code Maintenance: Removed deprecated parameters across all download functions for future compatibility.
📈 GGUF Enhancement: Increased default context size from 8192 to 32768 for better model utilization.
🔄 Universal Caching: Fixed Seed Mode and smart caching now available across all nodes (HF, GGUF, PromptEnhancer).
🎯 Simplified Logic: Streamlined caching system that always includes seed for predictable behavior across all seed values.

🔥 Version 2.1.0 - Enhanced NSFW Support & Lighting Accuracy

🔥 Stronger NSFW Instructions: Enhanced system prompts with explicit directives for uncensored content generation
🔦 Fixed Lighting Invention: Corrected AI tendency to add lights not present in source images
🎯 Improved Accuracy: Better content fidelity for both NSFW descriptions and lighting analysis
🔄 Updated GGUF Models: Replaced noctrex models with mradermacher v2/v3 variants for better performance
📦 Enhanced Model Selection: Added Josiefied-Qwen3-VL-4B-Instruct-abliterated-beta-v1 (HF only) (HF, GGUF, PromptEnhancer).

2026/02/03: v2.0.6 Professional cinematography enhancement for all WAN 2.2 presets. [Update]

Note

🎬 Professional Specs: All WAN 2.2 presets now include comprehensive cinematography specifications.
📹 Technical Details: Light sources, shot types, lens specs, camera movements, color tone requirements.
🎯 Consistent Branding: Updated preset names with WAN family branding for better organization.

2026/03/13: v2.2.4 🎬 Critical I2V Timeline Fixes & NSFW Presets Optimization. [Update]

🎬 I2V Timeline (20s) Critical Fixes:

✅ Style Coherence: Fixed AI changing anime→realism mid-sequence

✅ Character Stability: Fixed characters disappearing/appearing incorrectly

✅ Natural Lighting: Fixed AI adding artificial lights not in image

✅ Timeline Structure: Fixed continuous numbering (6,7,8...) instead of 0-5 restart

✅ Format Consistency: Fixed missing parentheses and unwanted labels

🔧 All 8 NSFW Presets: Complete specifications + emoji display restored

📋 Token Settings Guide: Comprehensive workflow note for optimal parameters

2026/02/01: v2.0.5 Extended Storyboard preset added for WAN 2.2 format continuity. [Update]

Note

🎬 Extended Storyboard: New preset for seamless storyboard-to-storyboard generation with timeline format.
🔄 Continuity Focus: Each paragraph repeats previous content for smooth transitions.
🎯 WAN 2.2 Compatible: Same timeline structure and NSFW support as I2V preset.

2026/02/01: v2.0.4 Stability update - removed SageAttention for better compatibility and model output reliability. [Update]

Note

🔧 Flash Attention 2: Still available for 2-3x speedup on compatible hardware.
🛡️ Enhanced Stability: Clean attention pipeline with SDPA as reliable fallback.

2026/02/01: v2.0.3 SageAttention compatibility fix for proper patching across transformer versions. [Update]

Note

🔧 Critical Fix: Resolved AttributeError preventing Flash Attention 2 from working with certain transformer versions.
⚡ Performance Restored: 2-5x speedup now works correctly with 8-bit quantization on compatible hardware.

2026/02/01: v2.0.2 Enhanced model accessibility, improved custom prompt logic, and expanded NSFW content generation. [Update]

Note

🚀 Free Abliterated Models: Added token-free uncensored models as defaults for better accessibility.
🔧 Custom Prompt Fix: Now combines with preset templates instead of replacing them across all nodes.
📝 Enhanced NSFW: Comprehensive descriptions for adult content generation with detailed act specifications.
🎬 WAN 2.2 Priority: Moved video generation preset to top position for faster workflow access.

2026/01/30: v2.0.1-enhanced Added Flash Attention 2 support and WAN 2.2 integration. [Update]

Note

🚀 Flash Attention 2: 2-5x performance boost with 8-bit quantized attention for RTX 30+ GPUs.
🎬 WAN 2.2 Integration: New specialized prompts for cinematic video generation - convert images/videos to 5-second timeline descriptions (I2V) or text to video (T2V) with professional scene direction.

2025/12/22: v2.0.0 Added GGUF supported nodes and Prompt Enhancer nodes. [Update]

Important

Install llama-cpp-python before running GGUF nodes instruction

2025/11/10: v1.1.0 Runtime overhaul with attention-mode selector, flash-attn auto detection, smarter caching, and quantization/torch.compile controls in both nodes. [Update]
2025/10/31: v1.0.4 Custom Models Supported [Update]
2025/10/22: v1.0.3 Models list updated [Update]
2025/10/17: v1.0.0 Initial Release
- Support for Qwen3-VL and Qwen2.5-VL series models.
- Automatic model downloading from Hugging Face.
- On-the-fly quantization (4-bit, 8-bit, FP16).
- Preset and Custom Prompt system for flexible and easy use.
- Includes both a standard and an advanced node for users of all levels.
- Hardware-aware safeguards for FP8 model compatibility.
- Image and Video (frame sequence) input support.
- "Keep Model Loaded" option for improved performance on sequential runs.
- Seed parameter for reproducible generation.

✨ Features

Standard & Advanced Nodes: Includes a simple QwenVL node for quick use and a QwenVL (Advanced) node with fine-grained control over generation.
Prompt Enhancers: Dedicated text-only prompt enhancers for both HF and GGUF backends.
Preset & Custom Prompts: Choose from a list of convenient preset prompts or write your own for full control. Custom prompts now combine with preset templates for enhanced flexibility.
Smart Prompt Caching: Automatic caching system prevents regeneration of identical prompts, dramatically improving performance for repeated inputs. Cache persists across ComfyUI restarts.
🎛️ Bypass Mode: New bypass_mode parameter allows maintaining previously generated prompts without regeneration. Generate once, then enable bypass mode to preserve prompts while changing inputs. Zero resource usage in bypass mode.
Fixed Seed Mode: Set seed = 1 to ignore image/video changes and maintain consistent prompts regardless of media variations. Perfect for stable workflow outputs.
WAN 2.2 Integration: Specialized prompts for WAN 2.2 I2V (image-to-video) and T2V (text-to-video) generation with professional cinematography specifications and cinematic timeline structure. I2V preset prioritized for faster workflow access.
Professional Cinematography: All WAN 2.2 presets include comprehensive technical specifications - light sources, shot types, lens specifications, camera movements, and color tone requirements for professional video generation.
Extended Storyboard: New preset for seamless storyboard-to-storyboard generation with WAN 2.2 format compatibility, continuity focus, and professional cinematography details.
WAN Family Branding: Consistent naming across all WAN 2.2 presets for better organization and workflow clarity.
Free Abliterated Models: Default models include token-free uncensored options (Qwen3-4B-abliterated-TIES, Qwen3-8B-abliterated-TIES) for immediate accessibility.
Multi-Model Support: Easily switch between various official Qwen-VL models with smart 4B-first ordering for VRAM efficiency.
Automatic Model Download: Models are downloaded automatically on first use.
Smart Quantization: Balance VRAM and performance with 4-bit, 8-bit, and FP16 options. 8-bit quantization enabled by default for optimal accessibility.
Optimized Attention: Clean attention pipeline with Flash Attention 2 support and stable SDPA fallback. No complex patching that could interfere with model output.
Hardware-Aware: Automatically detects GPU capabilities and prevents errors with incompatible models (e.g., FP8).
Reproducible Generation: Use the seed parameter to get consistent outputs, with Fixed Seed Mode for ultimate stability.
Memory Management: "Keep Model Loaded" option to retain the model in VRAM for faster processing.
Image & Video Support: Accepts both single images and video frame sequences as input.
Robust Error Handling: Provides clear error messages for hardware or memory issues.
Clean Console Output: Minimal and informative console logs during operation.

🚀 Installation

Clone this repository to your ComfyUI/custom_nodes directory:

cd ComfyUI/custom_nodes  
git clone https://github.com/huchukato/ComfyUI-QwenVL-Mod.git

Install the required dependencies:

cd ComfyUI/custom_nodes/ComfyUI-QwenVL-Mod  
pip install -r requirements.txt

Restart ComfyUI.

Optional: Flash Attention 2 Installation

For 2-3x performance boost with compatible GPUs:

# Install Flash Attention 2 (recommended)
pip install flash-attn --no-build-isolation

# Or compile from source
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
python setup.py install

Requirements for Flash Attention 2:

NVIDIA GPU with capability >= 8.6 (RTX 20/30/40/50 series)
CUDA >= 12.0
PyTorch >= 2.3.0

See Flash Attention 2 section for details.

🧭 Node Overview

Transformers (HF) Nodes

QwenVL: Quick vision-language inference (image/video + preset/custom prompts).
QwenVL (Advanced): Full control over sampling, device, and performance settings.
QwenVL Prompt Enhancer: Text-only prompt enhancement (supports both Qwen3 text models and QwenVL models in text mode).

GGUF (llama.cpp) Nodes

QwenVL (GGUF): GGUF vision-language inference.
QwenVL (GGUF Advanced): Extended GGUF controls (context, GPU layers, etc.).
QwenVL Prompt Enhancer (GGUF): GGUF text-only prompt enhancement.

🧩 GGUF Nodes (llama.cpp backend)

This repo includes GGUF nodes powered by llama-cpp-python (separate from the Transformers-based nodes).

Nodes: QwenVL (GGUF), QwenVL (GGUF Advanced), QwenVL Prompt Enhancer (GGUF)
Model folder (default): ComfyUI/models/llm/GGUF/ (configurable via gguf_models.json)
Vision requirement: install a vision-capable llama-cpp-python wheel that provides Qwen3VLChatHandler / Qwen25VLChatHandler
See docs/LLAMA_CPP_PYTHON_VISION_INSTALL.md

🗂️ Config Files

HF models: hf_models.json
- hf_vl_models: vision-language models (used by QwenVL nodes).
- hf_text_models: text-only models (used by Prompt Enhancer).
GGUF models: gguf_models.json
System prompts: AILab_System_Prompts.json (includes both VL prompts and prompt-enhancer styles).

📥 Download Models

The models will be automatically downloaded on first use. If you prefer to download them manually, place them in the ComfyUI/models/LLM/Qwen-VL/ directory.

HF Vision Models (Qwen-VL)

Model	Link
Qwen3-VL-2B-Instruct	Download
Qwen3-VL-2B-Thinking	Download
Qwen3-VL-2B-Instruct-FP8	Download
Qwen3-VL-2B-Thinking-FP8	Download
Qwen3-VL-4B-Instruct	Download
Qwen3-VL-4B-Thinking	Download
Qwen3-VL-4B-Instruct-FP8	Download
Qwen3-VL-4B-Thinking-FP8	Download
Qwen3-VL-8B-Instruct	Download
Qwen3-VL-8B-Thinking	Download
Qwen3-VL-8B-Instruct-FP8	Download
Qwen3-VL-8B-Thinking-FP8	Download
Qwen3-VL-32B-Instruct	Download
Qwen3-VL-32B-Thinking	Download
Qwen3-VL-32B-Instruct-FP8	Download
Qwen3-VL-32B-Thinking-FP8	Download
Qwen2.5-VL-3B-Instruct	Download
Qwen2.5-VL-7B-Instruct	Download

HF Text Models (Qwen3)

Model	Link
Qwen3-0.6B	Download
Qwen3-4B-Instruct-2507	Download
qwen3-4b-Z-Image-Engineer	Download

GGUF Models (Manual Download)

Group	Model	Repo	Model Files	MMProj
Qwen text (GGUF)	Qwen3-4B-GGUF	Qwen/Qwen3-4B-GGUF	Qwen3-4B-Q4_K_M.gguf, Qwen3-4B-Q5_0.gguf, Qwen3-4B-Q5_K_M.gguf, Qwen3-4B-Q6_K.gguf, Qwen3-4B-Q8_0.gguf
Qwen-VL (GGUF)	Qwen3-VL-4B-Instruct-GGUF	Qwen/Qwen3-VL-4B-Instruct-GGUF	Qwen3VL-4B-Instruct-F16.gguf, Qwen3VL-4B-Instruct-Q4_K_M.gguf, Qwen3VL-4B-Instruct-Q8_0.gguf	mmproj-Qwen3VL-4B-Instruct-F16.gguf
Qwen-VL (GGUF)	Qwen3-VL-8B-Instruct-GGUF	Qwen/Qwen3-VL-8B-Instruct-GGUF	Qwen3VL-8B-Instruct-F16.gguf, Qwen3VL-8B-Instruct-Q4_K_M.gguf, Qwen3VL-8B-Instruct-Q8_0.gguf	mmproj-Qwen3VL-8B-Instruct-F16.gguf
Qwen-VL (GGUF)	Qwen3-VL-4B-Thinking-GGUF	Qwen/Qwen3-VL-4B-Thinking-GGUF	Qwen3VL-4B-Thinking-F16.gguf, Qwen3VL-4B-Thinking-Q4_K_M.gguf, Qwen3VL-4B-Thinking-Q8_0.gguf	mmproj-Qwen3VL-4B-Thinking-F16.gguf
Qwen-VL (GGUF)	Qwen3-VL-8B-Thinking-GGUF	Qwen/Qwen3-VL-8B-Thinking-GGUF	Qwen3VL-8B-Thinking-F16.gguf, Qwen3VL-8B-Thinking-Q4_K_M.gguf, Qwen3VL-8B-Thinking-Q8_0.gguf	mmproj-Qwen3VL-8B-Thinking-F16.gguf

📖 Usage

Basic Usage

Add the "QwenVL" node from the 🧪AILab/QwenVL category.
Select the model_name you wish to use.
Connect an image or video (image sequence) source to the node.
Write your prompt using the preset or custom field.
Run the workflow.

Advanced Usage

For more control, use the "QwenVL (Advanced)" node. This gives you access to detailed generation parameters like temperature, top_p, beam search, and device selection.

⚙️ Parameters

Parameter	Description	Default	Range	Node(s)
model_name	The Qwen-VL model to use.	Qwen3-VL-4B-Instruct	-	Standard & Advanced
quantization	On-the-fly quantization. Ignored for pre-quantized models (e.g., FP8).	8-bit (Balanced)	4-bit, 8-bit, None	Standard & Advanced
preset_prompt	A selection of pre-defined prompts for common tasks.	"Describe this..."	Any text	Standard & Advanced
custom_prompt	Overrides the preset prompt if provided.		Any text	Standard & Advanced
max_tokens	Maximum number of new tokens to generate.	1024	64-2048	Standard & Advanced
keep_model_loaded	Keep the model in VRAM for faster subsequent runs.	True	True/False	Standard & Advanced
seed	A seed for reproducible results.	1	1 - 2^64-1	Standard & Advanced
temperature	Controls randomness. Higher values = more creative. (Used when num_beams is 1).	0.6	0.1-1.0	Advanced Only
top_p	Nucleus sampling threshold. (Used when num_beams is 1).	0.9	0.0-1.0	Advanced Only
num_beams	Number of beams for beam search. > 1 disables temperature/top_p sampling.	1	1-10	Advanced Only
repetition_penalty	Discourages repeating tokens.	1.2	0.0-2.0	Advanced Only
frame_count	Number of frames to sample from the video input.	16	1-64	Advanced Only
device	Override automatic device selection.	auto	auto, cuda, cpu	Advanced Only
attention_mode	Attention backend for performance optimization.	auto	auto, flash_attention_2, sdpa	Standard & Advanced

💡 Quantization Options

Mode	Precision	Memory Usage	Speed	Quality	Recommended For
None (FP16)	16-bit Float	High	Fastest	Best	High VRAM GPUs (16GB+)
8-bit (Balanced)	8-bit Integer	Medium	Fast	Very Good	Balanced performance (8GB+)
4-bit (VRAM-friendly)	4-bit Integer	Low	Slower*	Good	Low VRAM GPUs (<8GB)

* Note on 4-bit Speed: 4-bit quantization significantly reduces VRAM usage but may result in slower performance on some systems due to the computational overhead of real-time dequantization.

⚡ Attention Mode Options

Mode	Description	Speed	Memory	Requirements
auto	Automatically selects Flash Attention 2 if available, falls back to SDPA	Fast	Medium	flash-attn package
flash_attention_2	Uses Flash Attention v2 for optimal performance	Fastest	Low	flash-attn + CUDA GPU
sdpa	PyTorch native Scaled Dot Product Attention	Medium	Medium	PyTorch 2.0+

Flash Attention 2 Requirements:

NVIDIA GPU with capability >= 8.6 (RTX 20/30/40/50 series)
CUDA >= 12.0
PyTorch >= 2.3.0
flash-attn package installed

🤔 Setting Tips

Setting	Recommendation
Model Choice	For most users, Qwen3-VL-4B-Instruct is a great starting point. If you have a 40-series GPU, try the -FP8 version for better performance.
Memory Mode	Keep keep_model_loaded enabled (True) for the best performance if you plan to run the node multiple times. Disable it only if you are running out of VRAM for other nodes.
Quantization	Start with the default 8-bit. If you have plenty of VRAM (>16GB), switch to None (FP16) for the best speed and quality. If you are low on VRAM, use 4-bit.
Performance	The first time a model is loaded with a specific quantization, it may be slow. Subsequent runs (with keep_model_loaded enabled) will be much faster.
Attention Mode	Use "flash_attention_2" for 2-3x speedup if you have compatible GPU. Otherwise use "auto" for automatic selection.

🧠 About Model

This node utilizes the Qwen-VL series of models, developed by the Qwen Team at Alibaba Cloud. These are powerful, open-source large vision-language models (LVLMs) designed to understand and process both visual and textual information, making them ideal for tasks like detailed image and video description.

⚡ Flash Attention 2 Performance Boost

This integration includes support for Flash Attention 2, a cutting-edge attention implementation that provides significant performance improvements:

🚀 Performance Gains

Model	Flash Attention 2	Speedup
Qwen2.5-VL-3B	100%	200-300%
Qwen3-VL-4B	100%	150-250%

🎯 How to Use

Install Flash Attention 2 (see Installation)
Select "flash_attention_2" in the attention_mode parameter
Run your workflow - the system automatically applies the optimization

🔧 Technical Details

Implementation: Uses optimized attention kernels for better memory efficiency
Compatibility: Works with all quantization modes (4-bit, 8-bit, FP16)
Integration: Seamlessly integrates with existing workflows
Fallback: Automatically falls back to SDPA if Flash Attention 2 is not available

📋 Requirements Checklist

flash-attn package installed
Sufficient VRAM for your chosen model
Compatible GPU (RTX 20 series or newer)

🐛 Troubleshooting

Flash Attention 2 not working?

# Check installation
python -c "import flash_attn; print('Flash Attention 2 available')"

# Check GPU capability
python -c "import torch; print(f'GPU capability: {torch.cuda.get_device_capability()}')"

Common Problems:

"Flash Attention 2 not available": Install the package and check GPU compatibility
"CUDA not available": Ensure you have installed PyTorch compatible CUDA
"GPU capability insufficient": Flash Attention 2 requires RTX 20 series or newer

📚 References

🎬 WAN 2.2 Integration

This enhanced version includes specialized prompts for WAN 2.2 video generation, supporting both I2V (image-to-video) and T2V (text-to-video) workflows.

🎯 Available WAN 2.2 Prompts

Prompt Type	Use Case	Input	Output	Location
🍿 Wan 2.2 I2V	Image-to-Video	Image + Text	5-second cinematic timeline	QwenVL nodes
🍿 Wan 2.2 T2V	Text-to-Video	Text only	5-second cinematic timeline	Prompt Enhancer nodes

⚡ Features

Cinematic Timeline Structure: 5-second videos with second-by-second descriptions
Multilingual Support: Italian/English input → English optimized output
Professional Scene Description: Film-style direction including lighting, camera, composition
NSFW Handling: Appropriate content filtering and description
WAN 2.2 Optimization: Specifically formatted for best video generation results

📝 Output Format Example

(At 0 seconds: A young woman stands facing a rack of clothes...)
(At 1 second: The blouse falls to the floor around her feet...)
(At 2 seconds: She reaches out with her right hand...)
(At 3 seconds: She turns her body slightly towards the mirror...)
(At 4 seconds: Lifting the hanger, she holds the dark fabric...)
(At 5 seconds: A subtle, thoughtful expression crosses her face...)

🔧 Usage

For I2V: Use "🍿 Wan 2.2 I2V" preset in QwenVL nodes with image input
For T2V: Use "🍿 Wan 2.2 T2V" style in Prompt Enhancer nodes with text only
For Storyboard: Use "🍿 Wan Extended Storyboard" for seamless scene continuity
For General Video: Use "🎥 Wan Cinematic Video" for professional single-scene descriptions

🎨 Best Practices

Provide clear, descriptive input for better scene interpretation
Use specific camera and lighting directions when possible
Include mood and atmosphere details for cinematic results
Leverage professional cinematography specs for optimal video quality
The system automatically handles timeline optimization for WAN 2.2 presets

🗺️ Roadmap

✅ Completed (v2.0.7)

✅ Support for Qwen3-VL and Qwen2.5-VL models.
✅ GGUF backend support for faster inference.
✅ Prompt Enhancer nodes for text-only workflows.
✅ Flash Attention 2 integration for 2-3x performance boost.
✅ WAN 2.2 I2V and T2V video generation prompts.
✅ Extended Storyboard preset for scene continuity.
✅ Professional cinematography specifications for all WAN 2.2 presets.
✅ WAN family branding and consistent naming.
✅ Extended Storyboard preset for seamless continuity generation.
✅ Free abliterated models without token requirements.
✅ Enhanced custom prompt logic across all nodes.
✅ Comprehensive NSFW content generation support.
✅ Optimized model ordering and quantization defaults.
✅ Clean attention pipeline with SDPA stability.
✅ Removed complexity for better model output reliability.
✅ Smart prompt caching system for performance optimization.
✅ Fixed Seed Mode for stable outputs regardless of media variations.
✅ Persistent cache across ComfyUI restarts.
✅ Code maintenance updates for future compatibility.

🙏 Credits

Qwen Team: Alibaba Cloud - For development and open-source powerful Qwen-VL models.
ComfyUI: comfyanonymous - For incredible and extensible ComfyUI platform.
llama-cpp-python: JamePeng/llama-cpp-python - GGUF backend with vision support used by GGUF nodes.
GenorTG: GenorTG/ComfyUI-Genor-QwenVL-Mod - For innovative memory management improvements including unload_after_run parameter and prompt cache optimizations that prevent OOM errors in multi-node workflows.
ComfyUI Integration: 1038lab - Developer of this custom node.

👥 Author

huchukato
- 🐙 GitHub
- 🐦 X (Twitter)
- 🎨 Civitai - Check out my AI art models!

📜 License

This repository code is released under GPL-3.0 License.

Name		Name	Last commit message	Last commit date
Latest commit History 635 Commits
__pycache__		__pycache__
docs		docs
example_workflows		example_workflows
nodes		nodes
runpod		runpod
vastai		vastai
web/js		web/js
.gitignore		.gitignore
AILab_OutputCleaner.py		AILab_OutputCleaner.py
AILab_QwenVL.py		AILab_QwenVL.py
AILab_QwenVL_GGUF.py		AILab_QwenVL_GGUF.py
AILab_QwenVL_GGUF_PromptEnhancer.py		AILab_QwenVL_GGUF_PromptEnhancer.py
AILab_QwenVL_PromptEnhancer.py		AILab_QwenVL_PromptEnhancer.py
AILab_System_Prompts.json		AILab_System_Prompts.json
LICENSE		LICENSE
README.md		README.md
README_it.md		README_it.md
__init__.py		__init__.py
custom_models_example.json		custom_models_example.json
custom_nodes.json		custom_nodes.json
gguf_models.json		gguf_models.json
gguf_requirements.txt		gguf_requirements.txt
hf_models.json		hf_models.json
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
sageattention_patch.py		sageattention_patch.py
update.md		update.md

Folders and files

Latest commit

History

Repository files navigation

QwenVL-Mod for ComfyUI

🚀 API Support & Troubleshooting

📋 Quick API Help

🔧 Common API Issues

🧪 Quick Debug

📞 Need More Help?

📰 News & Updates

🔥 Version 2.1.0 - Enhanced NSFW Support & Lighting Accuracy

✨ Features

🚀 Installation

Optional: Flash Attention 2 Installation

🧭 Node Overview

Transformers (HF) Nodes

GGUF (llama.cpp) Nodes

🧩 GGUF Nodes (llama.cpp backend)

🗂️ Config Files

📥 Download Models

HF Vision Models (Qwen-VL)

HF Text Models (Qwen3)

GGUF Models (Manual Download)

📖 Usage

Basic Usage

Advanced Usage

⚙️ Parameters

💡 Quantization Options

⚡ Attention Mode Options

🤔 Setting Tips

🧠 About Model

⚡ Flash Attention 2 Performance Boost

🚀 Performance Gains

🎯 How to Use

🔧 Technical Details

📋 Requirements Checklist

🐛 Troubleshooting

📚 References

🎬 WAN 2.2 Integration

🎯 Available WAN 2.2 Prompts

⚡ Features

📝 Output Format Example

🔧 Usage

🎨 Best Practices

🗺️ Roadmap

✅ Completed (v2.0.7)

🙏 Credits

👥 Author

📜 License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 0

Languages

Packages

Contributors