JCZ404/Awesome-Visual-Autoregressive

Awesome Visual Autoregressive

This is a curated list of recent work on visual autoregressive modeling, covering image, video, 3D, and multi-modal generation, among other areas. It aims to collect the latest relevant papers on visual autoregressive modeling to save you time. Suggestions and pull requests are welcome!

Survey

  • Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey arXiv Star

  • Towards Unifying Understanding and Generation in the Era of Vision Foundation Models: A Survey from the Autoregression Perspective arXiv

Image Generation

  • MaskGIT: Masked Generative Image Transformer arXiv Star

  • MAGVIT: Masked Generative Video Transformer arXiv Star

  • RQ-VAE: Autoregressive Image Generation using Residual Quantization arXiv Star

  • Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization arXiv Star

  • MAGVIT-v2: Language Model Beats Diffusion: Tokenizer is Key to Visual Generation arXiv Star

  • LlamaGen: Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation arXiv Star

  • VAR: Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction arXiv Star

  • MAR: Autoregressive Image Generation without Vector Quantization arXiv Star

  • SAR: Customize Your Visual Autoregressive Recipe with Set Autoregressive Modeling arXiv Star

  • STAR: Scale-wise Text-to-image generation via Auto-Regressive representations arXiv Website

  • Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens arXiv

  • Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis arXiv Star

  • Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining arXiv Star

  • Taming Scalable Visual Tokenizer for Autoregressive Image Generation arXiv Star

  • Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation arXiv Star

  • DARL: Denoising Autoregressive Representation Learning arXiv

  • TiTok: An Image is Worth 32 Tokens for Reconstruction and Generation arXiv Star Website

  • XQ-GAN: An Open-source Image Tokenization Framework for Autoregressive Generation arXiv Star

  • ImageFolder: Autoregressive Image Generation with Folded Tokens arXiv Star

  • DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation arXiv

  • M-VAR: Decoupled Scale-wise Autoregressive Modeling for High-Quality Image Generation arXiv Star

  • CoDe: Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient arXiv Star

  • VAR-CLIP: Text-to-Image Generator with Visual Auto-Regressive Modeling arXiv Star

  • RandAR: Decoder-only Autoregressive Visual Generation in Random Orders arXiv Star

  • RAR: Randomized Autoregressive Visual Generation arXiv Star

  • MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis arXiv Star

  • FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching arXiv Star

  • CAR: Controllable Autoregressive Modeling for Visual Generation arXiv Star

  • CCA: Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment arXiv Star

  • Scalable Autoregressive Image Generation with Mamba arXiv Star

  • ControlVAR: Exploring Controllable Visual Autoregressive Modeling arXiv Star

  • DnD-Transformer: A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Fine-grained Image Generation arXiv Star

  • EditAR: Unified Conditional Generation with Autoregressive Models arXiv Website

  • LiteVAR: Compressing Visual Autoregressive Modelling with Efficient Attention and Quantization arXiv

  • XTRA: Sample- and Parameter-Efficient Auto-Regressive Image Models arXiv Star

  • Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis arXiv Star

  • ZipAR: Accelerating Auto-regressive Image Generation through Spatial Locality arXiv Star

  • E-CAR: Efficient Continuous Autoregressive Image Generation via Multistage Modeling arXiv

  • PAR: Parallelized Autoregressive Visual Generation arXiv Star

  • NPP: Next Patch Prediction for Autoregressive Visual Generation arXiv Star

  • Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching arXiv Website

  • X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models arXiv Star

  • StarGen: A Spatiotemporal Autoregression Framework with Video Diffusion Model for Scalable and Controllable Scene Generation arXiv

  • IAR: Improving Autoregressive Visual Generation with Cluster-Oriented Token Prediction arXiv Star

  • ViTok: Learnings from Scaling Visual Tokenizers for Reconstruction and Generation arXiv

  • FlexTok: Resampling Images into 1D Token Sequences of Flexible Length arXiv Website

  • Fractal Generative Models arXiv Star

  • IGTR: Autoregressive Image Generation Guided by Chains of Thought arXiv

  • Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation arXiv Website

  • UniTok: A Unified Tokenizer for Visual Generation and Understanding arXiv Star

  • FlexVAR: Flexible Visual Autoregressive Modeling without Residual Prediction arXiv Star

  • FAR: Frequency Autoregressive Image Generation with Continuous Tokens arXiv Star

  • DAR: Direction-Aware Diagonal Autoregressive Image Generation arXiv

Video Generation

  • Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation arXiv Star

  • DiCoDe: Diffusion-Compressed Deep Tokens for Autoregressive Video Generation with Language Models arXiv Website

  • NOVA: Autoregressive Video Generation without Vector Quantization arXiv Star

  • CausVid: From Slow Bidirectional to Fast Autoregressive Video Diffusion Models arXiv Website

  • An Empirical Study of Autoregressive Pre-training from Videos arXiv Website

  • CTF: Taming Teacher Forcing for Masked Autoregressive Video Generation arXiv Website

  • Next Block Prediction: Video Generation via Semi-Autoregressive Modeling arXiv Star Website

3D Generation

  • SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE arXiv Star

  • TAR3D: Creating High-quality 3D Assets via Next-Part Prediction arXiv Star

  • ARTalk: Speech-Driven 3D Head Animation via Autoregressive Model arXiv Website

Multi-Modal

  • Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model arXiv Star

  • Show-o: One Single Transformer To Unify Multimodal Understanding and Generation arXiv Star

  • SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation arXiv Star

  • DreamLLM: Synergistic Multimodal Comprehension and Creation arXiv Star

  • LMFusion (LlamaFusion): Adapting Pretrained Language Models for Multimodal Generation arXiv

  • MetaMorph: Multimodal Understanding and Generation via Instruction Tuning arXiv Website

  • Chameleon: Mixed-Modal Early-Fusion Foundation Models arXiv Star

  • Emu3: Next-Token Prediction is All You Need arXiv Star

  • Liquid: Language Models are Scalable Multi-modal Generators arXiv Star

  • Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation arXiv Star

  • TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation arXiv Star

  • JetFormer: An Autoregressive Generative Model of Raw Images and Text arXiv

  • VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation arXiv Star

  • MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling arXiv

  • Dual Diffusion for Unified Image Generation and Understanding arXiv Website

  • VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model arXiv Website

  • QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation arXiv Website

Autonomous Driving

  • DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT arXiv Star

  • DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers arXiv Website
