CausalEmbed: Auto-Regressive Multi-Vector Generation in Latent Space for Visual Document Embedding

Jiahao Huo^1,2,3 Yu Huang^1,2 Yibo Yan^1,2,4 Ye Pan¹
Yi Cao² Mingdong Ou^♣️,2 Philip S. Yu³ Xuming Hu^✉,1,4

¹The Hong Kong University of Science and Technology (Guangzhou) ²Alibaba Cloud Computing
³University of Illinois Chicago ⁴The Hong Kong University of Science and Technology
^♣️ Project Leader ^✉ Corresponding Author

Official implementation of "CAUSALEMBED: Auto-Regressive Multi-Vector Generation in Latent Space for Visual Document Embedding".
This repository is built upon the official implementation of ColPali. Thanks a lot for their efforts!

Updates

29 Jan, 2026: Paper published on arXiv.
10 Feb, 2025: Code and models published.

This repository contains the official implementation of the following paper:

CausalEmbed: Auto-Regressive Multi-Vector Generation in Latent Space for Visual Document Embedding https://arxiv.org/abs/2601.21262

Abstract: Although Multimodal Large Language Models (MLLMs) have shown remarkable potential in Visual Document Retrieval (VDR) through generating high-quality multi-vector embeddings, the substantial storage overhead caused by representing a page with thousands of visual tokens limits their practicality in real-world applications. To address this challenge, we propose an auto-regressive generation approach, CausalEmbed, for constructing multi-vector embeddings. By incorporating iterative margin loss during contrastive training, CausalEmbed encourages the embedding models to learn compact and well-structured representations. Our method enables efficient VDR tasks using only dozens of visual tokens, achieving a 30-155x reduction in token count while maintaining highly competitive performance across various backbones and benchmarks. Theoretical analysis and empirical results demonstrate the unique advantages of auto-regressive embedding generation in terms of training efficiency and scalability at test time. As a result, CausalEmbed introduces a flexible test-time scaling strategy for multi-vector VDR representations and sheds light on the generative paradigm within multimodal document retrieval.
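For context, multi-vector retrievers in the ColPali family typically score a query against a page with late interaction (MaxSim): each query vector is matched to its most similar document vector, and the maxima are summed. The sketch below is a minimal pure-Python illustration, assuming the vectors generated by CausalEmbed are scored the same way. The function name `maxsim_score` and the toy vectors are hypothetical and are not taken from this repository.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product between two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Late-interaction score: sum over query vectors of the best match in the page."""
    if not doc_vecs:
        raise ValueError("a page needs at least one embedding vector")
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy example: 2 query vectors, one page with 2 vectors and one page with 1 vector.
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[0.9, 0.1], [0.2, 0.8]]
page_b = [[0.5, 0.5]]

scores = {
    "page_a": maxsim_score(query, page_a),
    "page_b": maxsim_score(query, page_b),
}
best = max(scores, key=scores.get)
print(scores, best)
```

Both the index size and the scoring cost grow with the number of vectors stored per page, which is why going from thousands of visual tokens to a few dozen generated vectors matters in practice.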

List of CausalEmbed models

Model	Score on ViDoRe 🏆	License	Comments	Currently supported
Z1zs/CausalQwen2.5	81.1	Qwen2.5-VL	• Based on `Qwen/Qwen2.5-VL-3B-Instruct`. • Checkpoint used in the CausalEmbed paper.	✅
Z1zs/CausalPali	75.0	Gemma	• Based on `google/paligemma-3b-mix-448`. • Fix right padding for queries.	✅

Environment Installation

# Create and activate a Conda environment
conda create -n causal python=3.12
conda activate causal

# Install dependencies
pip install -r requirements.txt

# Install FlashAttention for faster attention computation
pip install flash-attn

# Install ColPali engine in editable mode
python -m pip install -e .
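
After the steps above, a quick import check can confirm that the environment is usable. This is a minimal sketch using only the standard library. The module list is an assumption: `flash_attn` is the import name of the flash-attn package, `colpali_engine` is the package directory in this repository, and `torch` is assumed to be pulled in by requirements.txt.

```python
import importlib.util

# Assumed import names; adjust to match requirements.txt if they differ.
REQUIRED = ["torch", "flash_attn", "colpali_engine"]


def missing_modules(names):
    """Return the subset of module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```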

Training

We use accelerate for multi-GPU training. Example commands:

accelerate launch --multi-gpu train_causalqwen25.py \
  --epoch 1 \
  --dtoken_num 32 \
  --qtoken_num 16 \
  --loss symmargin \
  --wm 1 \
  --wp 0.1 \
  --wn 0.1 \
  --wq 0.1 \
  --bs 8

accelerate launch --multi-gpu train_causalqwen25.py \
  --epoch 1 \
  --dtoken_num 32 \
  --qtoken_num 16 \
  --loss symmargin \
  --wm 1 \
  --wp 0.1 \
  --wn 0.1 \
  --wq 0.1 \
  --bs 3
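
When sweeping over settings such as the batch size, it can be convenient to assemble the launch command programmatically. The helper below is a hypothetical sketch (the name `build_train_command` is not part of this repository); it only reuses the script name and flags shown in the commands above.

```python
import shlex


def build_train_command(script, **flags):
    """Build an `accelerate launch` argument list from keyword flags."""
    cmd = ["accelerate", "launch", "--multi-gpu", script]
    for name, value in flags.items():
        cmd += [f"--{name}", str(value)]
    return cmd


cmd = build_train_command(
    "train_causalqwen25.py",
    epoch=1,
    dtoken_num=32,
    qtoken_num=16,
    loss="symmargin",
    wm=1,
    wp=0.1,
    wn=0.1,
    wq=0.1,
    bs=8,
)
# Print a copy-pasteable shell command.
print(shlex.join(cmd))
```

Passing `bs=3` instead reproduces the second command. The list can be handed to `subprocess.run(cmd, check=True)` to start training from Python.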

Key Arguments

--loss: Loss function to use. Options: ["ce", "pairwise", "symmargin"]

Recommendation: Use pairwise (without regularization) or symmargin (with regularization).

--ckpt: Path to the renamed model weights for resuming training (run rename_qwen.py or rename_pali.py first so that the weight keys match the expected format).

--wm / --wp / --wn / --wq: Weights for the main loss, positive regularization, negative regularization, and query regularization, respectively.
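
To make the role of the four weights concrete, the sketch below assumes the training objective is a plain weighted sum of the four terms. That combination rule, the `LossWeights` class, and the `combine_losses` function are illustrative assumptions, not code from this repository; the default values mirror the example commands.

```python
from dataclasses import dataclass


@dataclass
class LossWeights:
    wm: float = 1.0  # main loss
    wp: float = 0.1  # positive regularization
    wn: float = 0.1  # negative regularization
    wq: float = 0.1  # query regularization


def combine_losses(main, pos_reg, neg_reg, query_reg, w):
    """Assumed objective: weighted sum of the main loss and three regularizers."""
    return w.wm * main + w.wp * pos_reg + w.wn * neg_reg + w.wq * query_reg


# Toy loss values, chosen only for illustration.
with_reg = combine_losses(2.0, 0.5, 0.3, 0.2, LossWeights())
# Setting the regularization weights to zero leaves only the main term.
no_reg = combine_losses(2.0, 0.5, 0.3, 0.2, LossWeights(wp=0.0, wn=0.0, wq=0.0))
print(with_reg, no_reg)
```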

Weight Renaming

To ensure checkpoints can be loaded properly when resuming training, rename the weight keys in the .safetensors files using the provided scripts:

In rename_xxx.py, set:

checkpoint_path: the source checkpoint directory

output_path: the target directory for renamed weights

Run the renaming scripts:

python rename_qwen.py
python rename_pali.py
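
The core of such a renaming step is a key remapping over the checkpoint's state dict. The sketch below shows the idea on a plain dictionary. The prefixes are hypothetical and the real mapping lives in rename_qwen.py and rename_pali.py. For actual checkpoints, `safetensors.torch.load_file` and `safetensors.torch.save_file` could be used to read and write the tensors.

```python
def rename_keys(state_dict, old_prefix, new_prefix):
    """Return a copy of state_dict with old_prefix replaced by new_prefix on matching keys."""
    renamed = {}
    for key, tensor in state_dict.items():
        if key.startswith(old_prefix):
            key = new_prefix + key[len(old_prefix):]
        if key in renamed:
            raise KeyError(f"duplicate key after renaming: {key}")
        renamed[key] = tensor
    return renamed


# Toy state dict; strings stand in for tensors, and the prefixes are made up.
weights = {
    "base_model.model.layer0.weight": "tensor0",
    "custom_head.weight": "tensor1",
}
print(rename_keys(weights, "base_model.model.", "model."))
```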

Evaluation

Evaluate the trained model using:

bash eval.sh

This evaluates the performance of the CausalEmbed model on each subset of ViDoRe.
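
ViDoRe results are usually reported as nDCG@5. Assuming the scores in the model list above follow that convention, the metric can be sketched as below. This is an illustration, not the code behind eval.sh, and it assumes every relevant page for a query appears somewhere in the ranked list.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k relevance labels."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, k=5):
    """nDCG@k for one query, given relevance labels in ranked order."""
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0


# One query per row: 1 marks the relevant page, 0 an irrelevant one.
rankings = [
    [1, 0, 0, 0, 0],  # relevant page ranked first
    [0, 1, 0, 0, 0],  # relevant page ranked second
]
subset_score = sum(ndcg_at_k(r) for r in rankings) / len(rankings)
print(round(subset_score, 4))
```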

Contributing

We welcome contributions to CausalEmbed! 🤗
Feel free to open a PR for new models, results, or features, and we will review it as soon as possible.

Citation

CausalEmbed: Auto-Regressive Multi-Vector Generation in Latent Space for Visual Document Embedding

Authors: Jiahao Huo, Yu Huang, Yibo Yan, Ye Pan, Yi Cao, Mingdong Ou, Philip S. Yu, Xuming Hu

@misc{huo2026causalembed,
      title={CausalEmbed: Auto-Regressive Multi-Vector Generation in Latent Space for Visual Document Embedding}, 
      author={Jiahao Huo and Yu Huang and Yibo Yan and Ye Pan and Yi Cao and Mingdong Ou and Philip S. Yu and Xuming Hu},
      year={2026},
      eprint={2601.21262},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2601.21262}, 
}

