Zehan Wang1* · Ziang Zhang1* · Jiayang Xu1 · Jialei Wang1 · Tianyu Pang2 · Chao Du2 · Hengshuang Zhao3 · Zhou Zhao1
1Zhejiang University 2SEA AI Lab 3HKU
*Equal Contribution
Orient Anything V2, a unified spatial vision model for understanding orientation, symmetry, and relative rotation, achieves SOTA performance across 14 datasets.
- 2025-12-12: 🔥 Paper, Project Page, Code, Training Data, Model Checkpoint, and Demo have been released!
- 2025-09-18: 🔥 Orient Anything V2 has been accepted as a Spotlight @ NeurIPS 2025!
We provide pre-trained model weights and are continuously iterating on them to support more inference scenarios:
| Model | Size | Checkpoint |
|---|---|---|
| Orient-Anything-V2 | 5.05 GB | Download |
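The inference example below automatically downloads this checkpoint from the Huggingface Hub when no local copy is found at LOCAL_CKPT_PATH. If you prefer to fetch it ahead of time, a command along the following lines should work (the --local-dir value is only an example; point it wherever you like):

```bash
# Optional: pre-download the Orient-Anything-V2 checkpoint from the Huggingface Hub.
# The target directory below is an arbitrary example path.
huggingface-cli download Viglong/Orient-Anything-V2 --local-dir ./Orient-Anything-V2
```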
Set up the environment:

```bash
conda create -n orianyv2 python=3.11
conda activate orianyv2
pip install -r requirements.txt
```

Start Gradio by executing the following script:
```bash
python app.py
```

Then open the GUI page (served at http://127.0.0.1:7860 by default) in a web browser.
Alternatively, you can try it in our Huggingface Space.
```python
import numpy as np
from PIL import Image
import torch
import tempfile
import os
from paths import *
from vision_tower import VGGT_OriAny_Ref
from inference import *
from app_utils import *
# Use bfloat16 on GPUs with compute capability >= 8 (Ampere or newer), float16 otherwise
mark_dtype = torch.bfloat16 if (torch.cuda.is_available() and torch.cuda.get_device_capability()[0] >= 8) else torch.float16
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# Load the checkpoint from a local path if available, otherwise download it from the Huggingface Hub
if os.path.exists(LOCAL_CKPT_PATH):
    ckpt_path = LOCAL_CKPT_PATH
else:
    from huggingface_hub import hf_hub_download
    ckpt_path = hf_hub_download(repo_id="Viglong/Orient-Anything-V2", filename=HF_CKPT_PATH, repo_type="model", cache_dir='./', resume_download=True)
model = VGGT_OriAny_Ref(out_dim=900, dtype=mark_dtype, nopretrain=True)
model.load_state_dict(torch.load(ckpt_path, map_location='cpu'))
model.eval()
model = model.to(device)
print('Model loaded.')
@torch.no_grad()
def run_inference(pil_ref, pil_tgt=None, do_rm_bkg=True):
    # Optionally remove the image background before inference
    if pil_tgt is not None:
        if do_rm_bkg:
            pil_ref = background_preprocess(pil_ref, True)
            pil_tgt = background_preprocess(pil_tgt, True)
    else:
        if do_rm_bkg:
            pil_ref = background_preprocess(pil_ref, True)
    try:
        ans_dict = inf_single_case(model, pil_ref, pil_tgt)
    except Exception as e:
        print("Inference error:", e)
        raise RuntimeError(f"Inference failed: {e}")

    def safe_float(val, default=0.0):
        try:
            return float(val)
        except (TypeError, ValueError):
            return float(default)

    az = safe_float(ans_dict.get('ref_az_pred', 0))
    el = safe_float(ans_dict.get('ref_el_pred', 0))
    ro = safe_float(ans_dict.get('ref_ro_pred', 0))
    alpha = int(ans_dict.get('ref_alpha_pred', 1))
    print("Reference Orientation: Azi", az, "Ele", el, "Rot", ro, "Alpha", alpha)
    if pil_tgt is not None:
        rel_az = safe_float(ans_dict.get('rel_az_pred', 0))
        rel_el = safe_float(ans_dict.get('rel_el_pred', 0))
        rel_ro = safe_float(ans_dict.get('rel_ro_pred', 0))
        print("Relative Pose: Azi", rel_az, "Ele", rel_el, "Rot", rel_ro)
    return ans_dict
image_ref_path = 'assets/examples/F35-0.jpg'
image_tgt_path = 'assets/examples/F35-1.jpg' # optional
image_ref = Image.open(image_ref_path).convert('RGB')
image_tgt = Image.open(image_tgt_path).convert('RGB')
run_inference(image_ref, image_tgt, True)
```

Download the absolute orientation, relative rotation, and symm-orientation test datasets from Huggingface Dataset:
```bash
# set mirror endpoint to accelerate
# export HF_ENDPOINT='https://hf-mirror.com'
huggingface-cli download --repo-type dataset Viglong/OriAnyV2_Inference --local-dir OriAnyV2_Inference
```

Use the following command to extract the dataset:
```bash
cd OriAnyV2_Inference
for f in *.tar.gz; do
    tar -xzf "$f"
done
```

Modify `DATA_ROOT` in `paths.py` to point to the dataset root directory (`/path/to/OriAnyV2_Inference`).
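For reference, `DATA_ROOT` is just a path string inside `paths.py`. A minimal illustrative excerpt (the rest of `paths.py` is repo-specific, e.g. it also defines `LOCAL_CKPT_PATH` and `HF_CKPT_PATH`, and is omitted here):

```python
# paths.py (illustrative excerpt only)
# Point DATA_ROOT at the OriAnyV2_Inference directory extracted above.
DATA_ROOT = '/path/to/OriAnyV2_Inference'
```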
To evaluate on the test datasets, run:
```bash
python eval_on_dataset.py
```

We use FLUX.1-dev and Hunyuan3D-2.0 to generate our training data and render it with Blender. We provide the fully rendered data, which you can obtain from the links below.
| Assets | Disk Space | Download Link |
|---|---|---|
| Images and 3D assets in the data pipeline | 2 TB | Hunyuan3D-FLUX-Gen |
| Final Rendering Data | 25 GB | Training Dataset |
To store all this data, we recommend having at least 2 TB of free disk space on your server.
We are currently organizing the complete data construction pipeline and training code for Orient-Anything-V2 — stay tuned.
We would like to express our sincere gratitude to the following excellent works:
If you find this project useful, please consider citing:
```bibtex
@inproceedings{wangorient,
  title={Orient Anything V2: Unifying Orientation and Rotation Understanding},
  author={Wang, Zehan and Zhang, Ziang and Xu, Jiayang and Wang, Jialei and Pang, Tianyu and Du, Chao and Zhao, Hengshuang and Zhao, Zhou},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025}
}
```