[ICLR 2026] General Policy Composition (GPC)

We propose General Policy Composition (GPC), a training-free method that enhances performance by combining the distributional scores of multiple pre-trained policies via a convex combination and test-time search. GPC is versatile, allowing for the plug-and-play composition of heterogeneous policies, including VA and VLA models, as well as those based on diffusion or flow-matching, irrespective of their input visual modalities.

📢 News

Jan 2026: 🎉 GPC is accepted to ICLR 2026!

Note: This repository as well as the following guidelines are based on RoboTwin-1.0, thanks for the authors' open resoure which greatly contributes to the community.

💻 Installation & 📚 Data Preparation

Please carefully follow the guidelines in RoboTwin-1.0 for installation and data generation.

🧑🏻‍💻 Usage

Step 1. Prepare the pre-trained single modality-based diffusion policies

1. Task Running and Data Collection

Running the following command will first search for a random seed for the target collection quantity (default is 100), and then replay the seed to collect data.

bash run_task.sh ${task_name} ${gpu_id}

2. Training the diffusion policy as well as the 3D diffusion policy

(1) Diffusion Policy

The DP code can be found in policy/Diffusion-Policy.

Process Data for DP training:

python script/pkl2zarr_dp.py ${task_name} ${head_camera_type} ${expert_data_num}
# As example: python script/pkl2zarr_dp.py dual_bottles_pick_hard L515 100, which indicates preprocessing of 100 dual_bottles_pick_hard task trajectory data using L515 camera.

Then, move to policy/Diffusion-Policy first, and run the following code to train DP:

bash train.sh ${task_name} ${head_camera_type} ${expert_data_num} ${seed} ${gpu_id}
# As example: bash train.sh dual_bottles_pick_hard L515 100 0 0

(2) 3D Diffusion Policy

The DP3 code can be found in policy/3D-Diffusion-Policy.

Process Data for DP3 training:

python script/pkl2zarr_dp3.py ${task_name} ${head_camera_type} ${expert_data_num}
# As example: python script/pkl2zarr_dp3.py dual_bottles_pick_hard L515 100

Then, move to policy/3D-Diffusion-Policy first, and run the following code to train DP3:

bash train_ddpm.sh ${task_name} ${head_camera_type} ${expert_data_num} ${seed} ${gpu_id}
# As example: bash train_ddpm.sh dual_bottles_pick_hard L515 100 0 0

Note: One key difference between our training settings and RoboTwin's default configuration is that we need to align the action distribution space. This means that the noise scheduler for both DP and DP3 needs to be consistent, including aspects such as prediction type and inference time steps. To achieve this, we align DP3's noise scheduler with DP's default DDPM scheduler (in principle, as long as both schedulers are consistent, this is not limited to DDPM). This ensures that the action spaces of both policies remain aligned during inference sampling.

Step 2. Compose the pre-trained policies via distribution-level composition

Move to policy/3D-Diffusion-Policy first, then run the following code to evaluate MCDP for a specific task for 100 times:

bash eval_composed.sh ${task_name} ${head_camera_type} ${expert_data_num} ${checkpoint_num} ${seed} ${gpu_id} ${dp_w} ${dp3_w}
# As example: # bash eval_composed.sh dual_bottles_pick_hard L515 100 3000 0 0 0.3 0.7

🏄‍♂️ Experiment

Simulation Results

For each task, we evaluated the baseline performance under L515 (320×180 resolution, FOV 45°) camera. The DP_img and DP_pcd correspond to the diffusion policy based on RGB images (i.e.,, DP) and point cloud (i.e.,, DP3), respectively. We reproduced the DP and DP3 (without using point cloud color) with random seed 0.

🔎 Findings

When both DPs have moderate accuracy (both 30%), MCDP often achieves higher accuracy under appropriate weight configurations compared to unimodal DPs.
When one DP has significantly lower accuracy, MCDP struggles to surpass the highest accuracy of the better-performing unimodal DP.
Building on Finding 1, the improvement of MCDP is maximized when the better-performing unimodal DP holds a larger weight in MCDP.

😺 Acknowledgement

Our code is generally built upon: RoboTwin, 3D Diffusion Poliy, Diffusion Policy. We thank all these authors for their nicely open sourced code and their great contributions to the community.

👍 Citation

@article{cao2025gpc,
      title={Compose Your Policies! Improving Diffusion-based or Flow-based Robot Policies via Test-time Distribution-level Composition}, 
      author={Jiahang Cao and Yize Huang and Hanzhong Guo and Rui Zhang and Mu Nan and Weijian Mai and Jiaxu Wang and Hao Cheng and Jingkai Sun and Gang Han and Wen Zhao and Qiang Zhang and Yijie Guo and Qihao Zheng and Chunfeng Song and Xiao Li and Ping Luo and Andrew F. Luo},
      journal={arXiv preprint arXiv:2510.01068},
      year={2025},
      url={https://arxiv.org/abs/2510.01068}, 
}

🏷️ License

This repository is released under the MIT license. See LICENSE for additional details.

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
envs		envs
figures		figures
files		files
policy		policy
script		script
.gitignore		.gitignore
CONFIG_TUTORIAL.md		CONFIG_TUTORIAL.md
INSTALLATION.md		INSTALLATION.md
LICENSE		LICENSE
README.md		README.md
exp.png		exp.png
pipeline.png		pipeline.png
run_task.sh		run_task.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

[ICLR 2026] General Policy Composition (GPC)

📢 News

💻 Installation & 📚 Data Preparation

🧑🏻‍💻 Usage

Step 1. Prepare the pre-trained single modality-based diffusion policies

1. Task Running and Data Collection

2. Training the diffusion policy as well as the 3D diffusion policy

(1) Diffusion Policy

(2) 3D Diffusion Policy

Step 2. Compose the pre-trained policies via distribution-level composition

🏄‍♂️ Experiment

Simulation Results

🔎 Findings

😺 Acknowledgement

👍 Citation

🏷️ License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

License

SageCao1125/GPC

Folders and files

Latest commit

History

Repository files navigation

[ICLR 2026] General Policy Composition (GPC)

📢 News

💻 Installation & 📚 Data Preparation

🧑🏻‍💻 Usage

Step 1. Prepare the pre-trained single modality-based diffusion policies

1. Task Running and Data Collection

2. Training the diffusion policy as well as the 3D diffusion policy

(1) Diffusion Policy

(2) 3D Diffusion Policy

Step 2. Compose the pre-trained policies via distribution-level composition

🏄‍♂️ Experiment

Simulation Results

🔎 Findings

😺 Acknowledgement

👍 Citation

🏷️ License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages