IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2025 🌟
Mamba Policy is a lighter yet stronger policy method based on a hybrid state space model integrated with attention mechanisms. Our extensive experiments demonstrate that Mamba Policy achieves up to a 5% improvement in success rate across a variety of manipulation datasets while reducing the parameter count by 80%.
Note: this repository and the following guidelines are based on 3D Diffusion Policy; we thank the authors for their open-source release, which greatly contributes to the community.
Please carefully follow the guidelines in 3D Diffusion Policy for installation and data generation.
- See INSTALL.md for installation instructions: the main setup follows DP3.
- [Optional]
  - `pip install causal-conv1d>=1.4.0`: an efficient implementation of the simple causal Conv1d layer used inside the Mamba block.
  - `pip install mamba-ssm`: the core Mamba package.
You can generate demonstrations yourself using our provided expert policies. Generated demonstrations are stored under `$YOUR_REPO_PATH/3D-Diffusion-Policy/data/`.
- Download the Adroit RL experts from OneDrive, unzip them, and put the `ckpts` folder under `$YOUR_REPO_PATH/third_party/VRL3/`.
- Download the DexArt assets from Google Drive and put the `assets` folder under `$YOUR_REPO_PATH/third_party/dexart-release/`.
Note: since you generate the demonstrations yourself, the results may differ slightly from those reported in the paper. This is normal, since imitation-learning results depend heavily on demonstration quality. If you encounter bad demonstrations, please re-generate them rather than opening a new issue.
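As a quick sanity check after generation, you can count the demonstration archives under the data directory before launching training. This is an illustrative sketch, not part of the official codebase: the `3D-Diffusion-Policy/data` path comes from the note above, while the `.zarr` suffix and the helper name `count_demos` are assumptions based on the DP3 convention.

```python
from pathlib import Path

def count_demos(data_dir: str, suffix: str = ".zarr") -> int:
    """Count generated demonstration archives under data_dir.

    Assumes one archive per task with a .zarr suffix (DP3 convention);
    adjust the suffix if your generation scripts use a different format.
    """
    root = Path(data_dir)
    if not root.exists():
        return 0
    return sum(1 for _ in root.glob(f"*{suffix}"))

if __name__ == "__main__":
    # Hypothetical path; replace with your actual repo location.
    n = count_demos("3D-Diffusion-Policy/data")
    print(f"found {n} demonstration archive(s)")
```

If the count is zero after running a generation script, check the script's output directory argument before re-generating.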
Scripts for generating demonstrations, training, and evaluation are all provided in the scripts/ folder.
The results are logged with wandb, so you need to run `wandb login` or export your wandb key in `YOUR_REPO_PATH/Mamba-Policy/3D-Diffusion-Policy/train.py`:

```python
import os
os.environ['WANDB_API_KEY'] = YOUR_WANDB_KEY
```

For more detailed arguments, please refer to the scripts and the code. Here we provide simple instructions for using the codebase.
- Generate demonstrations with `gen_demonstration_adroit.sh` and `gen_demonstration_dexart.sh`. See the scripts for details. For example:

  ```bash
  bash scripts/gen_demonstration_adroit.sh hammer
  ```

  This will generate demonstrations for the `hammer` task in the Adroit environment. The data will be saved in the `3D-Diffusion-Policy/data/` folder automatically. In our paper we conducted experiments on Adroit (Hammer, Door, Pen), DexArt (Laptop, Faucet, Toilet, Bucket), and MetaWorld (Assembly, Disassemble, Stick-Push).

- Train and evaluate a policy with behavior cloning. For example:

  ```bash
  bash scripts/train_policy.sh dp3_mamba adroit_hammer 1125 0 0
  ```

  This will train a Mamba Policy with mamba-v1 on the `hammer` task in the Adroit environment using the point cloud modality. Alternatively, you can train models with 3 seeds (default seeds: [0, 1, 2]):

  ```bash
  bash scripts/train_policy_multi.sh dp3_mamba_hydra metaworld_stick-pull 1125 0
  ```
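If you prefer to launch single-seed runs explicitly instead of using the multi-seed script, a simple loop works. This is only a sketch: the argument order (algorithm, task, run tag, seed, GPU id) is an assumption inferred from the example command above, so verify it against `scripts/train_policy.sh` before use.

```shell
#!/bin/sh
# Launch one training run per seed. `echo` makes this a dry run;
# remove it to actually execute. The argument order (alg, task, tag,
# seed, gpu) is assumed from the single-run example and should be
# checked against scripts/train_policy.sh.
for seed in 0 1 2; do
  echo bash scripts/train_policy.sh dp3_mamba adroit_hammer 1125 "$seed" 0
done
```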
Note: eval.sh is provided only for deployment/inference. For benchmarking, please use the results logged to wandb during training.
For Mamba Policy with more state space model (SSM) variants, we also provide their code for future exploration:
- Mamba-V1: `dp3_mamba.yaml`, based on the original Mamba.
- Mamba-V2: `dp3_mamba_v2.yaml`, where Mamba2 is adopted.
- Mamba-Bidirectional: `dp3_mamba_bi.yaml`, which uses the bidirectional Mamba module introduced in Vision Mamba.
- Mamba-Hydra: `dp3_mamba_hydra.yaml`, a quasiseparable matrix mixer-based bidirectional SSM (Hydra).
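To make the distinction between the variants concrete: the core of each is a selective linear recurrence h_t = a_t · h_{t-1} + b_t · x_t, and the bidirectional variants (Vim, Hydra) combine a forward scan with a scan over the reversed sequence. The following is a toy NumPy illustration of that idea only; it is not the actual mamba-ssm implementation, and all function names here are hypothetical.

```python
import numpy as np

def selective_scan(x, a, b):
    """Toy 1-D selective scan: h_t = a_t * h_{t-1} + b_t * x_t.

    x, a, b are length-T arrays; a and b play the role of the
    input-dependent (selective) transition and input parameters.
    """
    h = np.zeros_like(x)
    prev = 0.0
    for t in range(len(x)):
        prev = a[t] * prev + b[t] * x[t]
        h[t] = prev
    return h

def bidirectional_scan(x, a, b):
    """Combine a forward scan with a scan over the reversed sequence,
    in the spirit of Vim/Hydra-style bidirectional SSMs (the two
    directions are simply averaged here for illustration)."""
    fwd = selective_scan(x, a, b)
    bwd = selective_scan(x[::-1], a[::-1], b[::-1])[::-1]
    return 0.5 * (fwd + bwd)
```

The real implementations differ substantially (hardware-aware parallel scans, gating, quasiseparable mixing in Hydra), but the forward-plus-reversed structure is the shared idea.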
Note: since Vision Mamba (Vim) modified the core code of mamba-ssm, when using `dp3_mamba_bi.yaml` you should first uninstall the Mamba package via `pip uninstall mamba-ssm`, then install the Vim-based Mamba from the GitHub source:

```bash
git clone https://github.com/hustvl/Vim.git
cd Vim
pip install -e causal_conv1d>=1.1.0
pip install -e mamba-1p1p1
```

Despite Mamba Policy having a significantly smaller parameter count (~80% fewer parameters) than DP3, its training speed is not consistently faster. This is primarily due to Mamba's architectural design, which introduces a substantial constant overhead; that overhead becomes negligible only once the sequence length exceeds a certain threshold.
As described in the Mamba paper, Mamba outperforms CNN-based implementations in speed when the sequence length exceeds ~8k, and surpasses Transformers (with FlashAttention) when the sequence length exceeds ~2k. For further details, refer to Figure 8 in the Mamba paper.
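This constant-overhead-versus-asymptotics trade-off is easy to reproduce in miniature: an operation with large per-step overhead but linear scaling loses at short lengths and wins at long ones against an operation with small constants but quadratic scaling. The timing sketch below only illustrates the shape of that trade-off with stand-in workloads; the numbers say nothing about real Mamba or CUDA kernels.

```python
import time
import numpy as np

def time_fn(fn, *args, repeats=3):
    """Return the best wall-clock time over a few repeats."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - t0)
    return best

def loop_scan(x, a=0.9):
    # Linear-time recurrence, but with a large constant per step
    # (Python-level overhead stands in for kernel-launch overhead).
    h, out = 0.0, []
    for v in x:
        h = a * h + v
        out.append(h)
    return out

def quadratic_mix(X):
    # Quadratic-time token mixing (attention-score-like), with a small
    # constant thanks to vectorized BLAS.
    return X @ X.T

def crossover_table(lengths=(64, 256, 1024)):
    """Time both workloads at several sequence lengths T."""
    rows = []
    for T in lengths:
        x = np.random.rand(T)
        X = np.random.rand(T, 8)
        rows.append((T, time_fn(loop_scan, x), time_fn(quadratic_mix, X)))
    return rows
```

At small T the vectorized quadratic op tends to win despite its worse complexity, which mirrors why the lighter Mamba Policy is not automatically faster to train at short horizons.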
Variants of Mamba, such as Mamba2 and Hydra, exhibit longer initial load times and longer training durations. This is largely because Mamba2 is implemented predominantly in Triton, which incurs significant CPU overhead, especially for smaller layers. Two potential mitigations are: (1) leveraging CUDA graphs or Torch compilation, and (2) scaling up the model size.
Therefore, scaling up the Mamba model size (e.g., by extending the sequence length) is a promising approach to reduce overhead and improve efficiency, making it a viable direction for future research.
Our code is generally built upon 3D Diffusion Policy, Diffusion Policy, Mamba, Vision Mamba, and Hydra. We thank all these authors for their nicely open-sourced code and their great contributions to the community.
For any help or issues with this project, please contact Jiahang Cao.
If you find our work useful, please consider citing:
@article{cao2024mamba,
title={Mamba Policy: Towards Efficient 3D Diffusion Policy with Hybrid Selective State Models},
author={Cao, Jiahang and Zhang, Qiang and Sun, Jingkai and Wang, Jiaxu and Cheng, Hao and Li, Yulin and Ma, Jun and Shao, Yecheng and Zhao, Wen and Han, Gang and others},
journal={arXiv preprint arXiv:2409.07163},
year={2024}
}
