IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2025 🌟
Mamba Policy is a lighter yet stronger policy method based on a hybrid state space model integrated with attention mechanisms. Our extensive experiments demonstrate that Mamba Policy achieves up to a 5% improvement in success rate across a variety of manipulation datasets while reducing the parameter count by 80%.
Note: this repository and the following guidelines are based on 3D Diffusion Policy; we thank the authors for their open-source release, which greatly contributes to the community.
Please carefully follow the guidelines in 3D Diffusion Policy for installation and data generation.
- See INSTALL.md for installation instructions: the main setup follows DP3.
- [Optional]
  - `pip install causal-conv1d>=1.4.0`: an efficient implementation of the simple causal Conv1d layer used inside the Mamba block.
  - `pip install mamba-ssm`: the core Mamba package.
You can generate demonstrations yourself using our provided expert policies. Generated demonstrations are stored under `$YOUR_REPO_PATH/3D-Diffusion-Policy/data/`.
- Download the Adroit RL experts from OneDrive, unzip them, and put the `ckpts` folder under `$YOUR_REPO_PATH/third_party/VRL3/`.
- Download the DexArt assets from Google Drive and put the `assets` folder under `$YOUR_REPO_PATH/third_party/dexart-release/`.
Note: since you generate the demonstrations yourself, the results may differ slightly from those reported in the paper. This is normal, since imitation-learning results depend heavily on demonstration quality. If you encounter bad demonstrations, please re-generate them rather than opening a new issue.
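As a quick sanity check after generation, you can count the demonstration archives under the data directory before launching training. This is an illustrative sketch, not part of the official codebase: the `3D-Diffusion-Policy/data` path comes from the note above, while the `.zarr` suffix and the helper name `count_demos` are assumptions based on the DP3 convention.

```python
from pathlib import Path

def count_demos(data_dir: str, suffix: str = ".zarr") -> int:
    """Count generated demonstration archives under data_dir.

    Assumes one archive per task with a .zarr suffix (DP3 convention);
    adjust the suffix if your generation scripts use a different format.
    """
    root = Path(data_dir)
    if not root.exists():
        return 0
    return sum(1 for _ in root.glob(f"*{suffix}"))

if __name__ == "__main__":
    # Hypothetical path; replace with your actual repo location.
    n = count_demos("3D-Diffusion-Policy/data")
    print(f"found {n} demonstration archive(s)")
```

If the count is zero after running a generation script, check the script's output directory argument before re-generating.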
Scripts for generating demonstrations, training, and evaluation are all provided in the scripts/ folder.
The results are logged with wandb, so you need to run `wandb login` or export your wandb key in `YOUR_REPO_PATH/Mamba-Policy/3D-Diffusion-Policy/train.py`:

```python
import os
os.environ['WANDB_API_KEY'] = YOUR_WANDB_KEY
```

For more detailed arguments, please refer to the scripts and the code. Here we provide simple instructions for using the codebase.
- Generate demonstrations with `gen_demonstration_adroit.sh` and `gen_demonstration_dexart.sh`. See the scripts for details. For example:

  ```bash
  bash scripts/gen_demonstration_adroit.sh hammer
  ```

  This will generate demonstrations for the `hammer` task in the Adroit environment. The data will be saved in the `3D-Diffusion-Policy/data/` folder automatically. In our paper we conducted experiments on Adroit (Hammer, Door, Pen), DexArt (Laptop, Faucet, Toilet, Bucket), and MetaWorld (Assembly, Disassemble, Stick-Push).

- Train and evaluate a policy with behavior cloning. For example:

  ```bash
  bash scripts/train_policy.sh dp3_mamba adroit_hammer 1125 0 0
  ```

  This will train a Mamba Policy with mamba-v1 on the `hammer` task in the Adroit environment using the point cloud modality. Alternatively, you can train models with 3 seeds (default seeds: [0, 1, 2]):

  ```bash
  bash scripts/train_policy_multi.sh dp3_mamba_hydra metaworld_stick-pull 1125 0
  ```
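If you prefer to launch single-seed runs explicitly instead of using the multi-seed script, a simple loop works. This is only a sketch: the argument order (algorithm, task, run tag, seed, GPU id) is an assumption inferred from the example command above, so verify it against `scripts/train_policy.sh` before use.

```shell
#!/bin/sh
# Launch one training run per seed. `echo` makes this a dry run;
# remove it to actually execute. The argument order (alg, task, tag,
# seed, gpu) is assumed from the single-run example and should be
# checked against scripts/train_policy.sh.
for seed in 0 1 2; do
  echo bash scripts/train_policy.sh dp3_mamba adroit_hammer 1125 "$seed" 0
done
```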
Note: eval.sh is provided only for deployment/inference. For benchmarking, please use the results logged to wandb during training.
For Mamba Policy with more state space model (SSM) variants, we also provide their code for future exploration:
- Mamba-V1: `dp3_mamba.yaml`, based on the original Mamba.
- Mamba-V2: `dp3_mamba_v2.yaml`, where Mamba2 is adopted.
- Mamba-Bidirectional: `dp3_mamba_bi.yaml`, which uses the bidirectional Mamba module introduced in Vision Mamba.
- Mamba-Hydra: `dp3_mamba_hydra.yaml`, a quasiseparable matrix mixer-based bidirectional SSM (Hydra).
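To make the distinction between the variants concrete: the core of each is a selective linear recurrence h_t = a_t · h_{t-1} + b_t · x_t, and the bidirectional variants (Vim, Hydra) combine a forward scan with a scan over the reversed sequence. The following is a toy NumPy illustration of that idea only; it is not the actual mamba-ssm implementation, and all function names here are hypothetical.

```python
import numpy as np

def selective_scan(x, a, b):
    """Toy 1-D selective scan: h_t = a_t * h_{t-1} + b_t * x_t.

    x, a, b are length-T arrays; a and b play the role of the
    input-dependent (selective) transition and input parameters.
    """
    h = np.zeros_like(x)
    prev = 0.0
    for t in range(len(x)):
        prev = a[t] * prev + b[t] * x[t]
        h[t] = prev
    return h

def bidirectional_scan(x, a, b):
    """Combine a forward scan with a scan over the reversed sequence,
    in the spirit of Vim/Hydra-style bidirectional SSMs (the two
    directions are simply averaged here for illustration)."""
    fwd = selective_scan(x, a, b)
    bwd = selective_scan(x[::-1], a[::-1], b[::-1])[::-1]
    return 0.5 * (fwd + bwd)
```

The real implementations differ substantially (hardware-aware parallel scans, gating, quasiseparable mixing in Hydra), but the forward-plus-reversed structure is the shared idea.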
Note: since Vision Mamba (Vim) modified the core code of mamba-ssm, when using `dp3_mamba_bi.yaml` you should first uninstall the Mamba package via `pip uninstall mamba-ssm`, then install the Vim-based Mamba from the GitHub source:

```bash
git clone https://github.com/hustvl/Vim.git
cd Vim
pip install -e causal_conv1d>=1.1.0
pip install -e mamba-1p1p1
```

Despite Mamba Policy having a significantly smaller parameter count (~80% fewer parameters) than DP3, its training speed is not consistently faster. This is primarily due to Mamba's architectural design, which introduces a substantial constant overhead; that overhead becomes negligible only once the sequence length exceeds a certain threshold.
As described in the Mamba paper, Mamba outperforms CNN-based implementations in speed when the sequence length exceeds ~8k, and surpasses Transformers (with FlashAttention) when the sequence length exceeds ~2k. For further details, refer to Figure 8 in the Mamba paper.
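This constant-overhead-versus-asymptotics trade-off is easy to reproduce in miniature: an operation with large per-step overhead but linear scaling loses at short lengths and wins at long ones against an operation with small constants but quadratic scaling. The timing sketch below only illustrates the shape of that trade-off with stand-in workloads; the numbers say nothing about real Mamba or CUDA kernels.

```python
import time
import numpy as np

def time_fn(fn, *args, repeats=3):
    """Return the best wall-clock time over a few repeats."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - t0)
    return best

def loop_scan(x, a=0.9):
    # Linear-time recurrence, but with a large constant per step
    # (Python-level overhead stands in for kernel-launch overhead).
    h, out = 0.0, []
    for v in x:
        h = a * h + v
        out.append(h)
    return out

def quadratic_mix(X):
    # Quadratic-time token mixing (attention-score-like), with a small
    # constant thanks to vectorized BLAS.
    return X @ X.T

def crossover_table(lengths=(64, 256, 1024)):
    """Time both workloads at several sequence lengths T."""
    rows = []
    for T in lengths:
        x = np.random.rand(T)
        X = np.random.rand(T, 8)
        rows.append((T, time_fn(loop_scan, x), time_fn(quadratic_mix, X)))
    return rows
```

At small T the vectorized quadratic op tends to win despite its worse complexity, which mirrors why the lighter Mamba Policy is not automatically faster to train at short horizons.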
Variants of Mamba, such as Mamba2 and Hydra, exhibit longer initial load times and longer training durations. This is largely because Mamba2 is implemented predominantly in Triton, which incurs significant CPU overhead, especially for smaller layers. Two potential mitigations are: (1) leveraging CUDA graphs or Torch compilation, and (2) scaling up the model size.
Therefore, scaling up the Mamba model size (e.g., by extending the sequence length) is a promising approach to reduce overhead and improve efficiency, making it a viable direction for future research.
Our code is generally built upon 3D Diffusion Policy, Diffusion Policy, Mamba, Vision Mamba, and Hydra. We thank all these authors for their nicely open-sourced code and their great contributions to the community.
For any help or issues with this project, please contact Jiahang Cao.
If you find our work useful, please consider citing:
@article{cao2024mamba,
title={Mamba Policy: Towards Efficient 3D Diffusion Policy with Hybrid Selective State Models},
author={Cao, Jiahang and Zhang, Qiang and Sun, Jingkai and Wang, Jiaxu and Cheng, Hao and Li, Yulin and Ma, Jun and Shao, Yecheng and Zhao, Wen and Han, Gang and others},
journal={arXiv preprint arXiv:2409.07163},
year={2024}
}
