'Address boundary error' when training act policy with aloha_sim_insertion_human #755

Open
1 of 2 tasks
julyfun opened this issue Feb 20, 2025 · 0 comments

julyfun commented Feb 20, 2025

System Info

- `lerobot` version: 0.1.0
- Platform: Linux-5.15.153.1-microsoft-standard-WSL2+-x86_64-with-glibc2.35
- Python version: 3.10.16
- Huggingface_hub version: 0.27.1
- Dataset version: 3.2.0
- Numpy version: 2.0.2
- PyTorch version (GPU?): 2.6.0+cu124 (True)
- Cuda version: 12040
- Using GPU in script?: <fill in>
(LeRobot is on the master branch, using CUDA 12.4. Because `pyproject.toml` requires torchvision `>=0.21.0`, only PyTorch `2.6.0` is compatible.)

Information

  • One of the scripts in the examples/ folder of LeRobot
  • My own task or dataset (give details below)

Reproduction

python lerobot/scripts/train.py \
          --policy.type=act \
          --dataset.repo_id=lerobot/aloha_sim_insertion_human \
          --env.type=aloha \
          --env.task=AlohaInsertion-v0 \
          --log_freq=25 \
          --save_freq=100 

INFO 2025-02-20 20:08:24 ts/train.py:143 step:20K smpl:160K ep:320 epch:6.39 loss:0.073 grdn:6.679 lr:1.0e-05 updt_s:0.081 data_s:0.000
INFO 2025-02-20 20:08:26 ts/train.py:252 Eval policy at step 20000
Stepping through eval batches: 0%| | 0/1 [00:00<?, ?it/s]fish: Job 1, 'python lerobot/scripts/train.py…' terminated by signal --policy.type=act \ ( --dataset.repo_id=lerobot/a…)
fish: Job --env.type=aloha , ' --env.task=AlohaInsertion-v…' terminated by signal --log_freq=25 \ ( --save_freq=100)
fish: Job SIGSEGV, 'Address boundary error' terminated by signal ()

The error occurs during evaluation, after 20K training steps. In bash, the same crash is reported as `Segmentation fault (core dumped)`.
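
One way to get more detail on a crash like this is to run the same command with Python's standard-library `faulthandler` enabled, which prints the Python traceback of each thread when the process receives SIGSEGV. The wrapper below is only a sketch: the file name `debug_segv.py` and the helpers `describe_exit` and `run_with_faulthandler` are hypothetical names introduced here, not part of LeRobot, and it assumes a POSIX system where a negative return code means the child was killed by a signal.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a readable description (POSIX)."""
    if returncode < 0:
        # A negative return code means the child was killed by that signal.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


def run_with_faulthandler(args: list[str]) -> int:
    """Run a Python child process with faulthandler enabled.

    `-X faulthandler` makes the interpreter dump the Python traceback to
    stderr on fatal signals such as SIGSEGV. Output is not captured, so
    training logs and progress bars still appear as usual.
    """
    completed = subprocess.run([sys.executable, "-X", "faulthandler", *args])
    return completed.returncode


if __name__ == "__main__":
    # Pass the script and its arguments unchanged, for example:
    #   python debug_segv.py lerobot/scripts/train.py --policy.type=act ...
    code = run_with_faulthandler(sys.argv[1:])
    print(describe_exit(code), file=sys.stderr)
```

Equivalently, setting the environment variable `PYTHONFAULTHANDLER=1` before the original command enables the same traceback dump without a wrapper script.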

Expected behavior

Evaluation should run to completion (100%) without crashing.
