Error occurred during evaluate phase #745

Open
2 tasks
FanZhang91 opened this issue Feb 18, 2025 · 1 comment

FanZhang91 commented Feb 18, 2025

System Info

ubuntu==20.04
lerobot==0.1.0
gym-aloha==0.1.1
gym-pusht==0.1.5
gym-xarm==0.1.1
gymnasium==0.29.1

Information

  • One of the scripts in the examples/ folder of LeRobot
  • My own task or dataset (give details below)

Reproduction

I ran the pi0 training program with the following script and encountered the error below.

script: python train.py --dataset.repo_id=lerobot/aloha_sim_insertion_human --env.type=aloha --env.task=AlohaInsertion-v0 --policy.type=pi0

---------------------------------------------------- error info ---------------------------------------------------------------
WARNING 2025-02-18 20:15:03 gs/train.py:129 No device specified, trying to infer device automatically
INFO 2025-02-18 20:15:03 ils/utils.py:46 Cuda backend detected, using cuda.
INFO 2025-02-18 20:15:03 ts/train.py:192 {'batch_size': 8,
'dataset': {'episodes': None,
'image_transforms': {'enable': False,
'max_num_transforms': 3,
'random_order': False,
'tfs': {'brightness': {'kwargs': {'brightness': (0.8,
1.2)},
'type': 'ColorJitter',
'weight': 1.0},
'contrast': {'kwargs': {'contrast': (0.8,
1.2)},
'type': 'ColorJitter',
'weight': 1.0},
'hue': {'kwargs': {'hue': (-0.05,
0.05)},
'type': 'ColorJitter',
'weight': 1.0},
'saturation': {'kwargs': {'saturation': (0.5,
1.5)},
'type': 'ColorJitter',
'weight': 1.0},
'sharpness': {'kwargs': {'sharpness': (0.5,
1.5)},
'type': 'SharpnessJitter',
'weight': 1.0}}},
'local_files_only': False,
'repo_id': 'lerobot/aloha_sim_insertion_human',
'use_imagenet_stats': True,
'video_backend': 'pyav'},
'device': 'cuda',
'env': {'episode_length': 400,
'features': {'action': {'shape': (14,),
'type': <FeatureType.ACTION: 'ACTION'>},
'agent_pos': {'shape': (14,),
'type': <FeatureType.STATE: 'STATE'>},
'pixels/top': {'shape': (480, 640, 3),
'type': <FeatureType.VISUAL: 'VISUAL'>}},
'features_map': {'action': 'action',
'agent_pos': 'observation.state',
'pixels/top': 'observation.images.top',
'top': 'observation.image.top'},
'fps': 50,
'obs_type': 'pixels_agent_pos',
'render_mode': 'rgb_array',
'task': 'AlohaInsertion-v0'},
'eval': {'batch_size': 50, 'n_episodes': 50, 'use_async_envs': False},
'eval_freq': 1,
'job_name': 'aloha_pi0',
'log_freq': 200,
'num_workers': 4,
'offline': {'steps': 100000},
'online': {'buffer_capacity': None,
'buffer_seed_size': 0,
'do_rollout_async': False,
'env_seed': None,
'rollout_batch_size': 1,
'rollout_n_episodes': 1,
'sampling_ratio': 0.5,
'steps': 0,
'steps_between_rollouts': None},
'optimizer': {'betas': (0.9, 0.95),
'eps': 1e-08,
'grad_clip_norm': 10.0,
'lr': 2.5e-05,
'weight_decay': 1e-10},
'output_dir': PosixPath('outputs/train/2025-02-18/20-15-03_aloha_pi0'),
'policy': {'adapt_to_pi_aloha': False,
'attention_implementation': 'eager',
'chunk_size': 50,
'empty_cameras': 0,
'freeze_vision_encoder': True,
'input_features': {},
'max_action_dim': 32,
'max_state_dim': 32,
'n_action_steps': 50,
'n_obs_steps': 1,
'normalization_mapping': {'ACTION': <NormalizationMode.MEAN_STD: 'MEAN_STD'>,
'STATE': <NormalizationMode.MEAN_STD: 'MEAN_STD'>,
'VISUAL': <NormalizationMode.IDENTITY: 'IDENTITY'>},
'num_steps': 10,
'optimizer_betas': (0.9, 0.95),
'optimizer_eps': 1e-08,
'optimizer_lr': 2.5e-05,
'optimizer_weight_decay': 1e-10,
'output_features': {},
'proj_width': 1024,
'resize_imgs_with_padding': (224, 224),
'scheduler_decay_lr': 2.5e-06,
'scheduler_decay_steps': 30000,
'scheduler_warmup_steps': 1000,
'tokenizer_max_length': 48,
'train_expert_only': True,
'train_state_proj': True,
'use_cache': True,
'use_delta_joint_actions_aloha': False},
'resume': False,
'save_checkpoint': True,
'save_freq': 20000,
'scheduler': {'decay_lr': 2.5e-06,
'num_decay_steps': 30000,
'num_warmup_steps': 1000,
'peak_lr': 2.5e-05},
'seed': 1000,
'use_amp': False,
'use_policy_training_preset': True,
'wandb': {'disable_artifact': False,
'enable': False,
'entity': None,
'notes': None,
'project': 'lerobot'}}
INFO 2025-02-18 20:15:03 n/logger.py:105 Logs will be saved locally.
INFO 2025-02-18 20:15:03 ts/train.py:206 Creating dataset
Fetching 4 files: 100%|██████████| 4/4 [00:00<00:00, 3159.55it/s]
Fetching 4 files: 100%|██████████| 4/4 [00:00<00:00, 3710.13it/s]
Fetching 106 files: 100%|██████████| 106/106 [00:00<00:00, 3637.82it/s]
INFO 2025-02-18 20:15:06 ts/train.py:214 Creating env
INFO 2025-02-18 20:15:06 /init.py:88 MUJOCO_GL is not set, so an OpenGL backend will be chosen automatically.
INFO 2025-02-18 20:15:06 /init.py:96 Successfully imported OpenGL backend: %s
INFO 2025-02-18 20:15:06 /init.py:31 MuJoCo library version is: %s
INFO 2025-02-18 20:15:09 ts/train.py:217 Creating policy
INFO 2025-02-18 20:15:34 ts/train.py:224 Creating optimizer and scheduler
INFO 2025-02-18 20:15:34 on/logger.py:45 Output dir: outputs/train/2025-02-18/20-15-03_aloha_pi0
INFO 2025-02-18 20:15:34 ts/train.py:238 cfg.env.task='AlohaInsertion-v0'
INFO 2025-02-18 20:15:34 ts/train.py:239 cfg.offline.steps=100000 (100K)
INFO 2025-02-18 20:15:34 ts/train.py:240 cfg.online.steps=0
INFO 2025-02-18 20:15:34 ts/train.py:241 offline_dataset.num_frames=25000 (25K)
INFO 2025-02-18 20:15:34 ts/train.py:242 offline_dataset.num_episodes=50
INFO 2025-02-18 20:15:34 ts/train.py:243 num_learnable_params=578036768 (578M)
INFO 2025-02-18 20:15:34 ts/train.py:244 num_total_params=3501372260 (4B)
INFO 2025-02-18 20:15:34 ts/train.py:312 Start offline training on a fixed dataset
INFO 2025-02-18 20:15:35 ts/train.py:143 step:0 smpl:8 ep:0 epch:0.00 loss:1.837 grdn:12.157 lr:5.0e-08 updt_s:0.840 data_s:0.350
INFO 2025-02-18 20:15:35 ts/train.py:252 Eval policy at step 1
Stepping through eval batches: 0%| | 0/1 [00:00<?, ?it/s]
Stepping through eval batches: 0%| | 0/1 [00:01<?, ?it/s]
Traceback (most recent call last):
File "/home/xxx/deep_learning/lerobot/lerobot/scripts/train.py", line 254, in evaluate_and_checkpoint_if_needed
eval_info = eval_policy(
File "/home/xxx/deep_learning/lerobot/lerobot/scripts/eval.py", line 295, in eval_policy
rollout_data = rollout(
File "/home/xxx/deep_learning/lerobot/lerobot/scripts/eval.py", line 160, in rollout
action = policy.select_action(observation)
File "/home/xxx/miniforge3/envs/lerobot/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/xxx/deep_learning/lerobot/lerobot/common/policies/pi0/modeling_pi0.py", line 285, in select_action
lang_tokens, lang_masks = self.prepare_language(batch)
File "/home/xxx/deep_learning/lerobot/lerobot/common/policies/pi0/modeling_pi0.py", line 386, in prepare_language
tasks = batch["task"]
KeyError: 'task'
------------------------------------------------------ error info -------------------------------------------------------------------------

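During the evaluation phase, the "observation" passed to policy.select_action comes from the simulation environment rather than from the dataset. It seems the "task" key is missing from that "observation", so prepare_language fails when it reads batch["task"]. How can this be fixed?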
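A possible workaround, sketched here only as an illustration of the idea: attach a "task" entry to the observation before it reaches policy.select_action, with one instruction string per environment in the eval batch. The helper name add_task_to_observation and the instruction text are hypothetical and not part of LeRobot. The sketch also assumes that prepare_language accepts a list of strings under "task", which the traceback suggests but does not confirm.

```python
from typing import Any, Dict


def add_task_to_observation(
    observation: Dict[str, Any], task: str, batch_size: int
) -> Dict[str, Any]:
    """Return a shallow copy of the observation that carries a "task" entry.

    Hypothetical helper, not a LeRobot function. It repeats the same
    instruction string once per environment in the eval batch and leaves
    an already present "task" entry untouched.
    """
    patched = dict(observation)  # shallow copy, the caller's dict is not mutated
    patched.setdefault("task", [task] * batch_size)
    return patched


# Placeholder observation with two environments and a 14-dim state each.
observation = {"observation.state": [[0.0] * 14, [0.0] * 14]}

# The instruction text is a placeholder chosen for illustration.
patched = add_task_to_observation(
    observation, task="insert the peg into the socket", batch_size=2
)
print(sorted(patched.keys()))
```

In the rollout loop, such a helper would be called just before action = policy.select_action(observation), with batch_size set to the number of parallel environments.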

Expected behavior

null

@liyitenga

I have the same problem
