Error occurred during evaluate phase #745

FanZhang91 · 2025-02-18T12:24:43Z

System Info

ubuntu==20.04
lerobot==0.1.0
gym-aloha==0.1.1
gym-pusht==0.1.5
gym-xarm==0.1.1
gymnasium==0.29.1

Information

One of the scripts in the examples/ folder of LeRobot
My own task or dataset (give details below)

Reproduction

**I used the following script to run the pi0 training program, but encountered the error below.**

script: `python train.py --dataset.repo_id=lerobot/aloha_sim_insertion_human --env.type=aloha --env.task=AlohaInsertion-v0 --policy.type=pi0`

---------------------------------------------------- error info ---------------------------------------------------------------
WARNING 2025-02-18 20:15:03 gs/train.py:129 No device specified, trying to infer device automatically
INFO 2025-02-18 20:15:03 ils/utils.py:46 Cuda backend detected, using cuda.
INFO 2025-02-18 20:15:03 ts/train.py:192 {'batch_size': 8,
'dataset': {'episodes': None,
'image_transforms': {'enable': False,
'max_num_transforms': 3,
'random_order': False,
'tfs': {'brightness': {'kwargs': {'brightness': (0.8,
1.2)},
'type': 'ColorJitter',
'weight': 1.0},
'contrast': {'kwargs': {'contrast': (0.8,
1.2)},
'type': 'ColorJitter',
'weight': 1.0},
'hue': {'kwargs': {'hue': (-0.05,
0.05)},
'type': 'ColorJitter',
'weight': 1.0},
'saturation': {'kwargs': {'saturation': (0.5,
1.5)},
'type': 'ColorJitter',
'weight': 1.0},
'sharpness': {'kwargs': {'sharpness': (0.5,
1.5)},
'type': 'SharpnessJitter',
'weight': 1.0}}},
'local_files_only': False,
'repo_id': 'lerobot/aloha_sim_insertion_human',
'use_imagenet_stats': True,
'video_backend': 'pyav'},
'device': 'cuda',
'env': {'episode_length': 400,
'features': {'action': {'shape': (14,),
'type': <FeatureType.ACTION: 'ACTION'>},
'agent_pos': {'shape': (14,),
'type': <FeatureType.STATE: 'STATE'>},
'pixels/top': {'shape': (480, 640, 3),
'type': <FeatureType.VISUAL: 'VISUAL'>}},
'features_map': {'action': 'action',
'agent_pos': 'observation.state',
'pixels/top': 'observation.images.top',
'top': 'observation.image.top'},
'fps': 50,
'obs_type': 'pixels_agent_pos',
'render_mode': 'rgb_array',
'task': 'AlohaInsertion-v0'},
'eval': {'batch_size': 50, 'n_episodes': 50, 'use_async_envs': False},
'eval_freq': 1,
'job_name': 'aloha_pi0',
'log_freq': 200,
'num_workers': 4,
'offline': {'steps': 100000},
'online': {'buffer_capacity': None,
'buffer_seed_size': 0,
'do_rollout_async': False,
'env_seed': None,
'rollout_batch_size': 1,
'rollout_n_episodes': 1,
'sampling_ratio': 0.5,
'steps': 0,
'steps_between_rollouts': None},
'optimizer': {'betas': (0.9, 0.95),
'eps': 1e-08,
'grad_clip_norm': 10.0,
'lr': 2.5e-05,
'weight_decay': 1e-10},
'output_dir': PosixPath('outputs/train/2025-02-18/20-15-03_aloha_pi0'),
'policy': {'adapt_to_pi_aloha': False,
'attention_implementation': 'eager',
'chunk_size': 50,
'empty_cameras': 0,
'freeze_vision_encoder': True,
'input_features': {},
'max_action_dim': 32,
'max_state_dim': 32,
'n_action_steps': 50,
'n_obs_steps': 1,
'normalization_mapping': {'ACTION': <NormalizationMode.MEAN_STD: 'MEAN_STD'>,
'STATE': <NormalizationMode.MEAN_STD: 'MEAN_STD'>,
'VISUAL': <NormalizationMode.IDENTITY: 'IDENTITY'>},
'num_steps': 10,
'optimizer_betas': (0.9, 0.95),
'optimizer_eps': 1e-08,
'optimizer_lr': 2.5e-05,
'optimizer_weight_decay': 1e-10,
'output_features': {},
'proj_width': 1024,
'resize_imgs_with_padding': (224, 224),
'scheduler_decay_lr': 2.5e-06,
'scheduler_decay_steps': 30000,
'scheduler_warmup_steps': 1000,
'tokenizer_max_length': 48,
'train_expert_only': True,
'train_state_proj': True,
'use_cache': True,
'use_delta_joint_actions_aloha': False},
'resume': False,
'save_checkpoint': True,
'save_freq': 20000,
'scheduler': {'decay_lr': 2.5e-06,
'num_decay_steps': 30000,
'num_warmup_steps': 1000,
'peak_lr': 2.5e-05},
'seed': 1000,
'use_amp': False,
'use_policy_training_preset': True,
'wandb': {'disable_artifact': False,
'enable': False,
'entity': None,
'notes': None,
'project': 'lerobot'}}
INFO 2025-02-18 20:15:03 n/logger.py:105 Logs will be saved locally.
INFO 2025-02-18 20:15:03 ts/train.py:206 Creating dataset
Fetching 4 files: 100%|██████████| 4/4 [00:00<00:00, 3159.55it/s]
Fetching 4 files: 100%|██████████| 4/4 [00:00<00:00, 3710.13it/s]
Fetching 106 files: 100%|██████████| 106/106 [00:00<00:00, 3637.82it/s]
INFO 2025-02-18 20:15:06 ts/train.py:214 Creating env
INFO 2025-02-18 20:15:06 /init.py:88 MUJOCO_GL is not set, so an OpenGL backend will be chosen automatically.
INFO 2025-02-18 20:15:06 /init.py:96 Successfully imported OpenGL backend: %s
INFO 2025-02-18 20:15:06 /init.py:31 MuJoCo library version is: %s
INFO 2025-02-18 20:15:09 ts/train.py:217 Creating policy
INFO 2025-02-18 20:15:34 ts/train.py:224 Creating optimizer and scheduler
INFO 2025-02-18 20:15:34 on/logger.py:45 Output dir: outputs/train/2025-02-18/20-15-03_aloha_pi0
INFO 2025-02-18 20:15:34 ts/train.py:238 cfg.env.task='AlohaInsertion-v0'
INFO 2025-02-18 20:15:34 ts/train.py:239 cfg.offline.steps=100000 (100K)
INFO 2025-02-18 20:15:34 ts/train.py:240 cfg.online.steps=0
INFO 2025-02-18 20:15:34 ts/train.py:241 offline_dataset.num_frames=25000 (25K)
INFO 2025-02-18 20:15:34 ts/train.py:242 offline_dataset.num_episodes=50
INFO 2025-02-18 20:15:34 ts/train.py:243 num_learnable_params=578036768 (578M)
INFO 2025-02-18 20:15:34 ts/train.py:244 num_total_params=3501372260 (4B)
INFO 2025-02-18 20:15:34 ts/train.py:312 Start offline training on a fixed dataset
INFO 2025-02-18 20:15:35 ts/train.py:143 step:0 smpl:8 ep:0 epch:0.00 loss:1.837 grdn:12.157 lr:5.0e-08 updt_s:0.840 data_s:0.350
INFO 2025-02-18 20:15:35 ts/train.py:252 Eval policy at step 1
Stepping through eval batches: 0%| | 0/1 [00:00<?, ?it/s]
Stepping through eval batches: 0%| | 0/1 [00:01<?, ?it/s]
Traceback (most recent call last):
File "/home/xxx/deep_learning/lerobot/lerobot/scripts/train.py", line 254, in evaluate_and_checkpoint_if_needed
eval_info = eval_policy(
File "/home/xxx/deep_learning/lerobot/lerobot/scripts/eval.py", line 295, in eval_policy
rollout_data = rollout(
File "/home/xxx/deep_learning/lerobot/lerobot/scripts/eval.py", line 160, in rollout
action = policy.select_action(observation)
File "/home/xxx/miniforge3/envs/lerobot/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/xxx/deep_learning/lerobot/lerobot/common/policies/pi0/modeling_pi0.py", line 285, in select_action
lang_tokens, lang_masks = self.prepare_language(batch)
File "/home/xxx/deep_learning/lerobot/lerobot/common/policies/pi0/modeling_pi0.py", line 386, in prepare_language
tasks = batch["task"]
KeyError: 'task'
------------------------------------------------------ error info -------------------------------------------------------------------------

In the evaluation phase, the "observation" comes from the simulation environment rather than from the dataset. It seems the "task" entry is missing from the "observation" variable, so `prepare_language` fails on `batch["task"]`. How can I fix this bug?
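
One possible workaround is to add a "task" entry to the observation before it reaches `policy.select_action`. The sketch below only illustrates the idea on a plain dictionary. The helper `add_task_to_observation`, the task sentence, and the assumption that `prepare_language` expects one string per environment are all hypothetical and not confirmed by the traceback above.

```python
from typing import Any


def add_task_to_observation(
    observation: dict[str, Any],
    task_description: str,
    num_envs: int,
) -> dict[str, Any]:
    """Return the observation with a "task" entry, one string per environment.

    Hypothetical helper: assumes the policy reads batch["task"] as a list of
    natural-language instructions, one per environment in the batch.
    """
    if "task" in observation:
        # Nothing to patch, the environment already provides a task.
        return observation
    patched = dict(observation)  # shallow copy, the caller's dict is left untouched
    patched["task"] = [task_description] * num_envs
    return patched


# Stand-in for the batched observation built during rollout; the keys follow
# the features_map in the log above, the values are placeholders.
observation = {
    "observation.state": [[0.0] * 14],
    "observation.images.top": [None],
}

# The task sentence is an assumed example, not taken from the dataset.
observation = add_task_to_observation(
    observation, "Insert the peg into the socket.", num_envs=1
)
print(observation["task"])
```

In the training script this would be applied inside the rollout loop, just before the call to `policy.select_action(observation)` shown in the traceback, with `num_envs` set to the number of parallel evaluation environments.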

Expected behavior

null

liyitenga · 2025-02-26T12:16:49Z

I have the same problem.
