diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index 4063e395f..2e6fd4490 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -21,7 +21,7 @@ Provide a simple way for the reviewer to try out your changes. Examples: ```bash -DATA_DIR=tests/data pytest -sx tests/test_stuff.py::test_something +pytest -sx tests/test_stuff.py::test_something ``` ```bash python lerobot/scripts/train.py --some.option=true diff --git a/.github/workflows/nightly-tests.yml b/.github/workflows/nightly-tests.yml index f967533ae..bbee19a17 100644 --- a/.github/workflows/nightly-tests.yml +++ b/.github/workflows/nightly-tests.yml @@ -7,10 +7,8 @@ on: schedule: - cron: "0 2 * * *" -env: - DATA_DIR: tests/data +# env: # SLACK_API_TOKEN: ${{ secrets.SLACK_API_TOKEN }} - jobs: run_all_tests_cpu: name: CPU @@ -30,13 +28,9 @@ jobs: working-directory: /lerobot steps: - name: Tests - env: - DATA_DIR: tests/data run: pytest -v --cov=./lerobot --disable-warnings tests - name: Tests end-to-end - env: - DATA_DIR: tests/data run: make test-end-to-end diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml index c32e1df14..5de071750 100644 --- a/.github/workflows/test.yml +++ b/.github/workflows/test.yml @@ -29,7 +29,6 @@ jobs: name: Pytest runs-on: ubuntu-latest env: - DATA_DIR: tests/data MUJOCO_GL: egl steps: - uses: actions/checkout@v4 @@ -70,7 +69,6 @@ jobs: name: Pytest (minimal install) runs-on: ubuntu-latest env: - DATA_DIR: tests/data MUJOCO_GL: egl steps: - uses: actions/checkout@v4 @@ -108,7 +106,6 @@ jobs: name: End-to-end runs-on: ubuntu-latest env: - DATA_DIR: tests/data MUJOCO_GL: egl steps: - uses: actions/checkout@v4 diff --git a/README.md b/README.md index 5fbf74f44..9331bdeca 100644 --- a/README.md +++ b/README.md @@ -153,10 +153,12 @@ python lerobot/scripts/visualize_dataset.py \ --episode-index 0 ``` -or from a dataset in a local folder with the root `DATA_DIR` environment variable (in the following case the 
dataset will be searched for in `./my_local_data_dir/lerobot/pusht`) +or from a dataset in a local folder with the `root` option and the `--local-files-only` flag (in the following case the dataset will be searched for in `./my_local_data_dir/lerobot/pusht`) ```bash -DATA_DIR='./my_local_data_dir' python lerobot/scripts/visualize_dataset.py \ +python lerobot/scripts/visualize_dataset.py \ --repo-id lerobot/pusht \ + --root ./my_local_data_dir \ + --local-files-only 1 \ --episode-index 0 ``` @@ -208,12 +210,10 @@ dataset attributes: A `LeRobotDataset` is serialised using several widespread file formats for each of its parts, namely: - hf_dataset stored using Hugging Face datasets library serialization to parquet -- videos are stored in mp4 format to save space or png files -- episode_data_index saved using `safetensor` tensor serialization format -- stats saved using `safetensor` tensor serialization format -- info are saved using JSON +- videos are stored in mp4 format to save space +- metadata are stored in plain json/jsonl files -Dataset can be uploaded/downloaded from the HuggingFace hub seamlessly. To work on a local dataset, you can set the `DATA_DIR` environment variable to your root dataset folder as illustrated in the above section on dataset visualization. +A dataset can be uploaded/downloaded from the Hugging Face hub seamlessly. To work on a local dataset, you can use the `local_files_only` argument and, if the dataset is not in the default `~/.cache/huggingface/lerobot` location, specify its location with the `root` argument.
### Evaluate a pretrained policy diff --git a/examples/10_use_so100.md b/examples/10_use_so100.md index 2309b7bb7..70e4ed8ba 100644 --- a/examples/10_use_so100.md +++ b/examples/10_use_so100.md @@ -218,7 +218,7 @@ python lerobot/scripts/visualize_dataset_html.py \ Now try to replay the first episode on your robot: ```bash -DATA_DIR=data python lerobot/scripts/control_robot.py replay \ +python lerobot/scripts/control_robot.py replay \ --robot-path lerobot/configs/robot/so100.yaml \ --fps 30 \ --repo-id ${HF_USER}/so100_test \ @@ -229,7 +229,7 @@ DATA_DIR=data python lerobot/scripts/control_robot.py replay \ To train a policy to control your robot, use the [`python lerobot/scripts/train.py`](../lerobot/scripts/train.py) script. A few arguments are required. Here is an example command: ```bash -DATA_DIR=data python lerobot/scripts/train.py \ +python lerobot/scripts/train.py \ dataset_repo_id=${HF_USER}/so100_test \ policy=act_so100_real \ env=so100_real \ @@ -245,7 +245,6 @@ Let's explain it: 3. We provided an environment as argument with `env=so100_real`. This loads configurations from [`lerobot/configs/env/so100_real.yaml`](../lerobot/configs/env/so100_real.yaml). 4. We provided `device=cuda` since we are training on a Nvidia GPU, but you can also use `device=mps` if you are using a Mac with Apple silicon, or `device=cpu` otherwise. 5. We provided `wandb.enable=true` to use [Weights and Biases](https://docs.wandb.ai/quickstart) for visualizing training plots. This is optional but if you use it, make sure you are logged in by running `wandb login`. -6. We added `DATA_DIR=data` to access your dataset stored in your local `data` directory. If you dont provide `DATA_DIR`, your dataset will be downloaded from Hugging Face hub to your cache folder `$HOME/.cache/hugginface`. In future versions of `lerobot`, both directories will be in sync. Training should take several hours. You will find checkpoints in `outputs/train/act_so100_test/checkpoints`. 
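The README hunk above states that dataset metadata are now stored as plain json/jsonl files rather than `safetensor` archives. A jsonl file is simply one JSON object per line; the sketch below shows reading such a file with only the standard library. The file name and fields here are illustrative placeholders, not the actual lerobot metadata schema:

```python
import json
from pathlib import Path

def read_jsonl(path: Path) -> list[dict]:
    """Read a .jsonl file: one JSON object per non-empty line."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

# Illustrative only: the real layout under ~/.cache/huggingface/lerobot
# may differ from this hypothetical example file.
demo = Path("episodes.jsonl")
demo.write_text(
    '{"episode_index": 0, "length": 300}\n'
    '{"episode_index": 1, "length": 280}\n'
)
episodes = read_jsonl(demo)
print(len(episodes))  # 2
```

Plain-text metadata like this can be inspected or diffed without any tensor-serialization tooling, which is presumably part of the motivation for moving away from `safetensor` files here.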
diff --git a/examples/11_use_moss.md b/examples/11_use_moss.md index 39a56e933..55d6fcaf9 100644 --- a/examples/11_use_moss.md +++ b/examples/11_use_moss.md @@ -218,7 +218,7 @@ python lerobot/scripts/visualize_dataset_html.py \ Now try to replay the first episode on your robot: ```bash -DATA_DIR=data python lerobot/scripts/control_robot.py replay \ +python lerobot/scripts/control_robot.py replay \ --robot-path lerobot/configs/robot/moss.yaml \ --fps 30 \ --repo-id ${HF_USER}/moss_test \ @@ -229,7 +229,7 @@ DATA_DIR=data python lerobot/scripts/control_robot.py replay \ To train a policy to control your robot, use the [`python lerobot/scripts/train.py`](../lerobot/scripts/train.py) script. A few arguments are required. Here is an example command: ```bash -DATA_DIR=data python lerobot/scripts/train.py \ +python lerobot/scripts/train.py \ dataset_repo_id=${HF_USER}/moss_test \ policy=act_moss_real \ env=moss_real \ @@ -245,7 +245,6 @@ Let's explain it: 3. We provided an environment as argument with `env=moss_real`. This loads configurations from [`lerobot/configs/env/moss_real.yaml`](../lerobot/configs/env/moss_real.yaml). 4. We provided `device=cuda` since we are training on a Nvidia GPU, but you can also use `device=mps` if you are using a Mac with Apple silicon, or `device=cpu` otherwise. 5. We provided `wandb.enable=true` to use [Weights and Biases](https://docs.wandb.ai/quickstart) for visualizing training plots. This is optional but if you use it, make sure you are logged in by running `wandb login`. -6. We added `DATA_DIR=data` to access your dataset stored in your local `data` directory. If you dont provide `DATA_DIR`, your dataset will be downloaded from Hugging Face hub to your cache folder `$HOME/.cache/hugginface`. In future versions of `lerobot`, both directories will be in sync. Training should take several hours. You will find checkpoints in `outputs/train/act_moss_test/checkpoints`. 
diff --git a/examples/7_get_started_with_real_robot.md b/examples/7_get_started_with_real_robot.md index a6bfe65a4..76408275d 100644 --- a/examples/7_get_started_with_real_robot.md +++ b/examples/7_get_started_with_real_robot.md @@ -868,7 +868,7 @@ Your robot should replicate movements similar to those you recorded. For example To train a policy to control your robot, use the [`python lerobot/scripts/train.py`](../lerobot/scripts/train.py) script. A few arguments are required. Here is an example command: ```bash -DATA_DIR=data python lerobot/scripts/train.py \ +python lerobot/scripts/train.py \ dataset_repo_id=${HF_USER}/koch_test \ policy=act_koch_real \ env=koch_real \ @@ -915,7 +915,6 @@ env: It should match your dataset (e.g. `fps: 30`) and your robot (e.g. `state_dim: 6` and `action_dim: 6`). We are still working on simplifying this in future versions of `lerobot`. 4. We provided `device=cuda` since we are training on a Nvidia GPU, but you could use `device=mps` to train on Apple silicon. 5. We provided `wandb.enable=true` to use [Weights and Biases](https://docs.wandb.ai/quickstart) for visualizing training plots. This is optional but if you use it, make sure you are logged in by running `wandb login`. -6. We added `DATA_DIR=data` to access your dataset stored in your local `data` directory. If you dont provide `DATA_DIR`, your dataset will be downloaded from Hugging Face hub to your cache folder `$HOME/.cache/hugginface`. In future versions of `lerobot`, both directories will be in sync. 
For more information on the `train` script see the previous tutorial: [`examples/4_train_policy_with_script.md`](../examples/4_train_policy_with_script.md) diff --git a/examples/9_use_aloha.md b/examples/9_use_aloha.md index 866120b5f..1abf7c495 100644 --- a/examples/9_use_aloha.md +++ b/examples/9_use_aloha.md @@ -125,7 +125,7 @@ python lerobot/scripts/control_robot.py replay \ To train a policy to control your robot, use the [`python lerobot/scripts/train.py`](../lerobot/scripts/train.py) script. A few arguments are required. Here is an example command: ```bash -DATA_DIR=data python lerobot/scripts/train.py \ +python lerobot/scripts/train.py \ dataset_repo_id=${HF_USER}/aloha_test \ policy=act_aloha_real \ env=aloha_real \ @@ -141,7 +141,6 @@ Let's explain it: 3. We provided an environment as argument with `env=aloha_real`. This loads configurations from [`lerobot/configs/env/aloha_real.yaml`](../lerobot/configs/env/aloha_real.yaml). Note: this yaml defines 18 dimensions for the `state_dim` and `action_dim`, corresponding to 18 motors, not 14 motors as used in previous Aloha work. This is because, we include the `shoulder_shadow` and `elbow_shadow` motors for simplicity. 4. We provided `device=cuda` since we are training on a Nvidia GPU. 5. We provided `wandb.enable=true` to use [Weights and Biases](https://docs.wandb.ai/quickstart) for visualizing training plots. This is optional but if you use it, make sure you are logged in by running `wandb login`. -6. We added `DATA_DIR=data` to access your dataset stored in your local `data` directory. If you dont provide `DATA_DIR`, your dataset will be downloaded from Hugging Face hub to your cache folder `$HOME/.cache/hugginface`. In future versions of `lerobot`, both directories will be in sync. Training should take several hours. You will find checkpoints in `outputs/train/act_aloha_test/checkpoints`. 
diff --git a/lerobot/scripts/control_robot.py b/lerobot/scripts/control_robot.py index 3e7235639..563023f48 100644 --- a/lerobot/scripts/control_robot.py +++ b/lerobot/scripts/control_robot.py @@ -73,7 +73,7 @@ - Train on this dataset with the ACT policy: ```bash -DATA_DIR=data python lerobot/scripts/train.py \ +python lerobot/scripts/train.py \ policy=act_koch_real \ env=koch_real \ dataset_repo_id=$USER/koch_pick_place_lego \ diff --git a/lerobot/scripts/visualize_dataset.py b/lerobot/scripts/visualize_dataset.py index 03205f254..cdd5ce605 100644 --- a/lerobot/scripts/visualize_dataset.py +++ b/lerobot/scripts/visualize_dataset.py @@ -207,11 +207,17 @@ def main(): required=True, help="Episode to visualize.", ) + parser.add_argument( + "--local-files-only", + type=int, + default=0, + help="Use local files only. By default, this script will try to fetch the dataset from the hub if it exists.", + ) parser.add_argument( "--root", type=Path, default=None, - help="Root directory for a dataset stored locally (e.g. `--root data`). By default, the dataset will be loaded from hugging face cache folder, or downloaded from the hub if available.", + help="Root directory for the dataset stored locally (e.g. `--root data`). 
By default, the dataset will be loaded from hugging face cache folder, or downloaded from the hub if available.", ) parser.add_argument( "--output-dir", @@ -269,9 +275,10 @@ def main(): kwargs = vars(args) repo_id = kwargs.pop("repo_id") root = kwargs.pop("root") + local_files_only = kwargs.pop("local_files_only") logging.info("Loading dataset") - dataset = LeRobotDataset(repo_id, root=root, local_files_only=True) + dataset = LeRobotDataset(repo_id, root=root, local_files_only=local_files_only) visualize_dataset(dataset, **vars(args)) diff --git a/lerobot/scripts/visualize_dataset_html.py b/lerobot/scripts/visualize_dataset_html.py index 475983d3a..2c81fbfc5 100644 --- a/lerobot/scripts/visualize_dataset_html.py +++ b/lerobot/scripts/visualize_dataset_html.py @@ -234,6 +234,12 @@ def main(): required=True, help="Name of hugging face repositery containing a LeRobotDataset dataset (e.g. `lerobot/pusht` for https://huggingface.co/datasets/lerobot/pusht).", ) + parser.add_argument( + "--local-files-only", + type=int, + default=0, + help="Use local files only. By default, this script will try to fetch the dataset from the hub if it exists.", + ) parser.add_argument( "--root", type=Path, @@ -282,7 +288,9 @@ def main(): kwargs = vars(args) repo_id = kwargs.pop("repo_id") root = kwargs.pop("root") - dataset = LeRobotDataset(repo_id, root=root, local_files_only=True) + local_files_only = kwargs.pop("local_files_only") + + dataset = LeRobotDataset(repo_id, root=root, local_files_only=local_files_only) visualize_dataset_html(dataset, **kwargs) diff --git a/tests/test_policies.py b/tests/test_policies.py index e7a2c2f87..ae3567433 100644 --- a/tests/test_policies.py +++ b/tests/test_policies.py @@ -383,7 +383,7 @@ def test_backward_compatibility(env_name, policy_name, extra_overrides, file_nam include a report on what changed and how that affected the outputs. 2. 
Go to the `if __name__ == "__main__"` block of `tests/scripts/save_policy_to_safetensors.py` and add the policies you want to update the test artifacts for. - 3. Run `DATA_DIR=tests/data python tests/scripts/save_policy_to_safetensors.py`. The test artifact + 3. Run `python tests/scripts/save_policy_to_safetensors.py`. The test artifact should be updated. 4. Check that this test now passes. 5. Remember to restore `tests/scripts/save_policy_to_safetensors.py` to its original state. diff --git a/tests/test_push_dataset_to_hub.py b/tests/test_push_dataset_to_hub.py index bcba38f00..ff630ab66 100644 --- a/tests/test_push_dataset_to_hub.py +++ b/tests/test_push_dataset_to_hub.py @@ -5,7 +5,7 @@ Example to run backward compatiblity tests locally: ``` -DATA_DIR=tests/data python -m pytest --run-skipped tests/test_push_dataset_to_hub.py::test_push_dataset_to_hub_pusht_backward_compatibility +python -m pytest --run-skipped tests/test_push_dataset_to_hub.py::test_push_dataset_to_hub_pusht_backward_compatibility ``` """ @@ -330,7 +330,7 @@ def test_push_dataset_to_hub_format(required_packages, tmpdir, raw_format, repo_ ], ) @pytest.mark.skip( - "Not compatible with our CI since it downloads raw datasets. Run with `DATA_DIR=tests/data python -m pytest --run-skipped tests/test_push_dataset_to_hub.py::test_push_dataset_to_hub_pusht_backward_compatibility`" + "Not compatible with our CI since it downloads raw datasets. Run with `python -m pytest --run-skipped tests/test_push_dataset_to_hub.py::test_push_dataset_to_hub_pusht_backward_compatibility`" ) def test_push_dataset_to_hub_pusht_backward_compatibility(tmpdir, raw_format, repo_id): _, dataset_id = repo_id.split("/")
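Taken together, these changes replace the implicit `DATA_DIR` environment variable with an explicit CLI surface. The sketch below reproduces just the argument-parsing pattern from the `visualize_dataset.py` hunk above, using only the standard library, so the new flags can be exercised without the rest of the codebase (the `--repo-id` help text is paraphrased):

```python
import argparse
from pathlib import Path

# Mirror of the new CLI surface from the diff: DATA_DIR is gone, replaced by
# explicit --root / --local-files-only arguments.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--repo-id", type=str, required=True,
    help="Hugging Face dataset repository (e.g. `lerobot/pusht`).",
)
parser.add_argument(
    "--local-files-only", type=int, default=0,
    help="Use local files only. By default, fetch the dataset from the hub if it exists.",
)
parser.add_argument(
    "--root", type=Path, default=None,
    help="Root directory for a dataset stored locally (e.g. `--root data`).",
)

# Same invocation as the updated README example:
args = parser.parse_args([
    "--repo-id", "lerobot/pusht",
    "--root", "./my_local_data_dir",
    "--local-files-only", "1",
])
print(args.repo_id, args.root, args.local_files_only)
```

Note that `--local-files-only` is declared as an `int` rather than a boolean flag, matching the diff; passing `0` (the default) lets the scripts fall back to fetching the dataset from the hub.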