Remove DATA_DIR references
aliberts committed Nov 28, 2024
1 parent 2556960 commit d6b4429
Showing 13 changed files with 37 additions and 35 deletions.
2 changes: 1 addition & 1 deletion .github/PULL_REQUEST_TEMPLATE.md
@@ -21,7 +21,7 @@ Provide a simple way for the reviewer to try out your changes.

Examples:
```bash
-DATA_DIR=tests/data pytest -sx tests/test_stuff.py::test_something
+pytest -sx tests/test_stuff.py::test_something
```
```bash
python lerobot/scripts/train.py --some.option=true
8 changes: 1 addition & 7 deletions .github/workflows/nightly-tests.yml
@@ -7,10 +7,8 @@ on:
schedule:
- cron: "0 2 * * *"

-env:
-  DATA_DIR: tests/data
# env:
# SLACK_API_TOKEN: ${{ secrets.SLACK_API_TOKEN }}

jobs:
run_all_tests_cpu:
name: CPU
@@ -30,13 +28,9 @@ jobs:
working-directory: /lerobot
steps:
- name: Tests
-env:
-  DATA_DIR: tests/data
run: pytest -v --cov=./lerobot --disable-warnings tests

- name: Tests end-to-end
-env:
-  DATA_DIR: tests/data
run: make test-end-to-end


3 changes: 0 additions & 3 deletions .github/workflows/test.yml
@@ -29,7 +29,6 @@ jobs:
name: Pytest
runs-on: ubuntu-latest
env:
-  DATA_DIR: tests/data
MUJOCO_GL: egl
steps:
- uses: actions/checkout@v4
@@ -70,7 +69,6 @@ jobs:
name: Pytest (minimal install)
runs-on: ubuntu-latest
env:
-  DATA_DIR: tests/data
MUJOCO_GL: egl
steps:
- uses: actions/checkout@v4
@@ -108,7 +106,6 @@ jobs:
name: End-to-end
runs-on: ubuntu-latest
env:
-  DATA_DIR: tests/data
MUJOCO_GL: egl
steps:
- uses: actions/checkout@v4
14 changes: 7 additions & 7 deletions README.md
@@ -153,10 +153,12 @@ python lerobot/scripts/visualize_dataset.py \
--episode-index 0
```

-or from a dataset in a local folder with the root `DATA_DIR` environment variable (in the following case the dataset will be searched for in `./my_local_data_dir/lerobot/pusht`)
+or from a dataset in a local folder with the `root` option and the `--local-files-only` flag (in the following case the dataset will be searched for in `./my_local_data_dir/lerobot/pusht`)
```bash
-DATA_DIR='./my_local_data_dir' python lerobot/scripts/visualize_dataset.py \
+python lerobot/scripts/visualize_dataset.py \
--repo-id lerobot/pusht \
+--root ./my_local_data_dir \
+--local-files-only 1 \
--episode-index 0
```

@@ -208,12 +210,10 @@ dataset attributes:

A `LeRobotDataset` is serialised using several widespread file formats for each of its parts, namely:
- hf_dataset stored using Hugging Face datasets library serialization to parquet
-- videos are stored in mp4 format to save space or png files
-- episode_data_index saved using `safetensor` tensor serialization format
-- stats saved using `safetensor` tensor serialization format
-- info are saved using JSON
+- videos are stored in mp4 format to save space
+- metadata are stored in plain json/jsonl files

-Dataset can be uploaded/downloaded from the HuggingFace hub seamlessly. To work on a local dataset, you can set the `DATA_DIR` environment variable to your root dataset folder as illustrated in the above section on dataset visualization.
+Datasets can be uploaded/downloaded from the HuggingFace hub seamlessly. To work on a local dataset, you can use the `local_files_only` argument and specify its location with the `root` argument if it's not in the default `~/.cache/huggingface/lerobot` location.

### Evaluate a pretrained policy

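The `root` / `local_files_only` pattern that replaces `DATA_DIR` in the README can also be used directly from Python. A minimal sketch, assuming the import path `lerobot.common.datasets.lerobot_dataset` from this revision and a dataset previously saved under `./my_local_data_dir/lerobot/pusht`:

```python
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

# With root="./my_local_data_dir", the dataset "lerobot/pusht" is searched for
# in ./my_local_data_dir/lerobot/pusht, matching the README example above.
# local_files_only=True skips any attempt to fetch the dataset from the hub.
dataset = LeRobotDataset(
    "lerobot/pusht",
    root="./my_local_data_dir",
    local_files_only=True,
)

# Assumed torch-style Dataset behavior: __len__ gives the number of frames.
print(len(dataset))
```

Omitting `root` falls back to the default cache location `~/.cache/huggingface/lerobot` mentioned in the README change.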
5 changes: 2 additions & 3 deletions examples/10_use_so100.md
@@ -218,7 +218,7 @@ python lerobot/scripts/visualize_dataset_html.py \

Now try to replay the first episode on your robot:
```bash
-DATA_DIR=data python lerobot/scripts/control_robot.py replay \
+python lerobot/scripts/control_robot.py replay \
--robot-path lerobot/configs/robot/so100.yaml \
--fps 30 \
--repo-id ${HF_USER}/so100_test \
@@ -229,7 +229,7 @@ DATA_DIR=data python lerobot/scripts/control_robot.py replay \

To train a policy to control your robot, use the [`python lerobot/scripts/train.py`](../lerobot/scripts/train.py) script. A few arguments are required. Here is an example command:
```bash
-DATA_DIR=data python lerobot/scripts/train.py \
+python lerobot/scripts/train.py \
dataset_repo_id=${HF_USER}/so100_test \
policy=act_so100_real \
env=so100_real \
@@ -245,7 +245,6 @@ Let's explain it:
3. We provided an environment as argument with `env=so100_real`. This loads configurations from [`lerobot/configs/env/so100_real.yaml`](../lerobot/configs/env/so100_real.yaml).
4. We provided `device=cuda` since we are training on a Nvidia GPU, but you can also use `device=mps` if you are using a Mac with Apple silicon, or `device=cpu` otherwise.
5. We provided `wandb.enable=true` to use [Weights and Biases](https://docs.wandb.ai/quickstart) for visualizing training plots. This is optional but if you use it, make sure you are logged in by running `wandb login`.
-6. We added `DATA_DIR=data` to access your dataset stored in your local `data` directory. If you dont provide `DATA_DIR`, your dataset will be downloaded from Hugging Face hub to your cache folder `$HOME/.cache/hugginface`. In future versions of `lerobot`, both directories will be in sync.

Training should take several hours. You will find checkpoints in `outputs/train/act_so100_test/checkpoints`.

5 changes: 2 additions & 3 deletions examples/11_use_moss.md
@@ -218,7 +218,7 @@ python lerobot/scripts/visualize_dataset_html.py \

Now try to replay the first episode on your robot:
```bash
-DATA_DIR=data python lerobot/scripts/control_robot.py replay \
+python lerobot/scripts/control_robot.py replay \
--robot-path lerobot/configs/robot/moss.yaml \
--fps 30 \
--repo-id ${HF_USER}/moss_test \
@@ -229,7 +229,7 @@ DATA_DIR=data python lerobot/scripts/control_robot.py replay \

To train a policy to control your robot, use the [`python lerobot/scripts/train.py`](../lerobot/scripts/train.py) script. A few arguments are required. Here is an example command:
```bash
-DATA_DIR=data python lerobot/scripts/train.py \
+python lerobot/scripts/train.py \
dataset_repo_id=${HF_USER}/moss_test \
policy=act_moss_real \
env=moss_real \
@@ -245,7 +245,6 @@ Let's explain it:
3. We provided an environment as argument with `env=moss_real`. This loads configurations from [`lerobot/configs/env/moss_real.yaml`](../lerobot/configs/env/moss_real.yaml).
4. We provided `device=cuda` since we are training on a Nvidia GPU, but you can also use `device=mps` if you are using a Mac with Apple silicon, or `device=cpu` otherwise.
5. We provided `wandb.enable=true` to use [Weights and Biases](https://docs.wandb.ai/quickstart) for visualizing training plots. This is optional but if you use it, make sure you are logged in by running `wandb login`.
-6. We added `DATA_DIR=data` to access your dataset stored in your local `data` directory. If you dont provide `DATA_DIR`, your dataset will be downloaded from Hugging Face hub to your cache folder `$HOME/.cache/hugginface`. In future versions of `lerobot`, both directories will be in sync.

Training should take several hours. You will find checkpoints in `outputs/train/act_moss_test/checkpoints`.

3 changes: 1 addition & 2 deletions examples/7_get_started_with_real_robot.md
@@ -868,7 +868,7 @@ Your robot should replicate movements similar to those you recorded. For example

To train a policy to control your robot, use the [`python lerobot/scripts/train.py`](../lerobot/scripts/train.py) script. A few arguments are required. Here is an example command:
```bash
-DATA_DIR=data python lerobot/scripts/train.py \
+python lerobot/scripts/train.py \
dataset_repo_id=${HF_USER}/koch_test \
policy=act_koch_real \
env=koch_real \
@@ -915,7 +915,6 @@ env:
It should match your dataset (e.g. `fps: 30`) and your robot (e.g. `state_dim: 6` and `action_dim: 6`). We are still working on simplifying this in future versions of `lerobot`.
4. We provided `device=cuda` since we are training on a Nvidia GPU, but you could use `device=mps` to train on Apple silicon.
5. We provided `wandb.enable=true` to use [Weights and Biases](https://docs.wandb.ai/quickstart) for visualizing training plots. This is optional but if you use it, make sure you are logged in by running `wandb login`.
-6. We added `DATA_DIR=data` to access your dataset stored in your local `data` directory. If you dont provide `DATA_DIR`, your dataset will be downloaded from Hugging Face hub to your cache folder `$HOME/.cache/hugginface`. In future versions of `lerobot`, both directories will be in sync.

For more information on the `train` script see the previous tutorial: [`examples/4_train_policy_with_script.md`](../examples/4_train_policy_with_script.md)

3 changes: 1 addition & 2 deletions examples/9_use_aloha.md
@@ -125,7 +125,7 @@ python lerobot/scripts/control_robot.py replay \

To train a policy to control your robot, use the [`python lerobot/scripts/train.py`](../lerobot/scripts/train.py) script. A few arguments are required. Here is an example command:
```bash
-DATA_DIR=data python lerobot/scripts/train.py \
+python lerobot/scripts/train.py \
dataset_repo_id=${HF_USER}/aloha_test \
policy=act_aloha_real \
env=aloha_real \
@@ -141,7 +141,6 @@ Let's explain it:
3. We provided an environment as argument with `env=aloha_real`. This loads configurations from [`lerobot/configs/env/aloha_real.yaml`](../lerobot/configs/env/aloha_real.yaml). Note: this yaml defines 18 dimensions for the `state_dim` and `action_dim`, corresponding to 18 motors, not 14 motors as used in previous Aloha work. This is because, we include the `shoulder_shadow` and `elbow_shadow` motors for simplicity.
4. We provided `device=cuda` since we are training on a Nvidia GPU.
5. We provided `wandb.enable=true` to use [Weights and Biases](https://docs.wandb.ai/quickstart) for visualizing training plots. This is optional but if you use it, make sure you are logged in by running `wandb login`.
-6. We added `DATA_DIR=data` to access your dataset stored in your local `data` directory. If you dont provide `DATA_DIR`, your dataset will be downloaded from Hugging Face hub to your cache folder `$HOME/.cache/hugginface`. In future versions of `lerobot`, both directories will be in sync.

Training should take several hours. You will find checkpoints in `outputs/train/act_aloha_test/checkpoints`.

2 changes: 1 addition & 1 deletion lerobot/scripts/control_robot.py
@@ -73,7 +73,7 @@
- Train on this dataset with the ACT policy:
```bash
-DATA_DIR=data python lerobot/scripts/train.py \
+python lerobot/scripts/train.py \
policy=act_koch_real \
env=koch_real \
dataset_repo_id=$USER/koch_pick_place_lego \
11 changes: 9 additions & 2 deletions lerobot/scripts/visualize_dataset.py
@@ -207,11 +207,17 @@ def main():
required=True,
help="Episode to visualize.",
)
+parser.add_argument(
+"--local-files-only",
+type=int,
+default=0,
+help="Use local files only. By default, this script will try to fetch the dataset from the hub if it exists.",
+)
parser.add_argument(
"--root",
type=Path,
default=None,
help="Root directory for a dataset stored locally (e.g. `--root data`). By default, the dataset will be loaded from hugging face cache folder, or downloaded from the hub if available.",
help="Root directory for the dataset stored locally (e.g. `--root data`). By default, the dataset will be loaded from hugging face cache folder, or downloaded from the hub if available.",
)
parser.add_argument(
"--output-dir",
@@ -269,9 +275,10 @@ def main():
kwargs = vars(args)
repo_id = kwargs.pop("repo_id")
root = kwargs.pop("root")
+local_files_only = kwargs.pop("local_files_only")

logging.info("Loading dataset")
-dataset = LeRobotDataset(repo_id, root=root, local_files_only=True)
+dataset = LeRobotDataset(repo_id, root=root, local_files_only=local_files_only)

visualize_dataset(dataset, **vars(args))

10 changes: 9 additions & 1 deletion lerobot/scripts/visualize_dataset_html.py
@@ -234,6 +234,12 @@ def main():
required=True,
help="Name of hugging face repositery containing a LeRobotDataset dataset (e.g. `lerobot/pusht` for https://huggingface.co/datasets/lerobot/pusht).",
)
+parser.add_argument(
+"--local-files-only",
+type=int,
+default=0,
+help="Use local files only. By default, this script will try to fetch the dataset from the hub if it exists.",
+)
parser.add_argument(
"--root",
type=Path,
@@ -282,7 +288,9 @@ def main():
kwargs = vars(args)
repo_id = kwargs.pop("repo_id")
root = kwargs.pop("root")
-dataset = LeRobotDataset(repo_id, root=root, local_files_only=True)
+local_files_only = kwargs.pop("local_files_only")
+
+dataset = LeRobotDataset(repo_id, root=root, local_files_only=local_files_only)
visualize_dataset_html(dataset, **kwargs)


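A note on the design of the new flag in both visualization scripts: `--local-files-only` is declared with `type=int` and `default=0` rather than as a store-true switch, so it is invoked as `--local-files-only 1` and the integer is forwarded to `LeRobotDataset` as a truthy or falsy value. A small standalone sketch of that parsing behavior (the parser below is hypothetical, mirroring the pattern in the diff):

```python
import argparse

# Hypothetical standalone parser mirroring the --local-files-only flag added
# to visualize_dataset.py and visualize_dataset_html.py in this commit.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--local-files-only",
    type=int,
    default=0,
    help="Use local files only (pass 1 to skip fetching from the hub).",
)

# argparse maps the dashed flag to the attribute `local_files_only`; the value
# parses as int 1, which is truthy when forwarded to LeRobotDataset.
args = parser.parse_args(["--local-files-only", "1"])
assert args.local_files_only == 1
```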
2 changes: 1 addition & 1 deletion tests/test_policies.py
@@ -383,7 +383,7 @@ def test_backward_compatibility(env_name, policy_name, extra_overrides, file_nam
include a report on what changed and how that affected the outputs.
2. Go to the `if __name__ == "__main__"` block of `tests/scripts/save_policy_to_safetensors.py` and
add the policies you want to update the test artifacts for.
-3. Run `DATA_DIR=tests/data python tests/scripts/save_policy_to_safetensors.py`. The test artifact
+3. Run `python tests/scripts/save_policy_to_safetensors.py`. The test artifact
should be updated.
4. Check that this test now passes.
5. Remember to restore `tests/scripts/save_policy_to_safetensors.py` to its original state.
4 changes: 2 additions & 2 deletions tests/test_push_dataset_to_hub.py
@@ -5,7 +5,7 @@
Example to run backward compatiblity tests locally:
```
-DATA_DIR=tests/data python -m pytest --run-skipped tests/test_push_dataset_to_hub.py::test_push_dataset_to_hub_pusht_backward_compatibility
+python -m pytest --run-skipped tests/test_push_dataset_to_hub.py::test_push_dataset_to_hub_pusht_backward_compatibility
```
"""

@@ -330,7 +330,7 @@ def test_push_dataset_to_hub_format(required_packages, tmpdir, raw_format, repo_
],
)
@pytest.mark.skip(
"Not compatible with our CI since it downloads raw datasets. Run with `DATA_DIR=tests/data python -m pytest --run-skipped tests/test_push_dataset_to_hub.py::test_push_dataset_to_hub_pusht_backward_compatibility`"
"Not compatible with our CI since it downloads raw datasets. Run with `python -m pytest --run-skipped tests/test_push_dataset_to_hub.py::test_push_dataset_to_hub_pusht_backward_compatibility`"
)
def test_push_dataset_to_hub_pusht_backward_compatibility(tmpdir, raw_format, repo_id):
_, dataset_id = repo_id.split("/")
