CI nightlies cpu/gpu & cleanup #75

Merged
aliberts merged 50 commits into main from user/aliberts/2024_04_16_ci_builds
Apr 25, 2024

Conversation

@aliberts aliberts commented Apr 16, 2024

This adds the ability to test lerobot on cpu & gpu docker builds

Changes:

  • Add nightly cpu & gpu docker builds and testing
  • PRs will be tested with pip install rather than poetry. This removes the need to have and maintain 2 different versions of pyproject.toml and poetry.lock.
    This does increase install time, since poetry parallelizes package installs and is much faster (~30 s vs ~3 min), but dependencies will be cached, so it won't matter much in the common case where the PR doesn't change the pyproject.toml file.
  • For now though, we still need another version of the pyproject.toml files for the docker builds: the gym envs have to be installed as local path dependencies after checking them out with ssh in the CI, rather than as direct git dependencies during the docker build, which would require setting up ssh in the dockerfile. Since I do not want to go down that rabbit hole, I've set up a script that automatically rewrites those git dependencies as local path dependencies. Once the environments are pip-installable (at release), this whole hurdle will disappear.
  • Add a DEVICE flag that shows up when running tests.
  • Move the end-to-end tests into a Makefile so that workflows just call make test-ete rather than each carrying a duplicate copy.
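
As a rough illustration of the git-to-path rewrite mentioned in the list above (this is not the actual script from this PR), a minimal Python sketch could look like the following. The `./envs` checkout directory, the `gym-foo` package name and its URL are placeholders, and the sketch assumes each env is declared on a single line as a Poetry inline table without `branch`, `rev` or `tag` keys.

```python
import re

# Matches a single-line Poetry inline-table dependency that contains a
# `git = "..."` key, e.g.
#   gym-foo = { git = "git@github.com:example-org/gym-foo.git", optional = true }
GIT_DEP = re.compile(
    r'^(?P<name>[A-Za-z0-9_.-]+)\s*=\s*\{(?P<body>[^}]*\bgit\s*=\s*"[^"]*"[^}]*)\}\s*$'
)
GIT_KEY = re.compile(r'\bgit\s*=\s*"[^"]*"')


def git_to_path_deps(pyproject_text: str, envs_dir: str = "./envs") -> str:
    """Rewrite git dependencies as local path dependencies.

    `envs_dir` is a hypothetical directory where the CI has already checked
    out each env repo under a folder named after the package.
    """
    out = []
    for line in pyproject_text.splitlines():
        match = GIT_DEP.match(line)
        if match:
            name = match.group("name")
            # Swap only the git key; other keys (e.g. optional) are kept.
            body = GIT_KEY.sub(f'path = "{envs_dir}/{name}"', match.group("body"), count=1)
            line = f"{name} = {{{body}}}"
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = (
        'gym-foo = { git = "git@github.com:example-org/gym-foo.git", optional = true }\n'
        'torch = "^2.2.1"\n'
    )
    print(git_to_path_deps(sample), end="")
```

Lines that do not declare a git dependency pass through unchanged, so the rewritten file only differs where an env was pointed at a git URL.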

CI structure:

.github/workflows/
├── build-docker-images.yml     -> Builds gpu/cpu docker images every night from main
├── nightly-tests.yml           -> Runs tests on previously built docker images every night
├── test-poetry_DEPRECATED.yml  -> Former test.yml, soon to be removed
└── test.yml                    -> Tests (cpu) run on PRs and merge to main

Todo in future PRs (at release):

  • remove the env repos checkout and local install
  • change envs from git dependencies to just pip dependencies

@aliberts aliberts self-assigned this Apr 16, 2024
@aliberts aliberts force-pushed the user/aliberts/2024_04_16_ci_builds branch 6 times, most recently from bd754e2 to af98239 Compare April 20, 2024 16:32
@aliberts aliberts added ✅ Tests Adds or modifies testing ⚙️ Infra/CI Infra / CI-related labels Apr 23, 2024
@aliberts aliberts force-pushed the user/aliberts/2024_04_16_ci_builds branch 2 times, most recently from 3f2ed73 to 3b62033 Compare April 24, 2024 07:08
@aliberts aliberts linked an issue Apr 24, 2024 that may be closed by this pull request
@aliberts aliberts force-pushed the user/aliberts/2024_04_16_ci_builds branch from c12712a to 0b4277a Compare April 25, 2024 10:21
@aliberts aliberts changed the title Add CI builds CI nightlies cpu/gpu & cleanup Apr 25, 2024
@aliberts aliberts marked this pull request as ready for review April 25, 2024 11:19
@aliberts aliberts requested a review from Cadene April 25, 2024 11:19
@Cadene Cadene left a comment

LGTM! Thanks ;)

I would suggest renaming ete -> end-to-end to be more explicit, but not sure.

Cadene and others added 12 commits April 25, 2024 14:47
Co-authored-by: Alexander Soare <[email protected]>

Tests cleaning & simplification (#81)

Fix tolerance for delta_timestamps (#84)

Co-authored-by: Remi <[email protected]>

Hotfix test_examples.py (#87)

Quality of life patches for eval.py (#86)

Add meta_data, revision v1.1

WIP add load functions + episode_data_index

id -> index, finish moving compute_stats before hf_dataset push_to_hub

Use v1.1, hf_transform_to_torch, Add 3 xarm datasets

Remove Prod, Tests are passing

Add tests/data

small fix

fix visualize_dataset

fix online training

fix online training

fix online training

Fixes for datasets 2.18 -> 2.19 update (#88)

WIP Dockerfile

Add Makefile

Add cpu build

Add libhdf5-dev in build-image

Add gpu build

Add libegl1-mesa-dev in cpu build

WIP Add docker build ci

Fix docker context

Fix dockerfile path

Fix dockerhub repo

Free up more disk space on runner

Include envs in cpu build

remove rm -rf /usr/local/share/boost

Test with pip only

Add python version file

Fix python version & extras

Checkout pathed envs

Add test artifacts

Update PR template

WIP Add nightly tests

Push gpu & cpu images to gpu/cpu dockerhub repos

WIP Test nightly

WIP test cpu build

WIP checkout pip install

Use Makefile for end-to-end tests

Fix cpu venv path

Fix end-to-end

Checkout envs

Revert "Checkout envs"

This reverts commit 7193f60.

Add working-directory

Fix builds (rebase hotfix #87 from main)

Test fixed builds

rebase from user/rcadene/2024_04_18_episode_data_index

Test images

Check nvidia-smi

Fix opengl on gpu image

Test gpu build

Try nvidia/cudagl

Test cudagl build

Try different registry
@aliberts aliberts force-pushed the user/aliberts/2024_04_16_ci_builds branch from feb75c4 to 9373d36 Compare April 25, 2024 12:48
@aliberts aliberts merged commit b980c5d into main Apr 25, 2024
1 check passed
@aliberts aliberts deleted the user/aliberts/2024_04_16_ci_builds branch April 25, 2024 12:58
aliberts added a commit that referenced this pull request Apr 27, 2024
- Changes on the `test.yml` workflow:
  - Using poetry instead of pip. Contrary to what I wrote in #75, it is possible to use poetry (and have the benefits of shorter install times) without the need for having two separate versions of `pyproject.toml` and `poetry.lock`.
  - Reduce the trigger scope to only run when files in these directories are modified:
    - `lerobot/`
    - `tests/`
    - `examples/`
    - `.github/`
- Add `style.yml` workflow for doing a `ruff check` pass on the code
- More cleanup (removed deprecated workflow)
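
For illustration, the reduced trigger scope listed above amounts to a check like the one below. This is a Python sketch of the rule only, not the workflow's actual configuration (which lives in `test.yml`); the function names are made up for the example.

```python
from pathlib import PurePosixPath

# Top-level directories named in the commit message above.
WATCHED_DIRS = ("lerobot", "tests", "examples", ".github")


def touches_watched_dir(path: str) -> bool:
    """True if `path` is a file located inside one of the watched directories."""
    parts = PurePosixPath(path).parts
    return len(parts) > 1 and parts[0] in WATCHED_DIRS


def should_run_tests(changed_files: list[str]) -> bool:
    """True if at least one changed file falls within the trigger scope."""
    return any(touches_watched_dir(f) for f in changed_files)


if __name__ == "__main__":
    print(should_run_tests(["README.md"]))
    print(should_run_tests(["README.md", "tests/test_examples.py"]))
```

Under this rule, a PR that only edits top-level files such as the README would not trigger the test workflow.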
Labels
⚙️ Infra/CI Infra / CI-related ✅ Tests Adds or modifies testing
Projects
Status: Done

3 participants