CI nightlies cpu/gpu & cleanup #75

aliberts · 2024-04-16T09:13:03Z

This adds the ability to test lerobot on cpu & gpu docker builds

Changes:

Add nightly cpu & gpu docker builds and testing
PRs will be tested with pip install rather than poetry. This removes the need to have and maintain 2 different versions of pyproject.toml and poetry.lock.
This does increase install time, since poetry parallelizes package installs and is much faster (~30s vs ~3min), but dependencies will be cached, so it won't matter much in the common case where the PR doesn't change the pyproject.toml file.
For now though, we still need another version of the pyproject.toml files for the docker builds: I need to install the gym envs as local path dependencies after checking them out with ssh in the CI, rather than as direct git dependencies during the docker build, which would require setting up ssh in the dockerfile. Since I do not want to go down that rabbit hole, I've set up a script that automatically changes those git dependencies to local path dependencies. Once the environments are pip-installable (at release), this whole hurdle will disappear.
Add a DEVICE flag that shows up when running tests.
Move the end-to-end test into a Makefile so that workflows just call make test-ete rather than each carrying a duplicate of it.
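
To illustrate the git-to-path rewrite mentioned in the changes above, here is a minimal sketch of what such a script could look like. It is not the script from this PR: the function name `git_to_path_dependencies`, the `envs` checkout directory, and the `gym-example` package are all hypothetical, and it assumes each git dependency is written as a single-line Poetry inline table.

```python
import re

# Matches a single-line Poetry inline-table entry such as
#   gym-example = { git = "git@example.com:org/gym-example.git", optional = true }
# (assumption: git dependencies are declared on one line in this form).
GIT_DEP = re.compile(
    r'^(?P<name>[A-Za-z0-9_.-]+)\s*=\s*\{\s*git\s*=\s*"[^"]*"(?P<rest>[^}]*)\}\s*$'
)


def git_to_path_dependencies(pyproject_text: str, envs_dir: str = "envs") -> str:
    """Rewrite git dependencies as local path dependencies.

    Hypothetical helper: `envs_dir` is assumed to be the directory where the
    CI checked out the environment repos, one sub-directory per package.
    """
    out = []
    for line in pyproject_text.splitlines():
        match = GIT_DEP.match(line)
        if match:
            name = match.group("name")
            # Keep any other keys (e.g. `optional = true`) untouched.
            rest = match.group("rest").rstrip()
            line = f'{name} = {{ path = "{envs_dir}/{name}"{rest} }}'
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    example = (
        'torch = "^2.2"\n'
        'gym-example = { git = "git@example.com:org/gym-example.git", optional = true }\n'
    )
    print(git_to_path_dependencies(example), end="")
```

Lines that do not match the pattern are passed through unchanged, so the rest of the file is left alone.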

CI structure:

.github/workflows/
├── build-docker-images.yml     -> Builds gpu/cpu docker images every night from main
├── nightly-tests.yml           -> Runs tests on previously built docker images every night
├── test-poetry_DEPRECATED.yml  -> Former test.yml, soon to be removed
└── test.yml                    -> Tests (cpu) run on PRs and merge to main

Todo in future PRs (at release):

remove the env repos checkout and local install
change envs from git dependencies to just pip dependencies

Cadene

LGTM! Thanks ;)

I would suggest renaming ete -> end-to-end to be more explicit, but not sure.

Co-authored-by: Alexander Soare <[email protected]>
Tests cleaning & simplification (#81)
Fix tolerance for delta_timestamps (#84)
Co-authored-by: Remi <[email protected]>
Hotfix test_examples.py (#87)
Quality of life patches for eval.py (#86)
Add meta_data, revision v1.1
WIP add load functions + episode_data_index
id -> index, finish moving compute_stats before hf_dataset push_to_hub
Use v1.1, hf_transform_to_torch, Add 3 xarm datasets
Remove Prod, Tests are passind
Add tests/data
small fix
fix visualize_dataset
fix online training
fix online training
fix online training
Fixes for datasets 2.18 -> 2.19 update (#88)
WIP Dockerfile
Add Makefile
Add cpu build
Add libhdf5-dev in build-image
Add gpu build
Add libegl1-mesa-dev in cpu build
WIP Add docker build ci
Fix docker context
Fix dockerfile path
Fix dockerhub repo
Free up more disk space on runner
Include envs in cpu build
remove rm -rf /usr/local/share/boost
Test with pip only
Add python version file
Fix python version & extras
Checkout pathed envs
Add test artifacts
Update PR template
WIP Add nightly tests
Push gpu & cpu images to gpu/cpu dockerhub repos
WIP Test nightly
WIP test cpu build
WIP checkout pip install
Use Makefile for end-to-end tests
Fix cpu venv path
Fix end-to-end
Checkout envs
Revert "Checkout envs" This reverts commit 7193f60.
Add working-directory
Fix builds (rebase hotfix #87 from main)
Test fixed builds
rebase from user/rcadene/2024_04_18_episode_data_index
Test images
Check nvidia-smi
Fix opengl on gpu image
Test gpu build
Try nvidia/cudagl
Test cudagl build
Try different registry

- Changes on the `test.yml` workflow:
  - Using poetry instead of pip. Contrary to what I wrote in #75, it is possible to use poetry (and have the benefits of shorter install times) without the need for having two separate versions of `pyproject.toml` and `poetry.lock`.
  - Reduce the trigger scope to only run when files in these directories are modified:
    - `lerobot/`
    - `tests/`
    - `examples/`
    - `.github/`
- Add `style.yml` workflow for doing a `ruff check` pass on the code
- More cleanup (removed deprecated workflow)
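
The reduced trigger scope amounts to a prefix check on the files a change touches. In practice GitHub Actions applies this itself through the workflow's `paths` filter; the sketch below only illustrates the rule, and the function name `should_run_tests` is hypothetical.

```python
# Directory prefixes taken from the list above.
WATCHED_PREFIXES = ("lerobot/", "tests/", "examples/", ".github/")


def should_run_tests(changed_files):
    """Return True if any changed file lives under a watched directory."""
    # str.startswith accepts a tuple and matches any of its prefixes.
    return any(path.startswith(WATCHED_PREFIXES) for path in changed_files)


if __name__ == "__main__":
    print(should_run_tests(["README.md"]))
    print(should_run_tests(["README.md", "tests/test_envs.py"]))
```

Under this rule a change that only touches files outside the four directories, such as a top-level README, would not trigger the test workflow.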

aliberts self-assigned this Apr 16, 2024

aliberts force-pushed the user/aliberts/2024_04_16_ci_builds branch 6 times, most recently from bd754e2 to af98239 Compare April 20, 2024 16:32

aliberts added the ✅ Tests (Adds or modifies testing) and ⚙️ Infra/CI (Infra / CI-related) labels Apr 23, 2024

aliberts force-pushed the user/aliberts/2024_04_16_ci_builds branch 2 times, most recently from 3f2ed73 to 3b62033 Compare April 24, 2024 07:08

aliberts linked an issue Apr 24, 2024 that may be closed by this pull request

Wrong normalization of the images #61

Closed

aliberts removed a link to an issue Apr 24, 2024

Wrong normalization of the images #61

Closed

aliberts force-pushed the user/aliberts/2024_04_16_ci_builds branch from c12712a to 0b4277a Compare April 25, 2024 10:21

aliberts changed the title ~~Add CI builds~~ CI nightlies cpu/gpu & cleanup Apr 25, 2024

aliberts marked this pull request as ready for review April 25, 2024 11:19

aliberts requested a review from Cadene April 25, 2024 11:19

Cadene approved these changes Apr 25, 2024


Cadene and others added 12 commits April 25, 2024 14:47

Update lock after rebase main

c5ebf23

Add install EGL

55ffb42

remove sudo

8ea29b2

Try display :0

45443d4

MUJOCO_EGL_DEVICE_ID=0

3896786

unset DISPLAY

c1f3e67

Trying xvfb

3e68c00

Run with new entrypoint

5fc1bcb

Pass global env variables

0a86538

add ssh tailscale

f4ed7a5

add curl + privileged

6f331d1

aliberts added 25 commits April 25, 2024 14:47

Thinning image more

b836883

Testing thinned image

0642f42

Thinning image more

5ea2d2c

Testing thinned image

88dae64

Fixing & thinning

7da5df8

Testing thinned image

a6add57

Adding back libglib2.0-0

4f0048c

Testing with libglib2.0-0

e8bed61

Fixing

ffc178a

Fixing

cde7f0d

Print device in tests

d4eeaa6

Cleaning cpu & gpu images

10755a9

Build new images

4144f71

Fix python version

d0d64bf

Test builds

aa8025a

Change jobs names

35248a9

Trying new cache strategy

5fda961

Cleanup

ec4b988

More cleanup

5cb4799

Fix poetry files path

e2b196b

Fix poetry files path (cpu)

9afd0ed

Use script to change pyproject instead of 2 versions

3416341

Last test

67c3554

Cleanup

c2fcfd8

More explicit end-to-end test name

9373d36

aliberts force-pushed the user/aliberts/2024_04_16_ci_builds branch from feb75c4 to 9373d36 Compare April 25, 2024 12:48

aliberts merged commit b980c5d into main Apr 25, 2024
1 check passed

aliberts deleted the user/aliberts/2024_04_16_ci_builds branch April 25, 2024 12:58

aliberts mentioned this pull request Apr 27, 2024

More CI cleanup, add style workflow #107

Merged
