Commit 00ef8a7

Respect each example requirements and use uv (#1330)
* Add requirements.txt to examples which miss them

Signed-off-by: Dmitry Rogozhkin <[email protected]>

* Update numpy requirement for reinforcement_learning to be <2

The current version of the example requires `numpy<2`; otherwise the following error can be seen:
```
AttributeError: module 'numpy' has no attribute 'bool8'. Did you mean: 'bool'?
```

Signed-off-by: Dmitry Rogozhkin <[email protected]>

* Update torch requirement for time and word examples to be <2.6

The current versions of the examples require `torch<2.6`; otherwise the following error can be seen:
```
  File "/pytorch/examples/time_sequence_prediction/train.py", line 47, in <module>
    data = torch.load('traindata.pt')
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pytorch/examples/time_sequence_prediction/.venv/lib/python3.12/site-packages/torch/serialization.py", line 1524, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
```

Signed-off-by: Dmitry Rogozhkin <[email protected]>

* Respect each example requirements and use uv

This commit introduces a few changes to CI by modifying `run_*_examples.sh` and the respective GitHub workflows:
  * Switched to uv
  * Added tearup and teardown stages for tests (`start()` and `stop()` methods wrapping test bodies; these are called automatically)
  * Tearup (`start()`) installs example dependencies and, optionally (if `VIRTUAL_ENV=.venv` is passed), creates a uv virtual environment
  * Teardown (`stop()`) removes the uv virtual environment if it was created (to save space)
  * If `VIRTUAL_ENV` is not set, the scripts expect to be executed in an existing virtual environment. This can be a `python -m venv`, `uv venv` or `conda env` environment. In this case example dependencies will be installed in this environment, potentially reinstalling existing packages (including `torch`!).
  * Dropped automated detection of the CUDA platform. The scripts now require `USE_CUDA=True` to be passed explicitly
  * Added the `PIP_INSTALL_ARGS` environment variable, which is passed to the `uv pip install` calls for each example's dependencies. This allows adjusting torch indices and other options.

Execute all tests in the current virtual environment (might rewrite packages):
```
./run_distributed_examples.sh
```

Execute all tests, creating a separate environment for each example:
```
VIRTUAL_ENV=.venv ./run_distributed_examples.sh
```

Run with CUDA:
```
USE_CUDA=True ./run_distributed_examples.sh
```

Adjust index:
```
PIP_INSTALL_ARGS="--pre -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html" \
  ./run_distributed_examples.sh
```

Signed-off-by: Dmitry Rogozhkin <[email protected]>

---------

Signed-off-by: Dmitry Rogozhkin <[email protected]>
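The tearup/teardown flow described in the commit message can be pictured roughly as follows. This is a minimal sketch, not the code from `utils.sh` (which is not shown on this page): the bodies of `start`, `stop` and `run` are assumptions, and each function only echoes the command it would execute so the sequence can be inspected without uv installed. The mapping from an example path such as `distributed/ddp` to a function named `distributed_ddp` is inferred from the names used in `run_distributed_examples.sh`.

```shell
# Hypothetical sketch of the start()/stop() wrapping; commands are echoed, not run.

start() {
  # Create a per-example environment only when VIRTUAL_ENV is set (assumption).
  if [ -n "${VIRTUAL_ENV:-}" ]; then
    echo "uv venv ${VIRTUAL_ENV}"
  fi
  # Install the example's own requirements, forwarding any extra pip options.
  echo "uv pip install -r requirements.txt ${PIP_INSTALL_ARGS:-}"
}

stop() {
  # Remove the environment only if this run was asked to create it.
  if [ -n "${VIRTUAL_ENV:-}" ]; then
    echo "rm -rf ${VIRTUAL_ENV}"
  fi
}

run() {
  example="$1"
  # "distributed/ddp" -> "distributed_ddp": the function holding the test body.
  fn="$(printf '%s' "$example" | tr '/' '_')"
  echo "cd ${example}"
  start
  echo "${fn}"
  stop
}
```

Calling `run distributed/ddp` with `VIRTUAL_ENV=.venv` set prints the environment creation, the dependency install, the test function name, and the cleanup, in that order. Without `VIRTUAL_ENV`, the sketch skips the create and remove steps.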
1 parent 8393ceb commit 00ef8a7

13 files changed, +162 -127 lines changed

.github/workflows/main_distributed.yaml

+6 -4
@@ -22,12 +22,14 @@ jobs:
       with:
         python-version: 3.8
     - name: Install PyTorch
-      run: |
-        python -m pip install --upgrade pip
-        pip install --pre torch -f https://download.pytorch.org/whl/nightly/cu118/torch_nightly.html
+      uses: astral-sh/setup-uv@v6
     - name: Run Tests
+      env:
+        USE_CUDA: 'True'
+        VIRTUAL_ENV: '.venv'
+        PIP_INSTALL_ARGS: '--pre -f https://download.pytorch.org/whl/nightly/cu118/torch_nightly.html'
       run: |
-        ./run_distributed_examples.sh "run_all,clean"
+        ./run_distributed_examples.sh
     - name: Open issue on failure
       if: ${{ failure() && github.event_name == 'schedule' }}
       uses: rishabhgupta/git-action-issue@v2
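The workflow passes `PIP_INSTALL_ARGS` as one string that holds several pip options. A minimal sketch of how a script could turn that string into separate arguments is given here. `count_install_args` is a hypothetical helper used only for illustration; how `utils.sh` actually consumes the variable is not shown on this page.

```shell
# Hypothetical helper: demonstrates that an unquoted expansion of
# PIP_INSTALL_ARGS is split on whitespace into separate arguments.
count_install_args() {
  # Intentionally unquoted so that each option becomes its own word.
  set -- uv pip install -r requirements.txt ${PIP_INSTALL_ARGS:-}
  # Print the number of words in the resulting command line.
  echo "$#"
}
```

One consequence of this approach is that an option value containing spaces cannot be passed through the variable, because the split happens on every space.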

.github/workflows/main_python.yml

+6 -8
@@ -21,16 +21,14 @@ jobs:
       uses: actions/setup-python@v5
       with:
         python-version: '3.10'
-    - name: Install PyTorch
-      run: |
-        python -m pip install --upgrade pip
-        # Install CPU-based pytorch
-        pip install --pre torch torchvision -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
-        # Maybe use the CUDA 10.2 version instead?
-        # pip install --pre torch torchvision -f https://download.pytorch.org/whl/nightly/cu102/torch_nightly.html
+    - name: Install uv
+      uses: astral-sh/setup-uv@v6
     - name: Run Tests
+      env:
+        VIRTUAL_ENV: '.venv'
+        PIP_INSTALL_ARGS: '--pre -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html'
       run: |
-        ./run_python_examples.sh "install_deps,run_all,clean"
+        ./run_python_examples.sh
     - name: Open issue on failure
       if: ${{ failure() && github.event_name == 'schedule' }}
       uses: rishabhgupta/git-action-issue@v2

CONTRIBUTING.md

+2 -2
@@ -40,8 +40,8 @@ If you're new, we encourage you to take a look at issues tagged with [good first
 1. Fork the repo and create your branch from `main`.
 2. Make sure you have a GPU-enabled machine, either locally or in the cloud. `g4dn.4xlarge` is a good starting point on AWS.
 3. Make your code change.
-4. First, install all dependencies with `./run_python_examples.sh "install_deps"`.
-5. Then, make sure that `./run_python_examples.sh` passes locally by running the script end to end.
+4. Install `uv`.
+5. Then, make sure that `VIRTUAL_ENV=.venv ./run_python_examples.sh` passes locally by running the script end to end.
 6. If you haven't already, complete the Contributor License Agreement ("CLA").
 7. Address any feedback in code review promptly.

fast_neural_style/requirements.txt

+3
@@ -0,0 +1,3 @@
+numpy
+torch
+torchvision

fx/requirements.txt

+2
@@ -0,0 +1,2 @@
+torch
+torchvision

regression/requirements.txt

+1
@@ -0,0 +1 @@
+torch

reinforcement_learning/requirements.txt

+1 -1
@@ -1,4 +1,4 @@
 torch
-numpy
+numpy<2
 gym
 pygame

run_distributed_examples.sh

+29 -17
@@ -4,16 +4,30 @@
 # The purpose is just as an integration test, not to actually train models in any meaningful way.
 # For that reason, most of these set epochs = 1 and --dry-run.
 #
-# Optionally specify a comma separated list of examples to run.
-# can be run as:
-# ./run_python_examples.sh "install_deps,run_all,clean"
-# to pip install dependencies (other than pytorch), run all examples, and remove temporary/changed data files.
-# Expects pytorch, torchvision to be installed.
+# Optionally specify a comma separated list of examples to run. Can be run as:
+# * To run all examples:
+# ./run_distributed_examples.sh
+# * To run specific example:
+# ./run_distributed_examples.sh "distributed/tensor_parallelism,distributed/ddp"
+#
+# To test examples on CUDA accelerator, run as:
+# USE_CUDA=True ./run_distributed_examples.sh
+#
+# Script requires uv to be installed. When executed, script will install prerequisites from
+# `requirements.txt` for each example. If ran within activated virtual environment (uv venv,
+# python -m venv, conda) this might reinstall some of the packages. To change pip installation
+# index or to pass additional pip install options, run as:
+# PIP_INSTALL_ARGS="--pre -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html" \
+# ./run_distributed_examples.sh
+#
+# To force script to create virtual environment for each example, run as:
+# VIRTUAL_ENV=".venv" ./run_distributed_examples.sh
+# Script will remove environments it creates in a teardown step after execution of each example.

 BASE_DIR="$(pwd)/$(dirname $0)"
 source $BASE_DIR/utils.sh

-USE_CUDA=$(python -c "import torch; print(torch.cuda.is_available())")
+USE_CUDA=${USE_CUDA:-False}
 case $USE_CUDA in
   "True")
     echo "using cuda"
@@ -30,21 +44,19 @@ case $USE_CUDA in
     ;;
 esac

-function distributed() {
-    start
-    bash tensor_parallelism/run_example.sh tensor_parallelism/tensor_parallel_example.py || error "tensor parallel example failed"
-    bash tensor_parallelism/run_example.sh tensor_parallelism/sequence_parallel_example.py || error "sequence parallel example failed"
-    bash tensor_parallelism/run_example.sh tensor_parallelism/fsdp_tp_example.py || error "2D parallel example failed"
-    python ddp/main.py || error "ddp example failed"
+function distributed_tensor_parallelism() {
+    uv run bash run_example.sh tensor_parallel_example.py || error "tensor parallel example failed"
+    uv run bash run_example.sh sequence_parallel_example.py || error "sequence parallel example failed"
+    uv run bash run_example.sh fsdp_tp_example.py || error "2D parallel example failed"
 }

-function clean() {
-  cd $BASE_DIR
-  echo "running clean to remove cruft"
+function distributed_ddp() {
+    uv run main.py || error "ddp example failed"
 }

 function run_all() {
-  distributed
+  run distributed/tensor_parallelism
+  run distributed/ddp
 }

 # by default, run all examples
@@ -54,7 +66,7 @@ else
   for i in $(echo $EXAMPLES | sed "s/,/ /g")
   do
     echo "Starting $i"
-    $i
+    run $i
     echo "Finished $i, status $?"
   done
 fi
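
The switch from auto-detection to `USE_CUDA=${USE_CUDA:-False}` relies on the shell's default-value expansion: the variable keeps the caller's value when set and non-empty, and falls back to `False` otherwise. A minimal sketch of that behavior follows. `resolve_device` is a hypothetical function, and the `cpu` and `unknown` branches are assumptions, since the diff shows only the `"True"` branch of the script's `case` statement.

```shell
# Hypothetical sketch of the explicit opt-in for CUDA.
resolve_device() {
  # ${VAR:-default} yields the default when VAR is unset or empty.
  use_cuda="${USE_CUDA:-False}"
  case "$use_cuda" in
    "True")  echo "cuda" ;;
    "False") echo "cpu" ;;
    # Any other value is rejected in this sketch (assumption).
    *)       echo "unknown"; return 1 ;;
  esac
}
```

With this pattern the script no longer needs `torch` importable before dependencies are installed, which matters once each example installs its own requirements.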
