CI does not test examples with latest pytorch #1329

Open
3 of 4 tasks
dvrogozh opened this issue Apr 24, 2025 · 5 comments
Comments

@dvrogozh
Contributor

dvrogozh commented Apr 24, 2025

With: 8393ceb

This issue is a follow-up to #1327 (comment). We tried to update the fast_neural_style example, which required bumping the pytorch version, and spotted a few issues. They all stem from the fact that the CI scripts bulk-install the dependencies of all examples at once. See:

examples/utils.sh

Lines 26 to 31 in 8393ceb

cat $BASE_DIR/*/requirements.txt | \
sort -u | \
# testing the installed version of torch, so don't pip install it.
grep -vE '^torch$' | \
pip install -r /dev/stdin || \
{ error "failed to install dependencies"; exit 1; }

This causes a downgrade of the nightly torch installed by the CI.
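
To illustrate why the filter in the snippet above does not protect the nightly build, here is a minimal Python sketch of the same merge-and-filter logic. The function name `bulk_requirements` is hypothetical, and the sample requirement lines are the ones listed later in this thread, used purely for illustration. Only a bare `torch` line is dropped, so any version-constrained line survives and pip is free to replace the preinstalled torch in order to satisfy it.

```python
import re


def bulk_requirements(file_contents):
    """Roughly mimic `cat */requirements.txt | sort -u | grep -vE '^torch$'`.

    Simplification (assumption): blank lines and surrounding whitespace are
    dropped here, which the shell pipeline does not do.
    """
    lines = {
        line.strip()
        for content in file_contents
        for line in content.splitlines()
        if line.strip()
    }
    # Only a line that is exactly "torch" is filtered out.
    return sorted(line for line in lines if not re.fullmatch(r"torch", line))


merged = bulk_requirements([
    "torch\ntorchvision==0.20.0\n",  # sample lines taken from this thread
    "torch<2.6\n",
])
print(merged)
```

The bare `torch` entry disappears, but `torch<2.6` and `torchvision==0.20.0` remain in the merged list and are applied to every example at once.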

I suggest considering the following improvements for the pytorch examples:

  • Each example must have a requirements.txt (some, such as fast_neural_style, lack one). See "Respect each example requirements and use uv" #1330
  • CI scripts must not bulk-install dependencies but must respect each example's individual requirements.txt. See "Respect each example requirements and use uv" #1330
  • (to discuss) CI should prepare the environment for each example from scratch instead of installing the next example's environment on top of the previous one. See "Respect each example requirements and use uv" #1330
    • note: this requires a significant change to the current scripts and needs discussion
  • Examples must not pin versions of pytorch packages (torch, torchvision, etc.) unless as a workaround for specific issues (which must be explicitly noted) or because the example is deprecated and will be dropped in the future
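
As a rough way to check the first point, the sketch below lists example directories that have an entry script but no requirements.txt. It assumes, hypothetically, that an example is any directory containing a `main.py`; real examples may use other entry points, so `entry_point` would need adjusting. The function name is made up for illustration.

```python
from pathlib import Path


def examples_missing_requirements(root, entry_point="main.py"):
    """Return example dirs (relative to root) that lack a requirements.txt.

    Assumption: a directory counts as an example if it contains `entry_point`.
    """
    missing = []
    for script in sorted(Path(root).rglob(entry_point)):
        if not (script.parent / "requirements.txt").is_file():
            missing.append(script.parent.relative_to(root).as_posix())
    return missing


if __name__ == "__main__":
    # Run from the repository root to list offending directories.
    for name in examples_missing_requirements("."):
        print(name)
```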

Alternatively, we could consider the following (UPDATE: we've dismissed this after discussion):

  • All examples must comply with a single dependency list maintained at the top level of the pytorch examples repo

CC: @malfet, @atalman, @msaroufim

@msaroufim
Member

msaroufim commented Apr 24, 2025

Hi @dvrogozh, I think this sounds very reasonable. We could potentially just create a new venv per example, so insisting on a requirements.txt per folder sounds great. We could explore regular venv, uv, or really anything you think is most convenient.

I'm open to designs and PR ideas here. I probably won't have time to author them myself, but I'm happy to shepherd PRs.

@dvrogozh
Contributor Author

@msaroufim: thank you for the support. I will be glad to suggest some PRs. It would be nice to agree on the direction we want to take before I go ahead with the implementation. A couple of things to decide upon:

  1. Global vs. per-example requirements.txt.
  2. From-scratch environment for each example vs. installing on top of the previous example.

Global vs. per-example requirements.txt

I think per-example makes more sense, since the likely usage is git-clone/cd-to-example/run-example, which is a per-example approach. This, however, adds some complexity on the CI side, since we need to take care of each example separately.

From-scratch environment for each example vs. installing on top of the previous example

On a technical level this raises another question: do we need run_python_examples.sh and other such scripts? On the one hand, they are convenient for running all the examples locally. On the other hand, they complicate managing a clean environment for each example. Potentially we can:

  1. Drop these scripts and run each example directly in .github/workflows/.
  2. Keep using run_python_examples.sh and add some complexity to it by managing environments inside the script.
  3. Do both: for CI, stop using these run-*.sh scripts and do everything in .github/workflows, and reduce the run-*.sh scripts to, for example, running all the examples in the current user environment without installing anything at all.

I personally like the 1st option (drop the *.sh scripts), but I can see that others might wish to keep them.

@msaroufim
Member

I agree that a per-example requirements.txt makes sense, so that there's no implicit downgrade or upgrade.

I do think a separate venv still makes the most sense, to be honest, and with uv it's easier because you can put the venv in the same folder as the code.

As far as run_python_examples.sh goes, that file is not my favorite. The key requirements, though, are:

  1. If someone wants to run a given example locally, they should be able to do so easily; otherwise contributions will go to 0.
  2. We need a GitHub workflow that is not too long and doesn't deviate in unexpected ways from the instructions for running locally.

So, for example, if the local instructions are

  1. Activate venv
  2. Run main.py

then the CI job can loop over all the directories in the GitHub workflow.
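
A minimal sketch of such a loop, assuming a POSIX venv layout (`.venv/bin/python`), a `main.py` entry point in every example, and the standard-library `venv` module rather than uv. The function names are hypothetical, and this is not what the repository's CI actually does. How the torch build under test gets into each fresh venv is deliberately left out.

```python
import subprocess
import sys
from pathlib import Path


def plan_commands(example_dir):
    """Return the commands for one example: create venv, install deps, run."""
    example_dir = Path(example_dir).resolve()
    # Assumption: POSIX layout; on Windows the interpreter lives in Scripts/.
    venv_python = example_dir / ".venv" / "bin" / "python"
    return [
        [sys.executable, "-m", "venv", str(example_dir / ".venv")],
        [str(venv_python), "-m", "pip", "install", "-r", "requirements.txt"],
        [str(venv_python), "main.py"],
    ]


def run_example(example_dir):
    """Run the planned commands from inside the example directory."""
    cwd = Path(example_dir).resolve()
    for cmd in plan_commands(cwd):
        subprocess.run(cmd, cwd=cwd, check=True)


def run_all(root):
    """Loop over every directory that ships its own requirements.txt."""
    for req in sorted(Path(root).glob("*/requirements.txt")):
        run_example(req.parent)
```

Because each example gets its own `.venv` next to its code, the same three commands double as the local instructions, which keeps the workflow and the README from drifting apart.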

@dvrogozh
Contributor Author

@msaroufim : ok, I've implemented something which we can review and discuss:

@dvrogozh
Contributor Author

After merging #1330, the remaining task is:

  • "Examples must not pin version of pytorch packages <...>"

For a few examples this might be as simple as unpinning versions and revalidating with the latest packages:

$ fgrep -rsn torchvision | grep "=="
mnist_hogwild/requirements.txt:2:torchvision==0.20.0
mnist_forward_forward/requirements.txt:2:torchvision==0.20.0
dcgan/requirements.txt:2:torchvision==0.20.0
gcn/requirements.txt:2:torchvision==0.20.0
imagenet/requirements.txt:2:torchvision==0.20.0
vae/requirements.txt:2:torchvision==0.20.0
mnist/requirements.txt:2:torchvision==0.20.0
siamese_network/requirements.txt:2:torchvision==0.20.0
mnist_rnn/requirements.txt:2:torchvision==0.20.0
distributed/rpc/batch/requirements.txt:2:torchvision==0.7.0
distributed/rpc/pipeline/requirements.txt:2:torchvision==0.7.0

For a few examples, however, this means updating the example scripts, as they rely on some features from torch<=2.5.

$ fgrep -rsn torch | grep txt | grep "<2"
time_sequence_prediction/requirements.txt:1:torch<2.6
word_language_model/requirements.txt:1:torch<2.6
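
The two searches above could be folded into one small check, sketched below in Python. It treats any requirement whose name starts with `torch` and carries an `==`, `~=`, `<=` or `<` constraint as pinned. That naming rule and the function names are assumptions for illustration, not an existing tool in the repo.

```python
import re
from pathlib import Path

# Assumption: pytorch packages are those whose name starts with "torch".
PIN = re.compile(r"^torch[\w\-]*\s*(==|~=|<=|<)\s*\S+")


def pinned_torch_packages(text):
    """Return requirement lines that pin or cap a torch* package."""
    pins = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # ignore comments
        if PIN.match(line):
            pins.append(line)
    return pins


def scan(root):
    """Map each requirements.txt under root to its pinned torch* lines."""
    result = {}
    for req in sorted(Path(root).rglob("requirements.txt")):
        pins = pinned_torch_packages(req.read_text())
        if pins:
            result[req.as_posix()] = pins
    return result
```

Lower bounds such as `torch>=2.0` are intentionally not reported, since they do not force a downgrade of a newer build.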
