Potential improvement to the gradient accumulation code #13

Open
wants to merge 1 commit into base: master

Conversation


@igor-sikachyna commented Aug 6, 2020

I recently became heavily involved in experimenting with StyleGAN2 and stumbled upon what I believe is a problem: the way gradient accumulation is implemented for small GPU batches looks incorrect. The piece of code I am concerned about is in training/training_loop.py:

# Slow path with gradient accumulation.
else:
    for _round in rounds:
        tflib.run(G_train_op, feed_dict)
    if run_G_reg:
        for _round in rounds:
            tflib.run(G_reg_op, feed_dict)
    tflib.run(Gs_update_op, feed_dict)
    for _round in rounds:
        tflib.run(data_fetch_op, feed_dict)
        tflib.run(D_train_op, feed_dict)
    if run_D_reg:
        for _round in rounds:
            tflib.run(D_reg_op, feed_dict)

As a reference, here is the code without gradient accumulation:

tflib.run([G_train_op, data_fetch_op], feed_dict)
if run_G_reg:
    tflib.run(G_reg_op, feed_dict)
tflib.run([D_train_op, Gs_update_op], feed_dict)
if run_D_reg:
    tflib.run(D_reg_op, feed_dict)

So, for the gradient accumulation case:

  1. G_train_op is repeated multiple times on the same data instead of taking new samples with data_fetch_op each round.
  2. G_reg_op uses the same data as G_train_op (in the code without gradient accumulation, data_fetch_op is run between them).
  3. D_train_op gets a fresh data_fetch_op for each round, which suggests the same should apply to G_train_op.
  4. The PR "Potential bug in gradient accumulation?" #9 suggests that D_reg_op is also misused, as it requires new data via data_fetch_op.

So I propose a simple update to the gradient accumulation code, and I would like to ask for opinions on whether there is a real issue here in the first place. A sketch of the reordering follows below.
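For illustration, here is one possible reordering along the lines of the points above. This is only a sketch (not necessarily the exact diff in this commit): each accumulation round repeats the per-minibatch sequence of the fast path, so every op runs against freshly fetched data, while Gs_update_op stays outside the loop as in the current slow path.

# Slow path with gradient accumulation (sketch of the proposed ordering).
else:
    for _round in rounds:
        # Mirror the fast path: fetch new data alongside the G step,
        # then run the remaining ops on that minibatch.
        tflib.run([G_train_op, data_fetch_op], feed_dict)
        if run_G_reg:
            tflib.run(G_reg_op, feed_dict)
        tflib.run(D_train_op, feed_dict)
        if run_D_reg:
            tflib.run(D_reg_op, feed_dict)
    tflib.run(Gs_update_op, feed_dict)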

@Thunder003 commented Feb 25, 2021

Hi @igor-sikachyna, did you find a solution to the question you raised? Also, did you take a look at the optimizer file here, at line 228? They haven't added a tensor for this to the graph, so how will the gradients be updated when tflib.run(G_train_op, feed_dict) executes? Any idea on this?
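For context, accumulate-then-apply gradient accumulation in TF1-style graph mode is usually wired roughly like the generic sketch below. This is illustrative only, using a hypothetical toy model rather than the repo's dnnlib/tflib Optimizer or the code at the line referenced above: the point is that the weights only change when the relevant ops are actually run in a session, which is why the tflib.run(...) calls in the training loop are what trigger the updates.

import numpy as np
import tensorflow as tf  # TensorFlow 1.x graph mode, as used by StyleGAN2

# Generic sketch (hypothetical toy model, not the repo's Optimizer class):
# gradients from several small minibatches are summed into buffer variables,
# then applied in a single weight update. Scaling by the round count is
# omitted for brevity.
x = tf.placeholder(tf.float32, [None, 4])
y = tf.placeholder(tf.float32, [None, 1])
w = tf.Variable(tf.zeros([4, 1]))
loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y))

opt = tf.train.AdamOptimizer(learning_rate=1e-3)
grads_and_vars = opt.compute_gradients(loss, var_list=[w])
accum = [tf.Variable(tf.zeros_like(v), trainable=False) for _, v in grads_and_vars]

# Running accum_op adds the current minibatch gradient into the buffers.
accum_op = tf.group(*[a.assign_add(g) for a, (g, _) in zip(accum, grads_and_vars)])
# Running apply_op applies the accumulated gradients and then resets the buffers.
apply_grads = opt.apply_gradients([(a, v) for a, (_, v) in zip(accum, grads_and_vars)])
with tf.control_dependencies([apply_grads]):
    apply_op = tf.group(*[a.assign(tf.zeros_like(a)) for a in accum])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    feed = {x: np.random.randn(8, 4).astype(np.float32),
            y: np.random.randn(8, 1).astype(np.float32)}
    for _round in range(4):      # accumulate over 4 small minibatches
        sess.run(accum_op, feed)
    sess.run(apply_op, feed)     # one weight update per 4 accumulated minibatches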

@johndpope

Also check out stylegan2-ada; it was a more recent cut (2020):
https://github.com/NVlabs/stylegan2-ada
It is now superseded by
https://github.com/NVlabs/stylegan2-ada-pytorch
