Potential improvement to the gradient accumulation code #13

Open
wants to merge 1 commit into base: master

Conversation


@igor-sikachyna commented Aug 6, 2020

I recently became heavily involved in experimenting with StyleGAN2 and stumbled upon what I believe is a problem: the way gradient accumulation is implemented for small GPU batches looks incorrect. The piece of code I am concerned about is in training/training_loop.py:

# Slow path with gradient accumulation.
else:
    for _round in rounds:
        tflib.run(G_train_op, feed_dict)
    if run_G_reg:
        for _round in rounds:
            tflib.run(G_reg_op, feed_dict)
    tflib.run(Gs_update_op, feed_dict)
    for _round in rounds:
        tflib.run(data_fetch_op, feed_dict)
        tflib.run(D_train_op, feed_dict)
    if run_D_reg:
        for _round in rounds:
            tflib.run(D_reg_op, feed_dict)

As a reference, here is the code without gradient accumulation:

tflib.run([G_train_op, data_fetch_op], feed_dict)
if run_G_reg:
    tflib.run(G_reg_op, feed_dict)
tflib.run([D_train_op, Gs_update_op], feed_dict)
if run_D_reg:
    tflib.run(D_reg_op, feed_dict)

So, for the gradient accumulation case:

  1. G_train_op is repeated multiple times on the same data instead of taking new samples with data_fetch_op each round.
  2. G_reg_op uses the same data as G_train_op (in the code without gradient accumulation, data_fetch_op is run between them).
  3. D_train_op gets a fresh data_fetch_op for each round, which suggests the same should apply to G_train_op.
  4. The PR "Potential bug in gradient accumulation?" #9 suggests that D_reg_op is also misused, as it requires new data via data_fetch_op.

So I propose a simple update to the gradient accumulation code, and I would like to ask for opinions on whether there is a real issue here in the first place. A sketch of the reordering follows below.
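For illustration, here is one possible reordering along the lines of the points above. This is only a sketch (not necessarily the exact diff in this commit): each accumulation round repeats the per-minibatch sequence of the fast path, so every op runs against freshly fetched data, while Gs_update_op stays outside the loop as in the current slow path.

# Slow path with gradient accumulation (sketch of the proposed ordering).
else:
    for _round in rounds:
        # Mirror the fast path: fetch new data alongside the G step,
        # then run the remaining ops on that minibatch.
        tflib.run([G_train_op, data_fetch_op], feed_dict)
        if run_G_reg:
            tflib.run(G_reg_op, feed_dict)
        tflib.run(D_train_op, feed_dict)
        if run_D_reg:
            tflib.run(D_reg_op, feed_dict)
    tflib.run(Gs_update_op, feed_dict)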

@Thunder003 commented Feb 25, 2021

Hi @igor-sikachyna, did you find a solution to the question you raised? Also, did you take a look at the optimizer file here, at line 228? They haven't added a tensor for this to the graph, so how will the gradients be updated when tflib.run(G_train_op, feed_dict) executes? Any idea on this?
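For context, accumulate-then-apply gradient accumulation in TF1-style graph mode is usually wired roughly like the generic sketch below. This is illustrative only, using a hypothetical toy model rather than the repo's dnnlib/tflib Optimizer or the code at the line referenced above: the point is that the weights only change when the relevant ops are actually run in a session, which is why the tflib.run(...) calls in the training loop are what trigger the updates.

import numpy as np
import tensorflow as tf  # TensorFlow 1.x graph mode, as used by StyleGAN2

# Generic sketch (hypothetical toy model, not the repo's Optimizer class):
# gradients from several small minibatches are summed into buffer variables,
# then applied in a single weight update. Scaling by the round count is
# omitted for brevity.
x = tf.placeholder(tf.float32, [None, 4])
y = tf.placeholder(tf.float32, [None, 1])
w = tf.Variable(tf.zeros([4, 1]))
loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y))

opt = tf.train.AdamOptimizer(learning_rate=1e-3)
grads_and_vars = opt.compute_gradients(loss, var_list=[w])
accum = [tf.Variable(tf.zeros_like(v), trainable=False) for _, v in grads_and_vars]

# Running accum_op adds the current minibatch gradient into the buffers.
accum_op = tf.group(*[a.assign_add(g) for a, (g, _) in zip(accum, grads_and_vars)])
# Running apply_op applies the accumulated gradients and then resets the buffers.
apply_grads = opt.apply_gradients([(a, v) for a, (_, v) in zip(accum, grads_and_vars)])
with tf.control_dependencies([apply_grads]):
    apply_op = tf.group(*[a.assign(tf.zeros_like(a)) for a in accum])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    feed = {x: np.random.randn(8, 4).astype(np.float32),
            y: np.random.randn(8, 1).astype(np.float32)}
    for _round in range(4):      # accumulate over 4 small minibatches
        sess.run(accum_op, feed)
    sess.run(apply_op, feed)     # one weight update per 4 accumulated minibatches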

@johndpope

Also check out stylegan2-ada; it was a more recent cut (2020):
https://github.com/NVlabs/stylegan2-ada
It is now superseded by
https://github.com/NVlabs/stylegan2-ada-pytorch
