
Conversation

@megatran

The current code has some typos and mismatched Python/bash environment variables, which cause this exception:

    raise TypeError("Gradient accumulation supports only int and dict types")
TypeError: Gradient accumulation supports only int and dict types

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 939, in <module>
    if trainer.global_rank == 0:
NameError: name 'trainer' is not defined
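
For context: the failure happens because the variable names set in Python didn't match the names the shell command reads back with `$VAR`, so the shell expanded the missing variable to an empty string and the Lightning-side type check rejected it. A minimal sketch of that failure mode, using hypothetical mismatched names:

import os

os.environ["ACCUM_BATCHES"] = "1"                 # exported under one name...
value = os.environ.get("ACCUMULATE_BATCHES", "")  # ...but read back under another
print(repr(value))                                # '' -- the shell expands an unset $ACCUMULATE_BATCHES the same way

def check_accumulate(v):
    # mirrors the type check that produced the traceback above
    if not isinstance(v, (int, dict)):
        raise TypeError("Gradient accumulation supports only int and dict types")

check_accumulate(value)  # raises TypeError, matching the first error above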

I propose the following fix:

# 2x A6000:
import os

BATCH_SIZE = 4
N_GPUS = 2
ACCUMULATE_BATCHES = 1
# Keep the trailing comma: main.py's --gpus flag takes the list in this form (e.g. "0,1,")
GPU_LIST = ",".join(str(x) for x in range(N_GPUS)) + ","
print(f"Using GPUs: {GPU_LIST}")

# Export the Python values as environment variables under the exact names
# the shell command below references with $VAR.
os.environ["BATCH_SIZE"] = str(BATCH_SIZE)
os.environ["N_GPUS"] = str(N_GPUS)
os.environ["ACCUMULATE_BATCHES"] = str(ACCUMULATE_BATCHES)
os.environ["GPU_LIST"] = GPU_LIST
os.environ["CKPT_PATH"] = ckpt_path  # ckpt_path is defined in an earlier cell

# Confirm the shell sees every variable (each should print a non-empty value)
!echo "$BATCH_SIZE"
!echo "$N_GPUS"
!echo "$ACCUMULATE_BATCHES"
!echo "$GPU_LIST"
!echo "$CKPT_PATH"

# Run training
!(python main.py \
    -t \
    --base configs/stable-diffusion/pokemon.yaml \
    --gpus "$GPU_LIST" \
    --scale_lr False \
    --num_nodes 1 \
    --check_val_every_n_epoch 10 \
    --finetune_from "$CKPT_PATH" \
    data.params.batch_size="$BATCH_SIZE" \
    lightning.trainer.accumulate_grad_batches="$ACCUMULATE_BATCHES" \
    data.params.validation.params.n_gpus="$N_GPUS" \
)
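
As an optional extra (my own addition, not part of the original notebook), a fail-fast guard in a cell above the training command catches a future typo immediately, instead of relying on eyeballing the echo output:

import os

# Assert that every variable the training command references is exported
# and non-empty, so a mismatch raises here rather than expanding to "" in bash.
for name in ("BATCH_SIZE", "N_GPUS", "ACCUMULATE_BATCHES", "GPU_LIST", "CKPT_PATH"):
    assert os.environ.get(name), f"environment variable {name} is missing or empty"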

Now everything works!
