Hi there! I tried to train OthelloGPT on Slurm:
srun --partition=gpunodes --mem-per-gpu=13G --time=7200 --gpus=8 python train_gpt_othello.py
However, I still got an out-of-memory (OOM) error on one of the GPUs:

I am still getting familiar with Slurm, so I wonder why this could happen.