Hey, I'm running Wilds on a `p2.8xlarge` AWS EC2 instance with 8 K80 GPUs. When I run `run_expt.py` and use the `--device` argument to split my jobs across the GPUs, they all end up running on GPU 0. I verified this from the memory usage in `nvidia-smi` and by printing the device PyTorch is using with `torch.cuda.current_device()`. My guess is that the `CUDA_VISIBLE_DEVICES` environment variable, set here, is set too late and PyTorch just defaults to device 0.
I've worked around this by setting `CUDA_VISIBLE_DEVICES` manually before running the script. I just thought I'd let you know I encountered this issue.
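For anyone hitting the same thing, here is a minimal sketch of the in-process version of that workaround. It assumes the variable only takes effect if it is set before CUDA is initialized, so it sets it before `torch` is even imported. The helper name `pin_to_gpu` is made up for illustration and is not part of Wilds or PyTorch.

```python
import os


def pin_to_gpu(device_index: int) -> str:
    """Restrict this process to a single physical GPU.

    Hypothetical helper, not part of Wilds. It should be called before
    the first CUDA call (safest: before importing torch), since the
    variable may be ignored once CUDA has been initialized.
    """
    value = str(device_index)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


if __name__ == "__main__":
    pin_to_gpu(3)  # physical GPU 3, as listed by nvidia-smi

    # Import torch only after the variable is set.
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        # With one visible GPU, the process sees it as index 0,
        # so current_device() reports 0 even for physical GPU 3.
        print(torch.cuda.device_count(), torch.cuda.current_device())
```

The shell equivalent is to prefix the launch command with `CUDA_VISIBLE_DEVICES=3`. One caveat with either approach: once a process is pinned this way, `torch.cuda.current_device()` reports 0 regardless of which physical GPU was chosen, so `nvidia-smi` is the more reliable check.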
Really appreciate the project by the way! Being able to access multiple datasets for domain generalization with the same interface is really useful, and I managed to use run_expt pretty easily to run my own experiments.