
GPU Memory Increases in Differentiable A* for loop #28

Open
BrandonGel opened this issue Jan 12, 2025 · 4 comments
@BrandonGel

Hello,

I have been using neural-astar with my dataset (300 images, 200x400 pixels). My GPU memory keeps increasing with the number of epochs. I believe the culprit is in PlannerModule at src/neural_astar/utils/training.py: self.log is passed the loss tensor itself, so no GPU memory gets released and the gradient graph stays referenced.

@yonetaniryo
Collaborator

Thank you for the report!

You are right, it should be loss.item() instead of loss itself. Can you check if doing so resolves the issue?
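
For context, a minimal sketch of the suggested change, assuming PlannerModule is a PyTorch Lightning module (self.log is Lightning's logging API); the attribute names planner and loss_fn, the metric key, and the batch layout are illustrative, not the repository's exact code:

```python
import pytorch_lightning as pl

class PlannerModuleSketch(pl.LightningModule):
    def training_step(self, batch, batch_idx):
        map_designs, start_maps, goal_maps, opt_trajs = batch  # illustrative batch layout
        outputs = self.planner(map_designs, start_maps, goal_maps)
        loss = self.loss_fn(outputs.histories, opt_trajs)
        # Logging the tensor keeps its autograd graph referenced by the logger:
        #   self.log("metrics/train_loss", loss)
        # Logging a plain Python float lets the graph be freed after backward():
        self.log("metrics/train_loss", loss.item())
        return loss  # the returned tensor still drives the optimizer step
```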

BrandonGel changed the title from GPU Memory Leak in PlannerModule to GPU Memory Leak on Jan 13, 2025
@BrandonGel
Author

I tried that. It did help but did not resolve the issue. It seems that most of the GPU memory accumulation comes from the for loop in the forward() of DifferentiableAstar. I noticed that for solutions with long paths or many obstacles (very large t in the for loop), GPU memory starts to increase. I am not sure why. My temporary workaround is to add torch.cuda.empty_cache() after the for loop.

 File "/home/bho36/miniconda3/envs/graphmotionplanner/lib/python3.10/site-packages/neural_astar/planner/astar.py", line 63, in perform_astar
    astar_outputs = astar(
  File "/home/bho36/miniconda3/envs/graphmotionplanner/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/bho36/miniconda3/envs/graphmotionplanner/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/bho36/miniconda3/envs/graphmotionplanner/lib/python3.10/site-packages/neural_astar/planner/differentiable_astar.py", line 234, in forward
    g2 = expand((g + cost_maps) * selected_node_maps, neighbor_filter)
  File "/home/bho36/miniconda3/envs/graphmotionplanner/lib/python3.10/site-packages/neural_astar/planner/differentiable_astar.py", line 91, in expand
    y = F.conv2d(x, neighbor_filter, padding=1, groups=num_samples).squeeze()
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU
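
For reference, a minimal sketch of the workaround mentioned above (the wrapper name is hypothetical). Note that torch.cuda.empty_cache() only returns cached, currently-unused allocator blocks to the driver; it cannot free activations that the autograd graph still references:

```python
import torch

def plan_with_cache_cleanup(planner, map_designs, start_maps, goal_maps):
    """Hypothetical wrapper: run the planner, then release unused cached blocks."""
    outputs = planner(map_designs, start_maps, goal_maps)
    # Frees memory held by PyTorch's caching allocator but no longer referenced
    # by any tensor; it does not shrink the graph built during the search loop.
    torch.cuda.empty_cache()
    return outputs
```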

@BrandonGel
Author

So I think I understand my issue. The increase in GPU memory is due to the for loop, specifically this line:
histories = histories + selected_node_maps
Correct me if I am wrong: with every iteration of the for loop, the backprop graph of histories gets larger and larger, requiring more GPU memory. This is a limitation of Neural A*: exploring more of a larger map requires more GPU memory.
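
To illustrate, here is a toy example (not the library's code; shapes and step count are arbitrary) of how accumulating a differentiable tensor inside a loop grows the autograd graph, and with it the allocated GPU memory:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(1, 1, 64, 64, device=device, requires_grad=True)
histories = torch.zeros_like(x)

for t in range(1000):
    selected = torch.sigmoid(x * (t + 1))  # stand-in for selected_node_maps
    histories = histories + selected       # each step adds a node to the autograd graph
    if device == "cuda" and (t + 1) % 250 == 0:
        # All intermediate tensors are kept alive for the backward pass,
        # so allocated memory grows with the number of iterations.
        print(t + 1, torch.cuda.memory_allocated() // 2**20, "MiB")

histories.sum().backward()  # backpropagates through all 1000 additions at once
```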

BrandonGel changed the title from GPU Memory Leak to GPU Memory Increase in Differentiable A* for loop on Jan 13, 2025
BrandonGel changed the title from GPU Memory Increase in Differentiable A* for loop to GPU Memory Increases in Differentiable A* for loop on Jan 13, 2025
@yonetaniryo
Collaborator

Yes, that's indeed intended. Backpropagating through the search steps is a key feature of Neural A* and the reason for its high performance, and it is also a limitation when working on large maps.
