
GPU Memory Increases in Differentiable A* for loop #28

Open
BrandonGel opened this issue Jan 12, 2025 · 4 comments
@BrandonGel

Hello,

I have been using neural-astar with my dataset (300 images, 200x400 pixels). My GPU memory keeps increasing with the number of epochs. I believe the culprit is in PlannerModule at src/neural_astar/utils/training.py: self.log is passed the loss tensor itself, so no GPU memory gets released and the gradient graph stays referenced.

@yonetaniryo
Collaborator

Thank you for the report!

You are right, it should be loss.item() instead of loss itself. Can you check if doing so resolves the issue?
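
For context, a minimal sketch of the suggested change, assuming PlannerModule is a PyTorch Lightning module (self.log is Lightning's logging API); the attribute names planner and loss_fn, the metric key, and the batch layout are illustrative, not the repository's exact code:

```python
import pytorch_lightning as pl

class PlannerModuleSketch(pl.LightningModule):
    def training_step(self, batch, batch_idx):
        map_designs, start_maps, goal_maps, opt_trajs = batch  # illustrative batch layout
        outputs = self.planner(map_designs, start_maps, goal_maps)
        loss = self.loss_fn(outputs.histories, opt_trajs)
        # Logging the tensor keeps its autograd graph referenced by the logger:
        #   self.log("metrics/train_loss", loss)
        # Logging a plain Python float lets the graph be freed after backward():
        self.log("metrics/train_loss", loss.item())
        return loss  # the returned tensor still drives the optimizer step
```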

BrandonGel changed the title from GPU Memory Leak in PlannerModule to GPU Memory Leak on Jan 13, 2025
@BrandonGel
Author

I tried that. It did help but did not resolve the issue. It seems that most of the GPU memory accumulation comes from the for loop in the forward() of DifferentiableAstar. I noticed that for solutions with long paths or many obstacles (very large t in the for loop), GPU memory starts to increase. I am not sure why. My temporary workaround is to add torch.cuda.empty_cache() after the for loop.

 File "/home/bho36/miniconda3/envs/graphmotionplanner/lib/python3.10/site-packages/neural_astar/planner/astar.py", line 63, in perform_astar
    astar_outputs = astar(
  File "/home/bho36/miniconda3/envs/graphmotionplanner/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/bho36/miniconda3/envs/graphmotionplanner/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/bho36/miniconda3/envs/graphmotionplanner/lib/python3.10/site-packages/neural_astar/planner/differentiable_astar.py", line 234, in forward
    g2 = expand((g + cost_maps) * selected_node_maps, neighbor_filter)
  File "/home/bho36/miniconda3/envs/graphmotionplanner/lib/python3.10/site-packages/neural_astar/planner/differentiable_astar.py", line 91, in expand
    y = F.conv2d(x, neighbor_filter, padding=1, groups=num_samples).squeeze()
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU
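
For reference, a minimal sketch of the workaround mentioned above (the wrapper name is hypothetical). Note that torch.cuda.empty_cache() only returns cached, currently-unused allocator blocks to the driver; it cannot free activations that the autograd graph still references:

```python
import torch

def plan_with_cache_cleanup(planner, map_designs, start_maps, goal_maps):
    """Hypothetical wrapper: run the planner, then release unused cached blocks."""
    outputs = planner(map_designs, start_maps, goal_maps)
    # Frees memory held by PyTorch's caching allocator but no longer referenced
    # by any tensor; it does not shrink the graph built during the search loop.
    torch.cuda.empty_cache()
    return outputs
```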

@BrandonGel
Author

So I think I understand my issue. The increase in GPU memory is due to the for loop, specifically this line:
histories = histories + selected_node_maps
Correct me if I am wrong: with every iteration of the for loop, the backprop graph of histories gets larger and larger, requiring more GPU memory. This is a limitation of Neural A*: exploring more of a larger map requires more GPU memory.
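
To illustrate, here is a toy example (not the library's code; shapes and step count are arbitrary) of how accumulating a differentiable tensor inside a loop grows the autograd graph, and with it the allocated GPU memory:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(1, 1, 64, 64, device=device, requires_grad=True)
histories = torch.zeros_like(x)

for t in range(1000):
    selected = torch.sigmoid(x * (t + 1))  # stand-in for selected_node_maps
    histories = histories + selected       # each step adds a node to the autograd graph
    if device == "cuda" and (t + 1) % 250 == 0:
        # All intermediate tensors are kept alive for the backward pass,
        # so allocated memory grows with the number of iterations.
        print(t + 1, torch.cuda.memory_allocated() // 2**20, "MiB")

histories.sum().backward()  # backpropagates through all 1000 additions at once
```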

BrandonGel changed the title from GPU Memory Leak to GPU Memory Increase in Differentiable A* for loop on Jan 13, 2025
BrandonGel changed the title from GPU Memory Increase in Differentiable A* for loop to GPU Memory Increases in Differentiable A* for loop on Jan 13, 2025
@yonetaniryo
Collaborator

Yes, that's indeed intended. Backpropagating through the search steps is a key feature of Neural A* and the reason for its high performance, and it is also a limitation when working on large maps.
