We are debugging a large-scale Stokes optimisation problem that eventually runs out of memory (g-adopt/g-adopt#160). Since we are dealing with millions of degrees of freedom, we rely on checkpointing to disk to manage memory. While testing smaller reproducer cases, we see unexpected memory growth even after the tape is generated and throughout forward and backward passes.
What We Expected vs. What’s Happening
• Expected: Once the tape is populated, and after the first calls to ReducedFunctional.__call__ and ReducedFunctional.derivative, memory usage should stay constant.
• Actual: Memory increases steadily with every forward and derivative call.
• In our minimal reproducer, checkpointing to disk actually increases memory usage!!!
Minimal Reproducer
Using memory_profiler, I profile repeated calls to ReducedFunctional.__call__ and ReducedFunctional.derivative. Simply run mprof run ...
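To cross-check the mprof curves from inside Python, a loop like the following could sample memory after each forward/derivative pair. This is only a sketch and not the reproducer itself: `sample_memory`, `leaky_forward` and `clean_derivative` are hypothetical names, and the two stand-in callables exist only so the snippet runs without Firedrake. It uses the standard-library `tracemalloc`, which only sees allocations made through Python's allocator, so growth inside PETSc or other C libraries would not show up here and mprof remains the reference for total process memory.

```python
import tracemalloc
from typing import Callable, List


def sample_memory(forward: Callable[[], object],
                  derivative: Callable[[], object],
                  n_iterations: int = 5) -> List[int]:
    """Return traced Python memory (bytes) after each forward/derivative pair."""
    tracemalloc.start()
    samples = []
    try:
        for _ in range(n_iterations):
            forward()
            derivative()
            # get_traced_memory() returns (current, peak); keep the current value.
            current, _peak = tracemalloc.get_traced_memory()
            samples.append(current)
    finally:
        tracemalloc.stop()
    return samples


# Hypothetical stand-ins for the real calls, used only to make the sketch runnable.
_retained = []


def leaky_forward() -> float:
    # Keeps a reference to 100 kB on every call, mimicking a leak.
    _retained.append(bytearray(100_000))
    return 0.0


def clean_derivative() -> int:
    # Allocates 100 kB that is released when the function returns.
    tmp = bytearray(100_000)
    return len(tmp)


if __name__ == "__main__":
    for i, nbytes in enumerate(sample_memory(leaky_forward, clean_derivative)):
        print(f"iteration {i}: {nbytes / 1e6:.2f} MB traced")
```

In the actual reproducer, the stand-ins would be replaced by something like `lambda: reduced_functional(control_value)` and `lambda: reduced_functional.derivative()`, where `reduced_functional` and `control_value` are placeholder names for the objects built in the script. A sample list that keeps climbing after the first couple of iterations would point at Python-side objects being retained; a flat list alongside a rising mprof curve would suggest the growth is outside the Python heap.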
Code for Reproduction
Without checkpointing to disk:
[Image]
With checkpointing to disk:
Am I missing something here, or is there an actual leak, especially when checkpointing to disk?