Memory Growth and Unexpected Behaviour in Firedrake Adjoint #4014

sghelichkhani · 2025-02-07T05:27:12Z

We are debugging a large-scale Stokes optimisation problem that eventually runs out of memory (g-adopt/g-adopt#160). Since we are dealing with millions of degrees of freedom, we rely on checkpointing to disk to manage memory. While testing smaller reproducer cases, we see unexpected memory growth that continues after the tape has been generated, through repeated forward and backward passes.
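Disk checkpointing only saves memory if the in-memory tape ends up holding a handle to the data on disk rather than the data itself. Below is a minimal sketch of that idea. It uses a hypothetical `DiskCheckpoint` class built on `pickle` and `tempfile`. This is an illustration of the concept only, not how pyadjoint or Firedrake actually store checkpoints.

```python
import os
import pickle
import tempfile


class DiskCheckpoint:
    """Hypothetical illustration: keep a value on disk, hold only its path in memory.

    This is NOT pyadjoint's or Firedrake's checkpointing implementation; it only
    shows that a disk checkpoint should leave no copy of the data in RAM.
    """

    def __init__(self, value, directory=None):
        # Serialise the value to a temporary file; only the path is stored on self.
        fd, self.path = tempfile.mkstemp(suffix=".pkl", dir=directory)
        with os.fdopen(fd, "wb") as f:
            pickle.dump(value, f)

    def restore(self):
        # Read the value back from disk only when it is actually needed.
        with open(self.path, "rb") as f:
            return pickle.load(f)

    def delete(self):
        # Remove the file once the checkpoint is no longer required.
        if os.path.exists(self.path):
            os.remove(self.path)


if __name__ == "__main__":
    data = list(range(100_000))
    ckpt = DiskCheckpoint(data)
    del data  # the checkpoint object now holds only a file path
    assert ckpt.restore()[-1] == 99_999
    ckpt.delete()
```

If something else keeps a reference to the original value, or caches the restored value, the memory benefit disappears. That kind of retention is what the profiles below are probing.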

What We Expected vs. What’s Happening
• Expected: Once the tape is populated, and after the first calls to ReducedFunctional.__call__ and ReducedFunctional.derivative, memory usage should stay constant.
• Actual: Memory increases steadily with every forward and derivative call.
• In our minimal reproducer, checkpointing to disk actually increases memory usage!
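The expectation above can be stated as a simple test. After the first (taping) pass, memory samples taken once per forward/derivative pair should have a slope close to zero. Below is a small hypothetical helper that applies this test to a list of readings, for example values read off memory_profiler output. The function names, the warm-up count and the threshold are illustrative assumptions.

```python
def growth_per_iteration(samples, warmup=1):
    """Least-squares slope (units per iteration) of samples after skipping warm-up ones."""
    ys = list(samples)[warmup:]
    n = len(ys)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def looks_like_leak(samples, warmup=1, threshold=0.5):
    """Flag steady growth: slope above threshold (e.g. MiB per iteration)."""
    return growth_per_iteration(samples, warmup) > threshold


if __name__ == "__main__":
    flat = [500.0, 800.0, 801.0, 800.0, 801.0]    # grows once while taping, then flat
    rising = [500.0, 800.0, 820.0, 840.0, 860.0]  # keeps growing every iteration
    print(looks_like_leak(flat), looks_like_leak(rising))  # False True
```

A fitted slope is used instead of "every sample larger than the last" so that small measurement noise does not hide, or fake, a trend.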

Minimal Reproducer

Using memory_profiler, I profile repeated calls to ReducedFunctional.__call__ and ReducedFunctional.derivative. Simply run mprof run ...
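As a complement to mprof, the Python-heap side of the growth can also be measured from inside the script. The sketch below uses the standard-library `tracemalloc` module and mirrors the reproducer's loop, with `gc.collect()` around each call. `profile_iterations` is a hypothetical helper and is not what was used to produce the profiles in this issue. One caveat: `tracemalloc` only sees allocations made through Python's allocators. Memory that C libraries such as PETSc obtain directly from `malloc` is typically not traced, so RSS-based tools remain necessary for the full picture.

```python
import gc
import tracemalloc


def profile_iterations(step, n_iter=5):
    """Call step() n_iter times; return traced Python-heap bytes after each call."""
    tracemalloc.start()
    usage = []
    try:
        for _ in range(n_iter):
            gc.collect()
            step()
            gc.collect()
            # current = bytes still allocated (and traced) right now
            current, _peak = tracemalloc.get_traced_memory()
            usage.append(current)
    finally:
        tracemalloc.stop()
    return usage
```

With the reproducer, `step` could be `lambda: (rf(T_c), rf.derivative())`. A steadily rising list then points at Python-level objects being retained between iterations.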

Code for Reproduction

```python
from firedrake import *
from firedrake.adjoint import *
import gc


def test():
    T_c, rf = rf_generator()

    # Repeat forward and derivative evaluations on the existing tape;
    # memory is expected to stay flat after the first iteration.
    for i in range(5):
        gc.collect()
        rf(T_c)  # forward replay (ReducedFunctional.__call__)
        gc.collect()
        rf.derivative()  # adjoint (backward) pass


def rf_generator():
    tape = get_working_tape()
    tape.clear_tape()
    continue_annotation()
    # Comment out to compare against running without disk checkpointing.
    enable_disk_checkpointing()

    mesh = RectangleMesh(100, 100, 1.0, 1.0)
    mesh = checkpointable_mesh(mesh)

    V = VectorFunctionSpace(mesh, "CG", 2)
    Q = FunctionSpace(mesh, "CG", 1)

    X = SpatialCoordinate(mesh)
    # Rigid rotation about the domain centre (0.5, 0.5).
    w = Function(V, name="rotation").interpolate(as_vector([-(X[1] - 0.5), X[0] - 0.5]))
    T_c = Function(Q, name="control")
    T = Function(Q, name="Temperature")

    # Gaussian blob centred at (0.75, 0.5) as the initial control.
    T_c.interpolate(0.1 * exp(-0.5 * ((X - as_vector((0.75, 0.5))) / Constant(0.1)) ** 2))
    control = Control(T_c)
    T.assign(T_c)

    # 20 annotated explicit advection-style updates that populate the tape.
    for i in range(20):
        T.interpolate(T + inner(grad(T), w) * Constant(0.0001))

    objective = assemble(T**2 * dx)

    pause_annotation()
    return T_c, ReducedFunctional(objective, control)


if __name__ == "__main__":
    test()
```

Without checkpointing to disk:

With checkpointing to disk:

Am I missing something here, or is there an actual leak, especially when checkpointing to disk?

connorjward · 2025-02-07T08:01:11Z

@Ig-dolci, you have done a lot of work investigating this sort of thing. Do you have any suggestions?

colinjcotter · 2025-02-07T08:08:42Z

Does it still happen if you replace interpolate with project (which uses a solver)?

Ig-dolci · 2025-02-07T10:09:39Z

I will check that.

sghelichkhani · 2025-02-07T11:52:06Z

Thanks @Ig-dolci for looking into this. @colinjcotter, I see the same behaviour with project. The reproducer is this one: https://github.com/g-adopt/g-adopt/blob/adjoint-memory/demos/mantle_convection/test/tester.py

Without checkpointing to disk:

With checkpointing to disk:

The g-adopt problem in which we see this is https://github.com/g-adopt/g-adopt/blob/adjoint-memory/demos/adjoint_spherical/adjoint.py. It essentially time-steps through a Stokes problem, so it consists almost entirely of solves plus a few projections.

sghelichkhani added the bug label Feb 7, 2025

Ig-dolci self-assigned this Feb 7, 2025

Ig-dolci linked a pull request Feb 8, 2025 that will close this issue

Fix memory leak for disk checkpointing #4020

Open

Ig-dolci linked a pull request Feb 10, 2025 that will close this issue

Fix memory leak for disk checkpointing #4020

Open

angus-g mentioned this issue Feb 12, 2025

Remove reference cycle in VecAccessMixin #4033

Open
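PR #4033 above concerns removing a reference cycle. As a generic illustration, consider an object that stores a bound method of itself. This creates a cycle that reference counting alone cannot free, so the object, and anything large it wraps, persists until the cyclic garbage collector runs. The classes below are hypothetical and are not Firedrake's `VecAccessMixin`. They show the cycle and one common way to avoid it, `weakref.WeakMethod`.

```python
import gc
import weakref


class Holder:
    """Hypothetical object that stores a bound method of itself, creating a cycle."""

    def __init__(self, payload_size):
        self.payload = bytearray(payload_size)
        # A bound method holds a strong reference to self: self -> callback -> self.
        self.callback = self.release

    def release(self):
        pass


class WeakHolder:
    """Same idea, but the back-reference is weak, so no cycle is formed."""

    def __init__(self, payload_size):
        self.payload = bytearray(payload_size)
        self.callback = weakref.WeakMethod(self.release)

    def release(self):
        pass


def freed_by_refcount(cls):
    """Return True if an instance is freed as soon as its last name goes away."""
    was_enabled = gc.isenabled()
    gc.disable()  # ensure only reference counting can free the object
    try:
        obj = cls(1_000_000)
        ref = weakref.ref(obj)
        del obj
        return ref() is None
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    print(freed_by_refcount(Holder), freed_by_refcount(WeakHolder))  # False True
    gc.collect()  # the cyclic collector eventually reclaims the Holder instance
```

For objects that wrap large external allocations, such as PETSc vectors, freeing through reference counting rather than waiting for the collector can make memory release more prompt.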
