This setup provides an example of a diagnostic computation of ice flow (see diagnostic.py for more details).
Warning: this setup is very much a work in progress. It is meant as a basis for improving the performance of the emulator, i.e. its fidelity relative to its computational cost.
When the "diagnostic" mode of the iceflow module is activated (as opposed to the default modes "emulated" and "solved"), IGM operates as in "emulated" mode, except that every 10 years the solver is also run to measure the misfit between the emulated and solved solutions and to compute a fidelity score.
As a result, the code generates a file named errors_diagno.txt, which logs the fidelity assessment every 10 years.
time,l1,l2,COST_Glen,COST_Emulator,nb_it_solve,nb_it_emul,training_strenght,vol
2000.000, 2.135, 2.994, -0.120, -0.116, 36.000, 1000.000, 0.683, 1.503
2010.000, 1.712, 2.614, -0.083, -0.079, 32.000, 10.000, 0.440, 1.495
2020.000, 1.953, 3.836, -0.090, -0.084, 32.000, 10.000, 0.572, 1.527
2030.000, 2.203, 5.612, -0.104, -0.094, 42.000, 10.000, 0.728, 1.592
Where:
- l1 and l2 represent the L1 and L2 misfit between the emulated and solved solutions,
- COST_Glen and COST_Emulator indicate the Glen functional cost achieved by the solver and emulator, respectively,
- nb_it_solve and nb_it_emul record the number of iterations invested in the solver and emulator, respectively.
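The log can be inspected with a few lines of Python. The sketch below is only an illustration: it assumes the file is comma-separated with the header row shown above, and `read_diagnostic_log` is a hypothetical helper, not part of IGM. It parses the first two rows of the excerpt and prints the gap between the emulator and solver costs.

```python
import csv
import io


def read_diagnostic_log(text):
    """Parse a comma-separated diagnostic log into a list of dicts of floats.

    Hypothetical helper: assumes one header row followed by numeric rows,
    as in the excerpt of errors_diagno.txt shown above.
    """
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return [{key.strip(): float(value) for key, value in row.items()}
            for row in reader]


# First two rows of the excerpt above; for a real run one would instead pass
# the content of errors_diagno.txt, e.g. open("errors_diagno.txt").read().
sample = """time,l1,l2,COST_Glen,COST_Emulator,nb_it_solve,nb_it_emul,training_strenght,vol
2000.000, 2.135, 2.994, -0.120, -0.116, 36.000, 1000.000, 0.683, 1.503
2010.000, 1.712, 2.614, -0.083, -0.079, 32.000, 10.000, 0.440, 1.495
"""

rows = read_diagnostic_log(sample)
for row in rows:
    # Difference between the cost reached by the emulator and by the solver.
    cost_gap = row["COST_Emulator"] - row["COST_Glen"]
    print(f"t={row['time']:.0f}  l1={row['l1']:.3f}  cost gap={cost_gap:.3f}")
```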
The idea is to implement a training_strength mechanism (currently commented out in diagnostic.py). This feature would adapt the intensity of retraining based on the misfit error: high error would trigger stronger retraining, while low error would call for reduced retraining effort.
#training_strenght = 15 * (l1 / 10)**2
#training_strenght = np.clip(training_strenght, 0.1, 10.0)
#cfg.processes.iceflow.emulator.retrain_freq = int(1/min(training_strenght,1))
#cfg.processes.iceflow.emulator.nbit = int(max(training_strenght,1))
The retraining frequency retrain_freq and the number of retraining iterations per time step nbit would then be computed from training_strenght, as shown above.
Important parameters to play with:
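The commented-out lines above can be turned into a small standalone function to see how the mechanism would behave. This is a sketch, not the code in diagnostic.py: `retraining_schedule` is a hypothetical name, `np.clip` is replaced by plain `min`/`max` to avoid a NumPy dependency, and the values are returned rather than written to `cfg`.

```python
def retraining_schedule(l1):
    """Map an L1 misfit to (training_strenght, retrain_freq, nbit).

    Mirrors the commented-out formula: the strength grows quadratically
    with the misfit and is clipped to the range [0.1, 10.0].
    """
    training_strenght = 15 * (l1 / 10) ** 2
    # Same effect as np.clip(training_strenght, 0.1, 10.0) for a scalar.
    training_strenght = min(max(training_strenght, 0.1), 10.0)
    # Strength below 1: retrain less often (every retrain_freq steps).
    retrain_freq = int(1 / min(training_strenght, 1))
    # Strength above 1: retrain with more iterations per step.
    nbit = int(max(training_strenght, 1))
    return training_strenght, retrain_freq, nbit


for l1 in (0.5, 2.135, 10.0):
    strength, retrain_freq, nbit = retraining_schedule(l1)
    print(f"l1={l1}: strength={strength:.3f}, "
          f"retrain_freq={retrain_freq}, nbit={nbit}")
```

With this formula, a low misfit (l1 = 0.5) gives the minimum strength 0.1, i.e. one retraining iteration every 10 steps, while a high misfit (l1 = 10) saturates at strength 10, i.e. 10 iterations at every step.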
- network architecture parameters
- training parameters, e.g. here the amount of retraining is very high (10 iterations each step)
retrain_freq: 1
nbit: 10
lr: 0.001
- (Experimental; see code and parameter help for details) Perturbations can be used during training so that the emulator is retrained not only at the current state but also at states slightly perturbed around it. This helps explore the neighborhood of the current state and improves generalization. Additionally, split_patch_method, when used in combination with framesizemax, splits the computational domain into smaller patches (resulting in a batch_size > 1). This strategy appears to improve and stabilize the convergence of the retraining process.
pertubate: false
framesizemax: 300
split_patch_method: parrallel
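To give an idea of what patch splitting means, the sketch below divides a 2D grid into near-equal patches whose sides do not exceed framesizemax. It is purely illustrative: `patch_bounds` and `split_domain` are hypothetical helpers, and the way IGM actually builds its patches (overlap, padding, ordering) may differ.

```python
import math


def patch_bounds(n, framesizemax):
    """Split the index range [0, n) into near-equal chunks of size <= framesizemax."""
    n_patches = math.ceil(n / framesizemax)
    base, extra = divmod(n, n_patches)
    bounds = []
    start = 0
    for i in range(n_patches):
        # The first `extra` chunks get one additional cell.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def split_domain(ny, nx, framesizemax):
    """Return ((y0, y1), (x0, x1)) index bounds for every patch of an ny-by-nx grid."""
    return [(ys, xs)
            for ys in patch_bounds(ny, framesizemax)
            for xs in patch_bounds(nx, framesizemax)]


# A 250 x 700 grid with framesizemax = 300 yields 1 x 3 patches,
# i.e. a batch of 3 samples instead of a single full-domain sample.
patches = split_domain(250, 700, 300)
print(len(patches), patches)
```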
The final goal is to run a proper experiment over a longer time period with an oscillating ELA and to improve the performance of the emulator. Some possible strategies:
- Patching or perturbation in the training, as shown above
- Use a second-order optimizer
- Test other ML architectures (e.g. lighter ones: cheaper, but requiring more retraining)
- ...