When we use a larger model (e.g. VGG19) and a larger batch size (e.g. 256), the original version of `_update_fisher_params` easily depletes GPU memory (over 24 GB). Here, I propose a practical improved version.
To obtain the sum of squared gradients instead of the square of the summed gradient, we need the gradient of the log-likelihood with respect to each parameter for every data sample individually. The straightforward way is a sequential loop over the samples, since we cannot use the reduce_mean or reduce_sum reductions commonly adopted in loss design; see the sketch below.
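For reference, this is roughly what the sequential per-sample version looks like (a minimal sketch, not the repository's exact code; `model`, `inputs`, and `targets` are hypothetical placeholders, and I assume a classification model whose log-likelihood is the log-softmax score of the target class):

```python
import torch
import torch.nn.functional as F

def fisher_diag_loop(model, inputs, targets):
    # Accumulate the sum of squared per-sample gradients (diagonal Fisher),
    # one sample at a time: correct, but slow on large batches.
    params = tuple(model.parameters())
    sq_grad_sums = [torch.zeros_like(p) for p in params]
    for x, y in zip(inputs, targets):
        log_probs = F.log_softmax(model(x.unsqueeze(0)), dim=1)
        log_likelihood = log_probs[0, y]
        grads = torch.autograd.grad(log_likelihood, params)
        for acc, g in zip(sq_grad_sums, grads):
            acc += g.detach() ** 2
    # Divide by the number of samples if you want the empirical Fisher mean.
    return sq_grad_sums
```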
Here, I adopt `torch.autograd.functional.jacobian()` to compute the Jacobian matrix in parallel. Since this function is usually used to compute gradients with respect to the inputs rather than the network parameters, we need to construct a forward function that takes the parameters as a pseudo input. I learned about this trick from here.
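A minimal sketch of this idea follows; the exact wrapper may differ from what the repository uses. I assume `torch.func.functional_call` (PyTorch 2.x; also available as `torch.nn.utils.stateless.functional_call` in 1.12+) to run the model with the pseudo-input parameters, and `model`, `inputs`, `targets` are hypothetical placeholders as before:

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call

def fisher_diag_jacobian(model, inputs, targets):
    # Per-sample gradients for the whole batch at once, via the Jacobian of the
    # per-sample log-likelihood vector w.r.t. the parameters (used as pseudo inputs).
    names, params = zip(*[(n, p.detach()) for n, p in model.named_parameters()])

    def forward(*pseudo_params):
        # Stateless forward pass with the pseudo-input parameters.
        param_dict = dict(zip(names, pseudo_params))
        logits = functional_call(model, param_dict, (inputs,))
        log_probs = F.log_softmax(logits, dim=1)
        # Vector of per-sample log-likelihoods, length = batch size.
        return log_probs[torch.arange(len(targets)), targets]

    # jacobians[i] has shape [batch_size, *params[i].shape]:
    # one gradient per sample, without a Python loop over the batch.
    jacobians = torch.autograd.functional.jacobian(forward, params)
    # Sum of squared per-sample gradients (divide by batch size for the mean).
    return [(jac ** 2).sum(dim=0) for jac in jacobians]
```

If memory allows, passing `vectorize=True` to `jacobian()` may speed up the computation further, since by default it loops over the output elements internally.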