fine-tuning paradigms #4
shahpnmlab
started this conversation in Ideas
- Progressive LRs (discriminative learning rates) alone don't produce good training results. To improve outcomes, train the network on as much data as possible first and then perform fine-tuning; this type of transfer learning requires a lot of data to bridge domain gaps. Try lowering the LR layer-by-layer by a factor of 2.6, i.e. lr(n-1) = lr(n)/2.6 (https://paperswithcode.com/method/discriminative-fine-tuning). Also: is it worth re-initializing the last layer with random weights before fine-tuning?
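The lr(n-1) = lr(n)/2.6 rule above can be sketched in plain Python. This is a minimal illustration, not from the linked page: the function name `discriminative_lrs` and the four-group example are my own, and the head group is assumed to get the base LR while each earlier group is divided by 2.6 once more.

```python
def discriminative_lrs(base_lr, n_groups, factor=2.6):
    """Per-group learning rates, earliest layer group first, head last.

    The head (last group) gets base_lr; each group before it is
    smaller by one more factor of 2.6, following lr(n-1) = lr(n)/2.6.
    """
    return [base_lr / factor ** (n_groups - 1 - i) for i in range(n_groups)]

# Example: 4 layer groups with a base LR of 1e-3 for the head.
lrs = discriminative_lrs(base_lr=1e-3, n_groups=4)
```

In a framework like PyTorch, these values would typically be passed as per-parameter-group `lr` entries to the optimizer, with the earliest (most general) layers receiving the smallest rates.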
NOTE: transfer learning via knowledge distillation helps with bridging domain gaps. The way to train such a model effectively is to have a big teacher model (more parameters) and a smaller student model that is then used for the domain-specific task (https://intellabs.github.io/distiller/knowledge_distillation.html). The TODO here is for me to train a much larger model on as much data as I can get my hands on, then fine-tune a smaller student model against it and measure the performance.
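The core of the teacher/student setup is the soft-label loss: the student is trained to match the teacher's temperature-softened output distribution. A minimal sketch of that loss in plain Python (the function names and the temperature T=4.0 default are illustrative assumptions, not values from the Distiller docs):

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T gives a softer distribution."""
    exps = [math.exp(x / T) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL divergence from the student to the teacher distribution,
    both softened with temperature T, scaled by T^2 so gradients stay
    comparable across temperatures (as in standard distillation setups)."""
    p = softmax(teacher_logits, T)  # soft teacher targets
    q = softmax(student_logits, T)  # soft student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return T * T * kl
```

In practice this soft loss is usually combined with the ordinary cross-entropy on the hard labels via a weighting coefficient, so the student learns from both the teacher and the ground truth.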