Could you clarify the role of 1 / t.linalg.norm(s2) * map_norms in the model evidence formula? Specifically, how does the normalization by t.linalg.norm(s2) affect the contribution of map_norms to the overall model evidence?
When maximizing the marginal likelihood, should we include a negative sign in front of the model evidence term to properly adjust for the optimization algorithm’s minimization objective? If not, could you clarify how the model evidence is being handled in the optimization process?
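import torch as t

# model, LL, s2 and logdet are assumed to be defined earlier.
map_norms = 0.0
# Collect the trainable LoRA parameters.
lora_params = {
    k: v
    for k, v in dict(model.named_parameters()).items()
    if "lora" in k.lower() and v.requires_grad
}
# Sum the L2 norms of the LoRA parameters.
for param in lora_params.values():
    map_norms += t.linalg.norm(param)
model_evidence = LL + 1 / t.linalg.norm(s2) * map_norms + 0.5 * logdet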
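To make both questions concrete, here is a minimal sketch of what I mean. It is not the repository's code: the values, `prior_term`, and `objective` are hypothetical stand-ins, and the second half uses a toy objective with a known maximum rather than the model evidence itself. The first half shows that `1 / t.linalg.norm(s2)` acts as a single scalar weight on `map_norms`; the second half shows the usual convention of negating a quantity you want to maximize when the optimizer minimizes.

```python
import torch as t

# Hypothetical stand-in value, chosen only for illustration.
map_norms = t.tensor(4.0)

def prior_term(s2):
    # Same expression as in the snippet above: the norm of s2 is one scalar,
    # so it rescales map_norms uniformly.
    return 1 / t.linalg.norm(s2) * map_norms

small_s2 = prior_term(t.tensor([0.5])).item()  # weight 1 / 0.5 = 2.0
large_s2 = prior_term(t.tensor([2.0])).item()  # weight 1 / 2.0 = 0.5

# Toy objective with a known maximum at s = 3 (not the model evidence).
def objective(s):
    return -((s - 3.0) ** 2).sum()

s = t.tensor([1.0], requires_grad=True)
opt = t.optim.SGD([s], lr=0.1)
for _ in range(200):
    opt.zero_grad()
    loss = -objective(s)  # negate: the optimizer minimizes, we want a maximum
    loss.backward()
    opt.step()
s_final = s.item()  # ends up close to 3.0
```

If the code instead passes `model_evidence` to the optimizer without negating it, the optimizer would push the evidence down rather than up, which is why I am asking how the sign is handled.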