training hyper-parameters for ablation studies #489

ethanhe42 · 2025-01-30T17:56:50Z

Thanks for the great work. Could you share the training hyper-parameters for the 16B and 236B ablation studies? Specifically, the learning rate schedule, batch size schedule, maximum sequence length, bias update speed, etc.

github-actions · 2025-03-02T00:18:05Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you believe this issue is still relevant, please leave a comment to keep it open. Thank you for your contributions!

github-actions bot added the stale label Mar 2, 2025

github-actions bot added the closed-as-stale label Mar 20, 2025

github-actions bot closed this as not planned Mar 20, 2025
