Reference Fine-Tuning Code #498
Comments
Is your feature request related to a problem? Please describe.
It would be invaluable if fine-tuning code or a basic example could be provided for DeepSeek V3/R1. Even a simplistic version would go a long way in helping others in the community build upon it. MoE models often require specific adjustments, and having a working starting point or references for fine-tuning could save a lot of time and effort. It's also worth noting that the community could benefit from any lessons learned regarding issues specific to the MoE structure.
Looking for fine-tuning code. +1
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you believe this issue is still relevant, please leave a comment to keep it open. Thank you for your contributions!
Looking for fine-tuning code. +1
Is your feature request related to a problem? Please describe.
I am interested in fine-tuning DeepSeek V3/R1.
Describe the solution you'd like
It would be great to provide the fine-tuning code; even a simplistic version would be an invaluable reference for others to build upon.
MoEs have historically been tricky to fine-tune correctly (and in the case of some older MoE models, it took the community months to figure out all the bugs in the HF implementation).
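One of the recurring MoE-specific pitfalls alluded to above is router load balancing: fine-tuning recipes typically need to keep the auxiliary load-balancing term in the loss so that a handful of experts don't collapse and absorb all the traffic. As a hypothetical illustration (this is not DeepSeek's actual implementation, and the function name and shapes are assumptions), here is a stdlib-only sketch of the Switch-Transformer-style auxiliary loss, N * Σ_i(f_i · P_i), where f_i is the fraction of tokens dispatched to expert i and P_i is the mean router probability for expert i:

```python
def load_balancing_loss(router_probs, top1_assignments, num_experts):
    """Switch-Transformer-style auxiliary loss: N * sum_i(f_i * P_i).

    router_probs:      per-token softmax over experts, shape [n_tokens][num_experts]
    top1_assignments:  index of the expert each token was dispatched to (top-1 routing)
    num_experts:       N, the number of experts in the layer

    Hypothetical helper for illustration only; a real fine-tuning loop would
    compute this per MoE layer and add it to the LM loss with a small coefficient.
    """
    n_tokens = len(router_probs)
    # f_i: fraction of tokens dispatched to expert i
    f = [0.0] * num_experts
    for e in top1_assignments:
        f[e] += 1.0 / n_tokens
    # P_i: mean router probability mass assigned to expert i
    p = [sum(tok[i] for tok in router_probs) / n_tokens for i in range(num_experts)]
    return num_experts * sum(fi * pi for fi, pi in zip(f, p))
```

Under perfectly uniform routing the loss reaches its minimum of 1.0; skewed routing (e.g. all tokens sent to one expert) pushes it above 1.0, which is what penalizes expert collapse during fine-tuning.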