Could you release more details about the data cleaning process? #7
Labels: research question
Comments
A technical report will be released later.
We are also paying close attention to how to preprocess code datasets, especially how to handle the dependencies among code files.
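On the point about dependencies among code files: one possible approach (an assumption for illustration, not necessarily what this project does) is to parse the imports of each file in a repository and order the files with a topological sort. When the files are then concatenated into one training sample, each file appears after the local files it depends on. The sketch below assumes Python sources keyed by module name and detects imports with a simple regex; `local_imports` and `order_by_dependency` are hypothetical helpers.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical, simplified import detector: matches "from foo import ..." and
# "import foo" at the start of a line. It only captures the first module in
# "import a, b" and ignores dynamic imports.
IMPORT_RE = re.compile(
    r"^\s*(?:from\s+([\w.]+)\s+import|import\s+([\w.]+))", re.MULTILINE
)


def local_imports(source: str, modules: set[str]) -> set[str]:
    """Return the repo-local module names imported by `source`."""
    found = set()
    for from_mod, plain_mod in IMPORT_RE.findall(source):
        top_level = (from_mod or plain_mod).split(".")[0]
        if top_level in modules:
            found.add(top_level)
    return found


def order_by_dependency(files: dict[str, str]) -> list[str]:
    """Order module names so every module comes after the local modules it imports.

    `files` maps a module name (e.g. "utils") to its source text.
    Raises graphlib.CycleError if the imports form a cycle.
    """
    modules = set(files)
    # Each node maps to its predecessors, i.e. the modules it depends on.
    graph = {
        name: local_imports(src, modules) - {name} for name, src in files.items()
    }
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    repo = {
        "main": "import models\nfrom utils import helper\n",
        "models": "from utils import helper\n",
        "utils": "def helper():\n    return 1\n",
    }
    order = order_by_dependency(repo)
    # Concatenate in dependency order to build one repository-level sample.
    sample = "\n".join(repo[name] for name in order)
    print(order)
```

Dependency cycles make `static_order()` raise `graphlib.CycleError`, so a real pipeline would need a rule for breaking them, and a regex-based detector will miss some import forms.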
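Will the technical report cover how the SFT data was constructed, and will the SFT data be open-sourced? Also, when will the technical report be out? Really looking forward to it 👍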
Are there any updates on this now?
Bump.
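The first step of data cleaning currently matches StarCoder. I would like to learn how the subsequent steps filter out low-quality code, code with syntax errors, and code with poor readability.
Thanks!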
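For the filtering question above, here is a minimal sketch assuming Python source files. It runs a syntax check with the standard-library `ast.parse` and adds simple line-length and alphanumeric-fraction heuristics as a rough proxy for unreadable files (for example, minified or auto-generated code). The helper names and the threshold values are hypothetical and illustrative; they are not confirmed by this project.

```python
import ast

# Illustrative thresholds (assumptions, not values confirmed by this project).
MAX_AVG_LINE_LEN = 100     # reject files whose average line is very long
MAX_LINE_LEN = 1000        # reject files containing an extremely long line
MIN_ALNUM_FRACTION = 0.25  # reject files that are mostly symbols/whitespace


def has_valid_syntax(source: str) -> bool:
    """True if `source` parses as Python; only meaningful for Python files."""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):  # ValueError: e.g. null bytes on older Pythons
        return False
    return True


def passes_heuristics(source: str) -> bool:
    """Cheap line-length and character-mix checks as a rough readability proxy."""
    if not source.strip():
        return False
    lines = source.splitlines()
    avg_len = sum(len(line) for line in lines) / len(lines)
    max_len = max(len(line) for line in lines)
    alnum_fraction = sum(ch.isalnum() for ch in source) / len(source)
    return (
        avg_len <= MAX_AVG_LINE_LEN
        and max_len <= MAX_LINE_LEN
        and alnum_fraction >= MIN_ALNUM_FRACTION
    )


def keep_file(source: str) -> bool:
    """Run the cheap heuristics first, then the more expensive parse."""
    return passes_heuristics(source) and has_valid_syntax(source)
```

Other languages would need their own parser for the syntax check (for example, a tree-sitter grammar), and these heuristics only approximate readability, so in practice they would be combined with other signals.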