Can you add instructions for using distillation datasets? #61
Hello,

The docs provided are very helpful for preparing the PDB data, but they contain no information about how to prepare training examples from the AlphaFold database, which comprises 50% of the training set. Could you add instructions for preparing the AlphaFold cross-distillation dataset?

Sincerely,
Arda

Comments

Hi @ardagoreci,

During training, we begin by selecting a dataset (WeightedPDB or a specific Distillation dataset) and then draw a sample from it; this process is repeated for each sample. For the Distillation datasets, we first clustered the sequences intended for distillation and chose the cluster centers to create our Distillation Dataset, thereby ensuring diversity. Thus, when a Distillation dataset is selected during training, samples are drawn with equal weights, so there's no need to provide a […]

@cloverzizi, thank you for your response!
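For readers landing here, a minimal sketch of the two-level sampling cloverzizi describes above might look like the Python below. Everything in it is an assumption for illustration: the names `draw_training_example`, `weighted_pdb`, `distillation_sets`, and the 50/50 split are placeholders, not identifiers or values from this repository.

```python
import random

# Hypothetical 50/50 split between the weighted PDB set and the distillation
# sets; the real proportions would come from the training config.
DATASET_WEIGHTS = {"weighted_pdb": 0.5, "distillation": 0.5}

def draw_training_example(weighted_pdb, distillation_sets, rng=random):
    """Two-level sampling: pick a dataset first, then one example from it.

    weighted_pdb      -- tuple (samples, weights): PDB entries and their
                         per-entry sampling weights
    distillation_sets -- list of distillation datasets, each a list of
                         cluster-center examples
    """
    # Level 1: choose which dataset this training example comes from.
    source = rng.choices(
        list(DATASET_WEIGHTS), weights=list(DATASET_WEIGHTS.values())
    )[0]

    if source == "weighted_pdb":
        samples, weights = weighted_pdb
        # WeightedPDB entries carry explicit per-entry weights.
        return rng.choices(samples, weights=weights)[0]

    # Level 2 for distillation: each dataset holds only cluster centers,
    # so a uniform draw already gives diverse samples -- no per-sample
    # weights are needed.
    dataset = rng.choice(distillation_sets)
    return rng.choice(dataset)

# Toy usage: PDB IDs and AFDB cluster centers here are made up.
pdb_samples = (["1abc", "2xyz", "3def"], [0.5, 0.3, 0.2])
distill_sets = [["afdb_center_1", "afdb_center_2"], ["afdb_center_3"]]
print(draw_training_example(pdb_samples, distill_sets))
```

The point of the sketch is the design choice in the comment above: diversity is enforced once, offline, by clustering and keeping only cluster centers, which is why the online sampler can draw uniformly from a distillation dataset instead of needing per-sample weights the way WeightedPDB does.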