Holdout F1 tends to be substantially lower than cross-validated F1, and we want to understand why so that we can make a better model for the holdout set.
Things we might try:
Split off an internal test set from the PreFer training data, and examine how F1 changes between CV score and test set score. Alternatively, look at consistency in F1 across CV folds.
Study the consistency of feature selection across folds. Should we restrict the number of features in order to get a more consistent F1 score from CV to holdout?
Should we restrict the model's flexibility in order to get a more consistent F1 score from CV to holdout?
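A minimal sketch of how the first two experiments could be set up, assuming scikit-learn. The synthetic data from `make_classification`, the `make_model` helper, the choice of `SelectKBest` plus logistic regression, and `k=10` are all illustrative assumptions rather than our actual pipeline. It compares per-fold CV F1 against F1 on an internal test set, and measures how much the selected feature sets overlap across folds.

```python
from itertools import combinations

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the PreFer training data (assumption, not the real data).
X, y = make_classification(n_samples=1000, n_features=50, n_informative=5,
                           weights=[0.8, 0.2], random_state=0)

# Hold back an internal test set that the CV loop never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)


def make_model(k=10):
    """Hypothetical model: keep the k best features, then fit a linear classifier."""
    return make_pipeline(SelectKBest(f_classif, k=k),
                         LogisticRegression(max_iter=1000))


cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_f1, fold_features = [], []
for train_idx, val_idx in cv.split(X_train, y_train):
    # Feature selection is refit inside each fold, so it cannot see the fold's validation rows.
    model = make_model().fit(X_train[train_idx], y_train[train_idx])
    fold_f1.append(f1_score(y_train[val_idx], model.predict(X_train[val_idx])))
    selected = model.named_steps["selectkbest"].get_support(indices=True)
    fold_features.append(set(selected))

# Refit on all internal training data and score once on the internal test set.
final_model = make_model().fit(X_train, y_train)
test_f1 = f1_score(y_test, final_model.predict(X_test))

# Mean pairwise Jaccard overlap of the selected feature sets (1.0 = identical in every fold).
jaccard = np.mean([len(a & b) / len(a | b)
                   for a, b in combinations(fold_features, 2)])

print(f"CV F1: mean={np.mean(fold_f1):.3f}, std={np.std(fold_f1):.3f}")
print(f"Internal test F1: {test_f1:.3f}")
print(f"Feature-selection overlap (mean Jaccard): {jaccard:.3f}")
```

Rerunning this with smaller `k` or a more strongly regularized classifier would be one way to probe the last two questions: if the gap between CV F1 and internal test F1 shrinks as the model gets simpler, that points toward overfitting rather than a difference between the training and holdout populations.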
Once we've done some internal experimentation, we will take up Lisa and Gert's offer to let us test a simple model on the holdout set.
We are going to focus on the monkeys paper for now. We will return to these ideas later and decide whether, and how far, to pursue them depending on how much time we have available.