Understanding gap between cross-validated F1 and holdout F1 #26

emilycantrell · 2024-06-12T18:34:34Z

Holdout F1 tends to be substantially lower than cross-validated F1, and we want to understand why so that we can make a better model for the holdout set.

Things we might try:

Split off an internal test set from the PreFer training data, and examine how F1 changes between the CV score and the test-set score. Alternatively, look at the consistency of F1 across CV folds.
Study the consistency of feature selection across folds. Should we restrict the number of features in order to get more consistency in F1 score from CV to holdout?
Should we restrict the model's flexibility in order to get more consistency in F1 score from CV to holdout?
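
A rough sketch of how we might quantify the ideas above, using only the Python standard library. It is not tied to our actual pipeline: the helper names (`f1_score`, `cv_gap_summary`, `selection_stability`) are hypothetical, the choice of mean pairwise Jaccard similarity as the stability measure is an assumption, and the values in the usage section are made-up placeholders, not real results.

```python
from itertools import combinations
from statistics import mean, stdev


def f1_score(y_true, y_pred):
    """Binary F1 for 0/1 labels; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def cv_gap_summary(fold_f1, test_f1):
    """Summarize per-fold CV F1 and its gap to an internal test-set F1.

    fold_f1: list of F1 scores, one per CV fold (needs at least two folds).
    test_f1: F1 on the internal test set split off from the training data.
    """
    cv_mean = mean(fold_f1)
    return {
        "cv_mean": cv_mean,
        "cv_sd": stdev(fold_f1),      # spread across folds
        "gap": cv_mean - test_f1,     # positive means CV is optimistic
    }


def selection_stability(selected_per_fold):
    """Mean pairwise Jaccard similarity of the feature sets chosen in each fold.

    1.0 means every fold selected the same features; 0.0 means no overlap.
    """
    sims = []
    for a, b in combinations(selected_per_fold, 2):
        union = a | b
        # Two empty selections are treated as identical.
        sims.append(len(a & b) / len(union) if union else 1.0)
    return mean(sims)


if __name__ == "__main__":
    # Placeholder inputs for illustration only (not real PreFer results).
    fold_scores = [
        f1_score([1, 1, 0, 0], [1, 1, 0, 1]),
        f1_score([1, 0, 0, 1], [1, 0, 1, 0]),
        f1_score([0, 1, 1, 0], [0, 1, 1, 0]),
    ]
    test_score = f1_score([1, 1, 0, 0, 1], [1, 0, 1, 0, 0])
    print(cv_gap_summary(fold_scores, test_score))

    # Hypothetical feature names selected in each fold.
    folds = [{"f1", "f2", "f3"}, {"f1", "f2", "f4"}, {"f1", "f5", "f6"}]
    print(selection_stability(folds))
```

If the gap shrinks and the stability score rises when we cap the number of features or use a less flexible model, that would be some evidence that overfitting during selection or tuning contributes to the CV-to-holdout drop.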

Once we've done some internal experimentation, we will take up Lisa and Gert's offer to let us test a simple model on the holdout set.

We are going to focus on the monkeys paper for now. We will return to these ideas later and decide whether and how much to pursue them, depending on how much time we have available.
