Add K-fold training strategy for XGBoost Ranker #12
During experiments, we noticed that the strategy suggested in the XGBoost ranker notebook (separating out a holdout evaluation set and using it as the training set for the ranker model) did not yield reasonable results. To address this and allow the ranker to be trained on the complete training set, this PR adds the following scripts:
- `run_train_kfold.py` trains recommender models using K-fold cross-validation and saves the model trained on each fold for later inference.
- `run_ranker_dataset.py` uses the saved models to compute item features, as well as several other features derived from user interactions and item features.
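
The core idea behind the K-fold approach can be sketched as follows. This is an illustrative outline, not the code from the PR: the function name `train_fold_models`, the use of scikit-learn's `KFold`, and the stand-in `LogisticRegression` model are all assumptions for demonstration; the real scripts train the project's recommender models instead.

```python
# Hypothetical sketch of the K-fold strategy: train one model per fold, keep
# each fold's model for later inference, and use each held-out fold's
# predictions as leakage-free ("out-of-fold") features for the ranker.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression  # stand-in for the real recommender

def train_fold_models(X, y, n_splits=5, seed=42):
    """Train one model per fold; return the models and out-of-fold scores."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    models = []
    oof_scores = np.zeros(len(X))
    for train_idx, val_idx in kf.split(X):
        model = LogisticRegression(max_iter=1000)
        model.fit(X[train_idx], y[train_idx])
        # Score the held-out fold: these predictions never saw their own rows
        # during training, so they can safely feed the ranker dataset.
        oof_scores[val_idx] = model.predict_proba(X[val_idx])[:, 1]
        models.append(model)  # the real script persists each fold's model to disk
    return models, oof_scores
```

Because every row receives a score from a model that never trained on it, the entire training set can be used to build the ranker's features, which is exactly what the holdout strategy prevented.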