All unique final submissions from the top 400 teams can be downloaded by running the download_submissions.sh script. These are needed for running the preprocess.py script. The output of this script (which is needed for running the analysis and plot scripts) can be downloaded by running the download_preprocessed_data.sh script.
kaggle_competitions.txt contain data related to how the number of participants and prize pool of a competition is related.
public_kernels.csv contain data about how public kernel scores changed over time.
scores contain scores from teams at various values of k (see paper)
ensemble_scores contain scores from ensembles trained at various values of k.
n_contrib contain the number of models with a score greater than 0.01 in the ensembles trained at various values of k.