
My First Try At 2019 Data Science Bowl

I'm Syed Hamza, a 16-year-old high school student and an enthusiastic Python programmer interested in machine learning and deep learning techniques. This is my first attempt at a Kaggle competition. Unfortunately, I joined the competition when it was nearing the deadline.

I ran around 60 kernels to incrementally improve the score using XGBoost, LightGBM and CatBoost, but I missed trying important experiments with ensembles. My public score before the final day was 0.553, but it dropped to 0.527 on the private leaderboard, ranking around 1200 among 3500 teams.

Here, I'm presenting my experience from a few important kernels, in case it helps someone later.

This repository contains the code from my first attempt at a Kaggle competition:
https://www.kaggle.com/c/data-science-bowl-2019
Thanks to shippeng and artgor for their help.

The model with my highest public score but a comparatively lower private score is here, most likely due to overfitting of the XGBoost model (the max depth was set to 10) and unoptimised parameters. This proved that a higher score on the public leaderboard doesn't necessarily give a higher score on the private leaderboard.
Hence it is better to also submit a kernel with a lower public score.
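As a rough illustration of keeping the depth in check, here is a minimal sketch of a more conservative XGBoost setup with early stopping. The synthetic data and parameter values are stand-ins, not the exact settings from my kernels.

```python
import numpy as np
import xgboost as xgb

# Synthetic stand-in for the competition's processed features.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = 2 * X[:, 0] + rng.normal(size=1000)
X_train, y_train = X[:800], y[:800]
X_valid, y_valid = X[800:], y[800:]

params = {
    "objective": "reg:squarederror",
    "max_depth": 6,  # depth 10 overfit here; shallower trees generalise better
    "eta": 0.05,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
}

dtrain = xgb.DMatrix(X_train, label=y_train)
dvalid = xgb.DMatrix(X_valid, label=y_valid)

# Early stopping on a held-out split is one guard against the
# public/private gap described above.
model = xgb.train(
    params,
    dtrain,
    num_boost_round=2000,
    evals=[(dvalid, "valid")],
    early_stopping_rounds=100,
)
```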

Highest Scores

My highest private score, using LightGBM and XGBoost, is here.
Code with a similar private score, using LightGBM, XGBoost, a neural network and a convolutional neural network, is here.

Weights

Code for finding the best ensembling weights, using predictions from the LightGBM and XGBoost models, is here. Since the data Kaggle uses for scoring can be fairly varied, we cannot search for these weights on the training data itself, as that would guarantee an overfitted model.
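As a minimal sketch of this weight search, a simple grid over the blend weight on out-of-fold predictions looks like the following. The arrays are synthetic stand-ins, and RMSE stands in for the competition metric.

```python
import numpy as np

# Synthetic stand-ins for the out-of-fold predictions of the two models.
rng = np.random.default_rng(0)
y_true = rng.normal(size=500)
oof_lgb = y_true + rng.normal(scale=0.4, size=500)  # stand-in LightGBM preds
oof_xgb = y_true + rng.normal(scale=0.5, size=500)  # stand-in XGBoost preds

# Grid-search the blend weight on held-out predictions, not the training fit.
best_w, best_rmse = 0.0, np.inf
for w in np.linspace(0.0, 1.0, 101):
    blend = w * oof_lgb + (1.0 - w) * oof_xgb
    rmse = np.sqrt(np.mean((blend - y_true) ** 2))
    if rmse < best_rmse:
        best_w, best_rmse = w, rmse

print(f"best LightGBM weight: {best_w:.2f} (RMSE {best_rmse:.4f})")
```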

Things I Missed

Ensembling using:
1) Bagging (a minimal sketch follows below)
2) Boosting
Personal analysis of the data.
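For reference, here is a minimal bagging sketch using scikit-learn's BaggingRegressor around a LightGBM base model (assuming scikit-learn >= 1.2 for the estimator argument). The data is a synthetic stand-in; this is not code from the competition kernels.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from lightgbm import LGBMRegressor

# Synthetic stand-in for the competition features.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = 2 * X[:, 0] + rng.normal(size=1000)

# Each of the 10 base models is trained on a different bootstrap
# sample, which reduces variance compared with a single model.
bag = BaggingRegressor(
    estimator=LGBMRegressor(n_estimators=200),
    n_estimators=10,
    max_samples=0.8,
    random_state=0,
)
bag.fit(X, y)
preds = bag.predict(X)
```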

My Attempt

This was my first attempt at a Kaggle competition. Since I joined the competition late, I had no time to analyse the data thoroughly and hence had to rely on the above kernels for the processed data. All the given models have hyperparameters optimised using Bayesian optimisation (for LightGBM and CatBoost) or hyperopt (for XGBoost).
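As an illustration of the hyperopt side, here is a minimal tuning sketch for XGBoost. The search space, evaluation budget and synthetic data are illustrative assumptions, not the exact setup from the kernels.

```python
import numpy as np
import xgboost as xgb
from hyperopt import fmin, tpe, hp, Trials
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the competition features.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
y = 2 * X[:, 0] + rng.normal(size=500)

# Illustrative search space; not the one actually used in the kernels.
space = {
    "max_depth": hp.choice("max_depth", [4, 6, 8, 10]),
    "learning_rate": hp.loguniform("learning_rate", np.log(0.01), np.log(0.3)),
    "subsample": hp.uniform("subsample", 0.6, 1.0),
}

def objective(params):
    model = xgb.XGBRegressor(n_estimators=200, **params)
    score = cross_val_score(
        model, X, y, cv=3, scoring="neg_root_mean_squared_error"
    ).mean()
    return -score  # hyperopt minimises, so flip the sign

best = fmin(objective, space, algo=tpe.suggest, max_evals=20, trials=Trials())
print(best)
```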

Thanks for reading! Hope this repository is helpful.

About

Here is a comparatively simple code for the 2019 Data Science Bowl at https://www.kaggle.com/c/data-science-bowl-2019 using LightGBM and XGBoost.
