Jaumo ML task

I built the solution in a Jupyter notebook, which suits this kind of exploratory task.

The solution is split into four parts: data analysis, preparing data, training the model, and evaluation.

Data analysis

I usually start with this phase to learn about the data. It helps me do better feature engineering, clean the data, and spot possible problems in it.

The analysis showed that all feature columns were generated from a uniform distribution and that the feature columns were independent of each other.
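A minimal sketch of how such a check could look, assuming NumPy and a synthetic stand-in for the feature matrix (the real data and the notebook's actual code are not shown here). It compares per-bin fractions against the flat profile expected from a uniform column and looks at pairwise Pearson correlations. Correlations near zero are consistent with independence but do not prove it.

```python
import numpy as np

# Synthetic stand-in for the task's feature matrix (hypothetical data).
rng = np.random.default_rng(0)
features = rng.uniform(0.0, 1.0, size=(10_000, 4))


def bin_fractions(column, bins=10):
    """Fraction of values falling into each of `bins` equal-width bins."""
    counts, _ = np.histogram(column, bins=bins, range=(column.min(), column.max()))
    return counts / counts.sum()


# For a uniform column, every bin should hold roughly 1 / bins of the values.
for j in range(features.shape[1]):
    print(j, np.round(bin_fractions(features[:, j]), 3))

# Pairwise Pearson correlation between columns; off-diagonal values close
# to 0 are consistent with (but do not prove) independence.
corr = np.corrcoef(features, rowvar=False)
max_off_diagonal = np.abs(corr - np.eye(corr.shape[0])).max()
print("largest absolute off-diagonal correlation:", round(float(max_off_diagonal), 3))
```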

Preparing data

This step converts the input matrix into train and dev datasets, each consisting of ids, features, and labels.
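The conversion could be sketched roughly as follows. The column layout (id in the first column, label in the last), the 80/20 split, and the helper name `split_matrix` are assumptions made for illustration, not details taken from the notebook.

```python
import numpy as np


def split_matrix(matrix, dev_fraction=0.2, seed=0):
    """Shuffle rows and split them into train/dev dicts of ids, features, labels.

    Assumed layout (hypothetical): column 0 = id, last column = label,
    everything in between = features.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(matrix.shape[0])
    n_dev = int(round(dev_fraction * matrix.shape[0]))

    def pack(rows):
        part = matrix[rows]
        return {
            "ids": part[:, 0].astype(int),
            "features": part[:, 1:-1],
            "labels": part[:, -1].astype(int),
        }

    # The first n_dev shuffled rows form the dev set, the rest the train set.
    return pack(order[n_dev:]), pack(order[:n_dev])


# Toy input: 10 rows of [id, f1, f2, f3, label].
toy = np.column_stack([
    np.arange(10),
    np.random.default_rng(1).uniform(size=(10, 3)),
    np.arange(10) % 2,
])
train, dev = split_matrix(toy)
print(train["features"].shape, dev["features"].shape)
```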

Training model

I simply picked an off-the-shelf classifier, a random forest. The results were very good, so there was no need to change the model.
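For illustration, training an off-the-shelf random forest might look like the sketch below. It assumes scikit-learn's `RandomForestClassifier`; this README does not name the library, and the data and labelling rule here are made up.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Made-up data: uniform features and an imbalanced, rule-based label.
rng = np.random.default_rng(0)
X_train = rng.uniform(size=(500, 4))
y_train = (X_train[:, 0] > 0.8).astype(int)
X_dev = rng.uniform(size=(100, 4))

# Default-style settings; random_state is fixed only for reproducibility.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

dev_predictions = model.predict(X_dev)
print(dev_predictions[:10])
```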

Evaluation

I used the F1 measure because it is generally more informative than accuracy on imbalanced datasets. I also plotted the confusion matrix, as requested.
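A small worked example of why accuracy can mislead on imbalanced data: a classifier that always predicts the majority class scores high accuracy but an F1 of zero. The helpers below are hand-rolled for illustration and are not necessarily what the notebook uses. The labels are invented.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f1(y_true, y_pred):
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no
    # true positives, false positives, or false negatives at all.
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Invented imbalanced labels: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
always_negative = [0] * 100

print(confusion_counts(y_true, always_negative))  # (0, 0, 5, 95)
print(accuracy(y_true, always_negative))          # 0.95
print(f1(y_true, always_negative))                # 0.0
```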

