
Commit 777f2db

Add EDA project
Add Olympics EDA project (data and starter notebook).
1 parent ba43ab6 commit 777f2db


6 files changed: 286675 additions, 2 deletions


.gitignore

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+
+*.DS_Store

README.md

Lines changed: 19 additions & 2 deletions
@@ -48,7 +48,17 @@ Please refer to [this folder](https://github.com/Data-Centric-AI-Community/aweso
 ## 📊 Python for Data Science
 
 ### ❓ Where to Start!
-🚧 WIP
+To learn data science, following the CRISP-DM process is a good approach:
+
+[CRISP-DM methodology](https://www.datasciencecentral.com/userful-r-packages-that-aligns-with-the-crisp-dm-methodology/)
+
+1. Business/Problem Understanding
+2. 🆕 **Data Understanding:** Start with the EDA Projects in the **Exercises** section below! 🎉
+3. Data Preparation
+4. Modelling
+5. Evaluation
+6. Deployment
+
 
 ### 📚 Awesome Books
 🚧 WIP
@@ -60,7 +70,14 @@ Please refer to [this folder](https://github.com/Data-Centric-AI-Community/aweso
 
 
 ### 🏋🏽‍♀️ Exercises
-🚧 WIP
+
+#### 🕵🏻 Exploratory Data Analysis
+1. [Olympics 124 Years Dataset](eda-projects/olympics): Exploring a dataset of the Olympic Games
+
+##### 🫂 How to contribute?
+- Download the project and try to solve it at your own pace!
+- Ask as many questions as you like in our [Discord channel #🐍ds-projects](http://discord.com/invite/mw7xjJ7b7s)
+- Share your final project by creating a Pull Request! 👏
 
 
 ### 🔗 Tutorials and Resources
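Step 2 of the new README (Data Understanding) sends learners to the EDA projects. A common first move in that phase is a structural check of the raw table: its shape, column types, missing values and number of distinct values per column. The sketch below does this with plain pandas. The helper name `first_look` and the toy columns are hypothetical illustrations, not part of the project, and the real CSV's columns may differ.

```python
import pandas as pd


def first_look(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing count/percentage, distinct values."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "missing": df.isna().sum(),
            "missing_pct": (df.isna().mean() * 100).round(1),
            "unique": df.nunique(dropna=True),
        }
    )


# Tiny synthetic frame standing in for the real CSV (hypothetical column names).
toy = pd.DataFrame(
    {
        "Name": ["A", "B", "C", "D"],
        "Sex": ["F", "M", "M", "F"],
        "Age": [24, None, 31, 19],
        "Medal": [None, "Gold", None, "Bronze"],
    }
)

summary = first_look(toy)
print(toy.shape)  # (rows, columns)
print(summary)
```

On the real data, the same call would be `first_look(pd.read_csv("olympics_dataset/Athletes_summer_games.csv"))`, using the path from the starter notebook. A high `missing_pct` on a medal column would be expected if non-medalists are stored as empty values, which is an assumption about this dataset worth checking.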
Lines changed: 180 additions & 0 deletions
@@ -0,0 +1,180 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "739ce4df",
+   "metadata": {},
+   "source": [
+    "# Exploratory Data Analysis\n",
+    "\n",
+    "In this notebook, we will analyse the [Olympics 124 years Dataset](https://www.kaggle.com/datasets/nitishsharma01/olympics-124-years-datasettill-2020?resource=download).\n",
+    "\n",
+    "**This project is inspired by the work of @Vengeance, posted in the** [Data-Centric AI Community](https://tiny.ydata.ai/dcai-community-github).\n",
+    "\n",
+    "\n",
+    "Some of the questions we're looking to answer are:\n",
+    "\n",
+    "## Basic Analysis\n",
+    "- 1. How many events are there in each discipline?\n",
+    "- 2. How has the number of events changed over the years?\n",
+    "- 3. What is the gender participation in each Olympic Games?\n",
+    "\n",
+    "## Country Analysis\n",
+    "- 4. What is the total number of medals won by each country?\n",
+    "- 5. What is the total participation from each country up to the 2020 Olympics?\n",
+    "- 6. How has the performance of a particular country changed over different Olympic editions?\n",
+    "\n",
+    "## Athlete Analysis\n",
+    "- 7. Who are the top athletes in each discipline from each country?\n",
+    "- 8. What is the age distribution of top gold medalists?\n",
+    "- 9. What is the age distribution of medalists by medal (gold, silver, bronze)?\n",
+    "- 10. How many teams are there from each country?\n",
+    "- 11. In which discipline are the most medals won by teams?\n",
+    "- 12. Which country performs best in each team sport?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "1d2a70e0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "41dde653",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from ydata_profiling import ProfileReport"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "id": "5ee98611",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "36fc896704ac40e09954e0be1aa6ffb2",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "Summarize dataset: 0%| | 0/5 [00:00<?, ?it/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "4c5771e1a87c482281b652b44612eeaf",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "Generate report structure: 0%| | 0/1 [00:00<?, ?it/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "ed972f64e1f64e8ab578ec89d2209ff1",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "Render HTML: 0%| | 0/1 [00:00<?, ?it/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "a269eafb05c3465f9183ae8149ff8f84",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "Export report to file: 0%| | 0/1 [00:00<?, ?it/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "# Read data and generate a data profiling report\n",
+    "df = pd.read_csv(\"olympics_dataset/Athletes_summer_games.csv\")\n",
+    "olympics_report = ProfileReport(df, title='Olympics Dataset')\n",
+    "olympics_report.to_file(\"olympics_report.html\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3eeee7ab",
+   "metadata": {},
+   "source": [
+    "## Basic Analysis"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d66216e6",
+   "metadata": {},
+   "source": [
+    "## Country Analysis"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ee1e831f",
+   "metadata": {},
+   "source": [
+    "## Athlete Analysis"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c9274e56",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.12"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}

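The Basic Analysis questions (1 to 3) in the starter notebook mostly reduce to counting distinct values per group. A minimal pandas sketch follows, assuming athlete-level rows with `Year`, `Sport`, `Event`, `Name` and `Sex` columns (names modelled on similar Olympic datasets, not confirmed for this CSV) and treating `Sport` as the discipline. The rows are synthetic and only illustrate the mechanics.

```python
import pandas as pd

# Synthetic athlete-level rows; the column names are assumptions, check df.columns
# on the real file before reusing this.
df = pd.DataFrame(
    {
        "Year": [2016, 2016, 2020, 2020, 2020, 2020],
        "Sport": ["Swimming", "Rowing", "Swimming", "Swimming", "Rowing", "Rowing"],
        "Event": ["100m Free", "Eights", "100m Free", "200m Free", "Eights", "Eights"],
        "Name": ["Ana", "Cai", "Ana", "Dev", "Eli", "Fay"],
        "Sex": ["F", "M", "F", "M", "M", "F"],
    }
)

# Q1: distinct events per discipline (sport), across all editions.
events_per_sport = df.groupby("Sport")["Event"].nunique()

# Q2: distinct events held in each edition.
events_per_year = df.groupby("Year")["Event"].nunique()

# Q3: distinct athletes per edition and sex; an athlete entered in several events
# is counted once per edition. Name is used as an (imperfect) athlete key.
gender_by_year = (
    df.groupby(["Year", "Sex"])["Name"].nunique().unstack(fill_value=0)
)

print(events_per_sport, events_per_year, gender_by_year, sep="\n\n")
```

Counting distinct values rather than rows is the key design choice: each row is one athlete entry, so `size()` would count entries, not events or athletes. Using `Name` as the athlete key is a simplification, since different athletes can share a name; a dedicated athlete ID column, if the real file has one, would be safer.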
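For question 4 (total medals per country), and for the team questions 10 to 12, one detail changes the answer: in athlete-level data, every member of a medal-winning team typically has their own row, so counting rows inflates team medals. The sketch below contrasts a naive row count with an event-level count that keeps one row per edition, event, country and medal. The column names (`Year`, `Event`, `NOC`, `Name`, `Medal`) and the synthetic rows are assumptions, and it also assumes non-medalists have a missing `Medal` value; if the real file encodes them as a string instead, filter on that value.

```python
import pandas as pd

# Synthetic rows with assumed column names; the three GBR rows are one team gold.
df = pd.DataFrame(
    {
        "Year": [2020] * 6,
        "Event": ["Eights", "Eights", "Eights", "100m Free", "100m Free", "200m Free"],
        "NOC": ["GBR", "GBR", "GBR", "USA", "AUS", "USA"],
        "Name": ["Ana", "Ben", "Cai", "Dev", "Eli", "Fay"],
        "Medal": ["Gold", "Gold", "Gold", "Gold", "Silver", None],
    }
)

# Keep only medal-winning rows (assumes non-medalists have a missing Medal).
medalists = df.dropna(subset=["Medal"])

# Naive count: one medal per athlete row, so a team gold counts once per member.
naive = medalists.groupby("NOC").size()

# Event-level count: collapse team members so each (edition, event, country, medal)
# combination counts once.
event_medals = medalists.drop_duplicates(subset=["Year", "Event", "NOC", "Medal"])
medal_table = (
    event_medals.groupby(["NOC", "Medal"])
    .size()
    .unstack(fill_value=0)
    .reindex(columns=["Gold", "Silver", "Bronze"], fill_value=0)
)
medal_table["Total"] = medal_table.sum(axis=1)
medal_table = medal_table.sort_values(["Gold", "Silver", "Bronze"], ascending=False)

print(naive)
print(medal_table)
```

Which count fits depends on the question: the event-level table matches how Olympic medal tables are normally reported, while the per-row count is closer to medals won by individual athletes, which suits question 7.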