Data-Centric-AI-Community
diff --git a/.gitignore b/.gitignore
Lines changed: 2 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 19 additions & 2 deletions
diff --git a/eda-projects/olympics/Olympics_124_years_Dataset.ipynb b/eda-projects/olympics/Olympics_124_years_Dataset.ipynb
Lines changed: 180 additions & 0 deletions
@@ -0,0 +1,2 @@
+
+*.DS_Store
@@ -48,7 +48,17 @@ Please refer to [this folder](https://github.com/Data-Centric-AI-Community/aweso
 ## 📊 Python for Data Science
 
 ### ❓ Where to Start!
-🚧 WIP
+A good way to structure your data science learning is to follow the phases of the CRISP-DM methodology:
+
+[CRISP-DM methodology](https://www.datasciencecentral.com/userful-r-packages-that-aligns-with-the-crisp-dm-methodology/)
+
+1. Business/Problem Understanding
+2. 🆕 **Data Understanding:** Start with the EDA Projects in the **Exercises** section below! 🎉
+3. Data Preparation
+4. Modelling
+5. Evaluation
+6. Deployment
+
 
 ### 📚 Awesome Books
 🚧 WIP
@@ -60,7 +70,14 @@ Please refer to [this folder](https://github.com/Data-Centric-AI-Community/aweso
 
 
 ### 🏋🏽‍♀️ Exercises
-🚧 WIP
+
+#### 🕵🏻 Exploratory Data Analysis
+1. [Olympics 124 Years Dataset](eda-projects/olympics): Exploring a dataset of the Olympic Games
+
+##### 🫂 How to contribute? 
+- Download the project and try to solve it at your own pace!
+- Ask as many questions as you like in our [Discord channel #🐍ds-projects](http://discord.com/invite/mw7xjJ7b7s)
+- Share your final project by creating a Pull Request! 👏
 
 
 ### 🔗 Tutorials and Resources
 
@@ -0,0 +1,180 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "739ce4df",
+   "metadata": {},
+   "source": [
+    "# Exploratory Data Analysis\n",
+    "\n",
+    "In this notebook, we will analyse the [Olympics 124 years Dataset](https://www.kaggle.com/datasets/nitishsharma01/olympics-124-years-datasettill-2020?resource=download).\n",
+    "\n",
+    "**This project is inspired by the work of @Vengeance, posted in the** [Data-Centric AI Community](https://tiny.ydata.ai/dcai-community-github).\n",
+    "\n",
+    "\n",
+    "Some of the questions we're looking to answer are:\n",
+    "\n",
+    "## Basic Analysis\n",
+    "- 1. How many events are there in each discipline?\n",
+    "- 2. What is the event count over the years?\n",
+    "- 3. What is the gender participation in each Olympic Games?\n",
+    "\n",
+    "## Country Analysis\n",
+    "- 4. Find the total medals won by each country\n",
+    "- 5. What is the total participation from each country until the 2020 Olympics?\n",
+    "- 6. How has the performance of a particular country changed over different Olympic editions?\n",
+    "\n",
+    "## Athlete Analysis\n",
+    "- 7. Who are the Top Athletes in each discipline from each country?\n",
+    "- 8. What is the age distribution of top gold medalists?\n",
+    "- 9. What is the age distribution of medalists by medal (gold, silver, bronze)?\n",
+    "- 10. How many teams are there from each country?\n",
+    "- 11. In which discipline are the most medals won by teams?\n",
+    "- 12. Which country performs best in each team sport?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "1d2a70e0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "41dde653",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from ydata_profiling import ProfileReport"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "id": "5ee98611",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "36fc896704ac40e09954e0be1aa6ffb2",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "Summarize dataset:   0%|          | 0/5 [00:00<?, ?it/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "4c5771e1a87c482281b652b44612eeaf",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "Generate report structure:   0%|          | 0/1 [00:00<?, ?it/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "ed972f64e1f64e8ab578ec89d2209ff1",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "Render HTML:   0%|          | 0/1 [00:00<?, ?it/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "a269eafb05c3465f9183ae8149ff8f84",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "Export report to file:   0%|          | 0/1 [00:00<?, ?it/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "# Read the data and generate a data profiling report\n",
+    "df = pd.read_csv(\"olympics_dataset/Athletes_summer_games.csv\")\n",
+    "olympics_report = ProfileReport(df, title=\"Olympics Dataset\")\n",
+    "olympics_report.to_file(\"olympics_report.html\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3eeee7ab",
+   "metadata": {},
+   "source": [
+    "## Basic Analysis"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d66216e6",
+   "metadata": {},
+   "source": [
+    "## Country Analysis"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ee1e831f",
+   "metadata": {},
+   "source": [
+    "## Athlete Analysis"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c9274e56",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.12"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
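
The notebook's Basic Analysis section is left empty for contributors. As a hedged starting point, the sketch below shows one way questions 1 to 3 could be approached with pandas `groupby`. It runs on a few made-up rows rather than the real CSV, and the column names (`Name`, `Sex`, `Year`, `Sport`, `Event`) and the helper function names are assumptions for illustration; check them against the actual columns of `Athletes_summer_games.csv` before reusing it.

```python
import pandas as pd

# Made-up toy rows standing in for the Olympics CSV.
# The column names are assumptions and may differ from the real file.
sample = pd.DataFrame(
    {
        "Name": ["A", "B", "C", "D", "E", "F"],
        "Sex": ["M", "F", "M", "F", "F", "M"],
        "Year": [2016, 2016, 2020, 2020, 2020, 2020],
        "Sport": ["Athletics", "Athletics", "Swimming", "Swimming", "Athletics", "Athletics"],
        "Event": ["100m", "100m", "200m Freestyle", "200m Freestyle", "Long Jump", "100m"],
    }
)


def events_per_discipline(df: pd.DataFrame) -> pd.Series:
    """Question 1: number of distinct events in each discipline (sport)."""
    return df.groupby("Sport")["Event"].nunique().sort_values(ascending=False)


def events_per_year(df: pd.DataFrame) -> pd.Series:
    """Question 2: number of distinct events held in each year."""
    return df.groupby("Year")["Event"].nunique().sort_index()


def gender_participation(df: pd.DataFrame) -> pd.DataFrame:
    """Question 3: distinct athletes per year, one column per sex."""
    return df.groupby(["Year", "Sex"])["Name"].nunique().unstack(fill_value=0)


print(events_per_discipline(sample))
print(events_per_year(sample))
print(gender_participation(sample))
```

To try this on the real data, pass the `df` loaded with `pd.read_csv` in the notebook instead of `sample`, after adjusting the column names if they differ.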