diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..b32363c --- /dev/null +++ b/.gitignore @@ -0,0 +1,96 @@ +# macOS +.DS_Store +.AppleDouble +.LSOverride +Icon +._* +.DocumentRevisions-V100 +.fseventsd +.Spotlight-V100 +.TemporaryItems +.Trashes +.VolumeIcon.icns +.com.apple.timemachine.donotpresent +.AppleDB +.AppleDesktop +Network Trash Folder +Temporary Items +.apdisk + +# Windows +Thumbs.db +ehthumbs.db +Desktop.ini +$RECYCLE.BIN/ +*.cab +*.msi +*.msm +*.msp +*.lnk + +# Linux +*~ +.directory +.Trash-* + +# IDEs and Editors +.vscode/ +.idea/ +*.swp +*.swo +*~ +.project +.classpath +.c9/ +*.launch +.settings/ +*.sublime-workspace +*.sublime-project + +# Logs +*.log +logs/ +npm-debug.log* +yarn-debug.log* +yarn-error.log* + +# Dependencies +node_modules/ +bower_components/ +vendor/ + +# Build outputs +dist/ +build/ +out/ +target/ +*.o +*.pyc +*.pyo +__pycache__/ +*.class + +# Environment variables +.env +.env.local +.env.*.local + +# Temporary files +*.tmp +*.temp +*.cache +.sass-cache/ + +# Archives +*.zip +*.tar.gz +*.rar +*.7z + +# Sensitive or large files +*.pem +*.key +*.p12 +*.pfx +*.cer +*.der diff --git a/Logo/SSS_Logo.png b/Logo/SSS_Logo.png index cf56066..408707c 100644 Binary files a/Logo/SSS_Logo.png and b/Logo/SSS_Logo.png differ diff --git a/README.md b/README.md index bcc7fd7..cc0a745 100644 --- a/README.md +++ b/README.md @@ -2,8 +2,174 @@ ![Sheridan Sigma Squad](Logo/SSS_Logo.png) -## Team members +# Team members - Ernani Fantinatti - Broinson Jeyarajah - Youstina Botros - Nguyen Anh Khoa Tran + +# Education as an Engine for Growth: Analyzing SDG #4 + +### Sheridan Datathon 2025 Team Project + +![Project Status](https://img.shields.io/badge/Status-Active-green) +![Python](https://img.shields.io/badge/Python-3.9%2B-blue) +![License](https://img.shields.io/badge/License-MIT-yellow) + +## Overview + +This repository contains the work of our team for the **Sheridan Datathon 2025**. Our project focuses on **United Nations Sustainable Development Goal (SDG) #4: Quality Education**. + +Using global educational and economic data, we aim to quantify the relationship between educational attainment (literacy rates, school enrollment, government expenditure) and economic stability (GDP, growth rates). This project demonstrates our ability to deliver business value through data analysis, statistical modeling, and team collaboration. + +## The Team + +| Name | Role | GitHub Profile | +|------|------|----------------| +| [Member 1 Name] | [e.g., Data Engineer/Lead] | [@username](https://github.com/username) | +| [Member 2 Name] | [e.g., ML Engineer/Analyst] | [@username](https://github.com/username) | +| [Member 3 Name] | [e.g., Visualization Specialist] | [@username](https://github.com/username) | +| [Member 4 Name] | [e.g., Project Manager/Analyst] | [@username](https://github.com/username) | + +----- + +## Business Case & Objective + +**The Problem:** +While it is widely accepted that education is good for society, policymakers and NGOs (Non-Governmental Organizations) often struggle to determine *which specific* educational investments yield the highest economic returns. Without clear data, funding may be misallocated. + +**Our Objective:** +To analyze historical data to determine how specific "Quality Education" metrics (such as tertiary enrollment rates and literacy) correlate with and predict a nation's Economic Growth. + +**Guiding Questions:** + + * **Audience:** Government Education Ministries, The World Bank, and Global NGOs. + * **Key Question:** *To what extent does investment in quality education predict a nation's GDP growth over time?* + * **Business Value:** Providing data-driven insights to help stakeholders prioritize educational funding for maximum economic impact. + +----- + +## The Dataset + +We are utilizing the **"How Education Drives Economic Growth"** dataset sourced from Kaggle. + + * **Source:** [Kaggle Link](https://www.kaggle.com/datasets/omarmohammed70/how-education-drives-economic-growth) + * **Description:** The dataset includes key indicators for various countries over a time series. + * **Key Features:** + * `GDP`: Gross Domestic Product. + * `Literacy Rate`: Percentage of the population that can read/write. + * `School Enrollment`: Enrollment figures for primary, secondary, and tertiary levels. + * `Government Expenditure`: Spending on education as a % of GDP. + +> *Note: See the `data/raw` folder for the original dataset and `data/processed` for the cleaned version used for modeling.* + +----- + +## Repository Structure + +We have organized our repository to ensure reproducibility and clean workflow management: + +```text +├── data +│ ├── processed # Cleaned data used for analysis +│ ├── raw # Original immutable data dump +│ └── sql # SQL exports of data tables +├── experiments # Notebooks for testing ideas (sandbox) +├── models # Serialized models (pkl/joblib) and predictions +├── reports # Generated analysis reports (HTML/PDF) +│ └── figures # Saved PNGs of plots +├── src # Source code for use in this project +│ ├── data_cleaning.py +│ ├── visualization.py +│ └── modeling.py +├── .gitignore # Files to exclude (e.g., large data) +└── README.md # Project documentation + +----- + +## Methodology + +### 1\. Data Processing + + * **Cleaning:** Handled missing values in GDP and Literacy columns using [Method, e.g., forward-fill or mean imputation]. + * **Normalization:** Scaled economic features to account for vast differences in currency magnitude. + * **Feature Engineering:** Created a "Quality Index" combining literacy and enrollment rates. + +### 2\. Exploratory Data Analysis (EDA) + +We explored relationships between variables using correlation matrices and scatter plots. + + * *Key Finding:* [e.g., We found a 0.78 correlation between Tertiary Enrollment and GDP per Capita]. + +### 3\. Modeling / Visualization + + * **Tools Used:** Python, Pandas, Scikit-Learn, Matplotlib/Seaborn. + * **Approach:** [Describe if you used Linear Regression to predict GDP or Clustering to group countries by education level]. + +----- + +## Key Insights & Deliverables + +### Visualization Highlight + +*Description: This chart demonstrates the positive trend between [Variable A] and [Variable B], indicating that...* + +### Business Takeaways + +1. **Insight 1:** [e.g., Investment in primary education shows a delayed economic ROI of 5 years.] +2. **Insight 2:** [e.g., Literacy rates above 90% are a prerequisite for reaching "High Income" status.] +3. **Recommendation:** Stakeholders should prioritize [X] to maximize [Y]. + +----- + +## Project Video + +Each team member has reflected on the journey. Watch our project overview and individual reflections here: + + * **Project Showcase Video:** [Link to YouTube/Vimeo] + * **Team Reflections:** + \* + \* + \* + \* + +----- + +## Reproducibility: Getting Started + +To reproduce our findings, follow these steps: + +1. **Clone the repository:** + + ```bash + git clone [https://github.com/YourUsername/Sheridan-Datathon-2025-Education.git](https://github.com/YourUsername/Sheridan-Datathon-2025-Education.git) + cd Sheridan-Datathon-2025-Education + ``` + +2. **Install dependencies:** + + ```bash + pip install -r requirements.txt + ``` + +3. **Run the analysis:** + + * To clean data: `python src/data_cleaning.py` + * To generate plots: `python src/visualization.py` + * To run models: `python src/modeling.py` + +----- + +## Collaboration & Workflow + + * **Git Strategy:** We utilized feature branches for every new task and mandated Pull Requests (PRs) with at least one peer review before merging into `main`. + * **Communication:** Daily stand-ups via Slack to discuss blockers and progress. + * **Task Management:** Used GitHub Projects to track "To Do," "In Progress," and "Done." + +## Contact + +For questions regarding this analysis, please reach out to the team via our GitHub profiles listed above. + +----- + +*For the Sheridan Datathon 2025.* \ No newline at end of file