This repository is part of the hands-on project for the Master in Data & Decision Science course by Sirius School of Technology, Module 1: Statistical Analysis and Business Intelligence.
Startup Genome’s Global Startup Ecosystem Report (GSER) is powered by the world’s most comprehensive and quality-controlled dataset on startup ecosystems. Informed by information on 3.5 million startups across 290 global ecosystems, our data and insights are the product of over a decade of independent research and policy work.
GSER 2023 ranks the top 30 and 10 runner-up global ecosystems, and includes a top 100 ranking of emerging ecosystems. It also takes a look at startup communities from a regional perspective, separately ranking ecosystems in Africa, Asia, Europe, Latin America, MENA, North America, and Oceania.
This repository shows the chosen approach to cleaning, organizing and structuring the data using Python, but it DOES NOT provide access to the data itself. The data used in this project is proprietary and can only be accessed by the Startup Genome team and by authorized partners.
- João Morossini (LinkedIn)
- Thiago Seronni Mendonça (LinkedIn)
- Leo Koki Shashiki (LinkedIn)
- Max Meneghini (LinkedIn)
- Rafael Costa (LinkedIn)
- data/: Folder to store data.
  - raw/: Folder to store raw data.
  - processed/: Folder to store processed data.
  - external/: Folder to store external data.
- docs/: Folder to store explanatory documents and additional information.
- notebooks/: Folder to store Jupyter notebooks.
  - exploratory/: Folder for exploratory analysis notebooks.
  - preprocessing/: Folder for preprocessing notebooks.
  - analysis/: Folder for data analysis notebooks.
- reports/: Folder to store reports and results.
  - figures/: Folder to store generated figures.
  - results/: Folder to store project results.
- src/: Folder to store source code.
  - data/: Folder for data manipulation modules.
  - models/: Folder for machine learning model modules.
  - utils/: Folder for utility modules.
  - scripts/: Folder for auxiliary scripts.
- tests/: Folder to store automated tests.
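The folder layout described above can be recreated locally with a few lines of Python. This is only a minimal sketch: the names `LAYOUT` and `create_layout` are hypothetical helpers written for illustration and are not part of the repository.

```python
from pathlib import Path

# Hypothetical mapping of top-level folders to their subfolders,
# transcribed from the layout described above.
LAYOUT = {
    "data": ["raw", "processed", "external"],
    "docs": [],
    "notebooks": ["exploratory", "preprocessing", "analysis"],
    "reports": ["figures", "results"],
    "src": ["data", "models", "utils", "scripts"],
    "tests": [],
}


def create_layout(root):
    """Create the folder tree under `root` and return every folder path."""
    root = Path(root)
    folders = []
    for top, subfolders in LAYOUT.items():
        top_dir = root / top
        # exist_ok=True makes the call safe to repeat on an existing tree.
        top_dir.mkdir(parents=True, exist_ok=True)
        folders.append(top_dir)
        for sub in subfolders:
            sub_dir = top_dir / sub
            sub_dir.mkdir(exist_ok=True)
            folders.append(sub_dir)
    return folders
```

Calling `create_layout(".")` from the project root would create any missing folders and leave existing ones untouched.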
pip install -r requirements.txt
- Place the datasets in "genome\data\raw":
  - abstract.csv
  - IPC Titles.xlsx
  - ListOfCompanies.csv
  - raw_patents.csv
  - table_for_applicants.csv
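Before running any notebook, it may help to confirm that all five files listed above are in place. The sketch below is one possible check, not something shipped with the project: `EXPECTED_FILES` and `missing_raw_files` are hypothetical names, and the relative path `genome/data/raw` assumes the script is run from the directory that contains the `genome` folder.

```python
from pathlib import Path

# File names copied from the list above.
EXPECTED_FILES = [
    "abstract.csv",
    "IPC Titles.xlsx",
    "ListOfCompanies.csv",
    "raw_patents.csv",
    "table_for_applicants.csv",
]


def missing_raw_files(raw_dir):
    """Return the expected file names that are not present in `raw_dir`."""
    raw_dir = Path(raw_dir)
    return [name for name in EXPECTED_FILES if not (raw_dir / name).is_file()]


if __name__ == "__main__":
    # Assumed location; pathlib builds the path with the right separator
    # on both Windows and Unix-like systems.
    missing = missing_raw_files(Path("genome") / "data" / "raw")
    if missing:
        print("Missing datasets:", ", ".join(missing))
    else:
        print("All expected datasets are present.")
```

Building the path with `pathlib` avoids hard-coding the Windows-style backslashes, so the same check should work regardless of operating system.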