Pokémon Data Warehouse & Analysis (MongoDB Project)

Master’s Program – NoSQL / MongoDB Module

Author: Carlos Leon

Environment: Python 3.12 · Jupyter Notebook · MongoDB (local instance)

Overview

This project demonstrates a complete data pipeline and analytical workflow using MongoDB and Python.
It sources live data from the PokéAPI, stores it in MongoDB, performs modeling, transformation, and aggregation, and generates both analytical results and visualizations.

The goal is to build a Pokémon Data Warehouse, similar to a biological or ecological study database, that supports exploration of Pokémon species characteristics, habitats, and relationships.

The notebook for this project is available as pokeapi_to_mongodb.ipynb, with a PDF version in pokeapi_to_mongodb.pdf.

Project Workflow

1. Data Collection

Connected to PokéAPI endpoints:
- /pokemon-species/ – species-level biological details (habitat, color, growth rate, etc.)
- /pokemon/ – individual Pokémon attributes (stats, abilities, height, weight, types)
- /pokemon-habitat/ – contextual grouping of species

Data was fetched in Python with the requests library and stored raw in MongoDB for later cleaning and transformation.
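
The collection step could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper names (`http_get_json`, `iter_resources`, `store_raw`), the page size, and the choice to upsert on PokéAPI's `id` field are all assumptions. It relies on PokéAPI list endpoints returning a `results` array plus a `next` URL for pagination.

```python
from typing import Callable, Iterator

BASE_URL = "https://pokeapi.co/api/v2"


def http_get_json(url: str) -> dict:
    """Fetch one URL with requests.

    The import is local so the helpers below can be exercised offline
    with a stand-in fetcher.
    """
    import requests

    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.json()


def iter_resources(
    endpoint: str,
    get_json: Callable[[str], dict] = http_get_json,
    page_size: int = 100,
) -> Iterator[dict]:
    """Yield every full document behind a paginated list endpoint."""
    url = f"{BASE_URL}/{endpoint.strip('/')}/?limit={page_size}"
    while url:
        page = get_json(url)
        for item in page["results"]:
            # Each list item only holds a name and a URL; follow the URL
            # to get the complete resource.
            yield get_json(item["url"])
        url = page.get("next")  # None on the last page


def store_raw(collection, endpoint: str, get_json=http_get_json) -> int:
    """Upsert raw documents into a pymongo-style collection, keyed by id."""
    count = 0
    for doc in iter_resources(endpoint, get_json):
        collection.replace_one({"id": doc["id"]}, doc, upsert=True)
        count += 1
    return count
```

With pymongo installed, `store_raw` could be called with a collection such as `MongoClient()["pokemon"]["species_raw"]` and the endpoint `"pokemon-species"`; the database and collection names here are hypothetical. Upserting makes the fetch safe to re-run without creating duplicate raw documents.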

2. Data Modeling

Three main MongoDB collections were created:

| Collection | Description |
| --- | --- |
| `species_clean` | Cleaned and standardized Pokémon species data |
| `pokemon_clean` | Cleaned Pokémon-specific data (stats, types, abilities) |
| `pokedex_entries` | Simplified dataset emulating a real Pokédex (name, type, height, flavor text, sprite) |

These are linked logically via common fields like name and dex_number.
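
Because the collections are linked only logically, a join has to be expressed at query time. One way to do that is a `$lookup` stage; the sketch below builds such a pipeline as plain Python data. It assumes that `species_clean` and `pokemon_clean` both carry a `name` field and that `species_clean` has `dex_number`, as described above. The `types` field and the function name are illustrative guesses.

```python
def species_with_pokemon_pipeline(dex_number: int) -> list[dict]:
    """Build a pipeline that joins one species to its Pokémon documents."""
    return [
        # Pick a single species by its Pokédex number.
        {"$match": {"dex_number": dex_number}},
        # Attach matching pokemon_clean documents via the shared name field.
        {
            "$lookup": {
                "from": "pokemon_clean",
                "localField": "name",
                "foreignField": "name",
                "as": "pokemon",
            }
        },
        # Keep only a few fields (field names are assumptions).
        {"$project": {"_id": 0, "name": 1, "dex_number": 1, "pokemon.types": 1}},
    ]
```

Given a pymongo database handle `db`, the pipeline would be run as `db["species_clean"].aggregate(species_with_pokemon_pipeline(1))`.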

3. Data Cleaning & Transformation

- Extracted relevant fields from nested JSON objects.
- Removed duplicate and incomplete records.
- Created derived attributes (height_m, weight_kg, etc.).
- Enforced consistent naming conventions and standardized field structures.
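
The cleaning steps above might look something like the following sketch. It is not the notebook's code: the function names, the `REQUIRED` field list, and the exact output field names other than `height_m` and `weight_kg` are assumptions. The unit conversions rely on PokéAPI reporting height in decimetres and weight in hectograms.

```python
REQUIRED = ("id", "name", "height", "weight", "types", "stats", "abilities")


def clean_pokemon(raw: dict) -> dict:
    """Flatten one raw /pokemon/ document into an analysis-friendly record."""
    return {
        "dex_number": raw["id"],  # assumption: reuse PokéAPI's numeric id
        "name": raw["name"].strip().lower(),
        "height_m": raw["height"] / 10,   # decimetres -> metres
        "weight_kg": raw["weight"] / 10,  # hectograms -> kilograms
        # Nested {"slot": ..., "type": {"name": ...}} objects become a
        # flat list of type names, ordered by slot.
        "types": [
            t["type"]["name"]
            for t in sorted(raw["types"], key=lambda t: t["slot"])
        ],
        "abilities": [a["ability"]["name"] for a in raw["abilities"]],
        # "special-attack" -> "special_attack" for consistent naming.
        "stats": {
            s["stat"]["name"].replace("-", "_"): s["base_stat"]
            for s in raw["stats"]
        },
    }


def clean_all(raw_docs) -> list[dict]:
    """Drop incomplete documents and duplicates (first record per name wins)."""
    cleaned: dict[str, dict] = {}
    for raw in raw_docs:
        if any(raw.get(key) is None for key in REQUIRED):
            continue  # incomplete record
        record = clean_pokemon(raw)
        cleaned.setdefault(record["name"], record)
    return list(cleaned.values())
```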

4. Data Visualization

Key visuals included:

- Pokémon species by habitat and status (regular, legendary, mythical)
- Dual-type co-occurrence analysis
- Average base stats by primary type
- Evolution chain sizes
- Egg group × habitat heatmap
- Interactive Pokédex entry card visual for a selected Pokémon

Each visualization was generated directly from MongoDB aggregation results using matplotlib and pandas.
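
As a rough illustration of going from aggregation output to a chart, the sketch below turns a list of `{"_id": ..., "count": ...}` rows into a bar chart. The row shape, the function names, and the output path are assumptions, and for brevity the sketch uses matplotlib directly rather than going through a pandas DataFrame.

```python
def rows_to_series(rows, label_key="_id", value_key="count"):
    """Split aggregation rows into (labels, values), largest value first."""
    ordered = sorted(rows, key=lambda row: row[value_key], reverse=True)
    labels = [str(row[label_key]) for row in ordered]
    values = [row[value_key] for row in ordered]
    return labels, values


def plot_bar(rows, title: str, out_path: str) -> None:
    """Render the rows as a bar chart and save it to out_path."""
    # Imported here so rows_to_series stays usable without matplotlib.
    import matplotlib

    matplotlib.use("Agg")  # file output only, no GUI backend needed
    import matplotlib.pyplot as plt

    labels, values = rows_to_series(rows)
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(labels, values)
    ax.set_title(title)
    ax.set_ylabel("count")
    ax.tick_params(axis="x", labelrotation=45)
    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)
```

In a notebook, `rows` would typically be `list(collection.aggregate(pipeline))`, and the figure could be shown inline instead of saved.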

5. Aggregation Pipelines

- Average weight (kg) by habitat
- Habitat distribution by legendary/mythical status
- Type co-occurrence network
- Average base stats per type
- Evolution chain sizes
- Egg-group × habitat frequency analysis

Each pipeline demonstrates different aggregation operators (e.g., $match, $group, $project, $unwind, $sort, $cond, $arrayToObject).
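
To make the operator list concrete, here is a hedged sketch of what the habitat-by-status pipeline might look like. The field names `habitat`, `is_legendary`, and `is_mythical` are assumptions about the cleaned `species_clean` schema (they mirror fields in the PokéAPI species resource), and the function name is hypothetical.

```python
def habitat_status_pipeline() -> list[dict]:
    """Count species per habitat, split into regular/legendary/mythical."""
    # Nested $cond: legendary takes precedence, then mythical, else regular.
    status = {
        "$cond": [
            "$is_legendary",
            "legendary",
            {"$cond": ["$is_mythical", "mythical", "regular"]},
        ]
    }
    return [
        # Skip species without a recorded habitat.
        {"$match": {"habitat": {"$ne": None}}},
        # One bucket per (habitat, status) pair.
        {
            "$group": {
                "_id": {"habitat": "$habitat", "status": status},
                "count": {"$sum": 1},
            }
        },
        # Flatten the compound _id into top-level fields.
        {
            "$project": {
                "_id": 0,
                "habitat": "$_id.habitat",
                "status": "$_id.status",
                "count": 1,
            }
        },
        {"$sort": {"habitat": 1, "count": -1}},
    ]
```

Each output row then has the shape `{"habitat": ..., "status": ..., "count": ...}`, which is convenient for a grouped or stacked bar chart.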

6. Pokédex Simulation

A lightweight “Pokédex” collection was built to visualize Pokémon entries with:

- Species name and genus
- Type and physical data
- Flavor text
- Sprite image and artwork

An inline visual cell mimics an actual Pokédex entry.
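
A Pokédex entry of this kind could be assembled from one raw species document and one raw Pokémon document, roughly as sketched below. The function names and output field names are assumptions; the input paths (`genera`, `flavor_text_entries`, `sprites.front_default`, `sprites.other["official-artwork"]`) follow the PokéAPI response layout. Flavor text in PokéAPI contains embedded line and form feeds, so the sketch collapses whitespace.

```python
def english(entries: list[dict], field: str) -> str:
    """Return the first English value of `field`, with whitespace collapsed."""
    for entry in entries:
        if entry["language"]["name"] == "en":
            return " ".join(entry[field].split())
    return ""


def build_pokedex_entry(species: dict, pokemon: dict) -> dict:
    """Combine species-level and Pokémon-level data into one entry."""
    sprites = pokemon.get("sprites", {})
    artwork = sprites.get("other", {}).get("official-artwork", {})
    return {
        "name": species["name"],
        "genus": english(species.get("genera", []), "genus"),
        "types": [t["type"]["name"] for t in pokemon["types"]],
        "height_m": pokemon["height"] / 10,   # decimetres -> metres
        "weight_kg": pokemon["weight"] / 10,  # hectograms -> kilograms
        "flavor_text": english(
            species.get("flavor_text_entries", []), "flavor_text"
        ),
        "sprite": sprites.get("front_default"),
        "artwork": artwork.get("front_default"),
    }


def format_card(entry: dict) -> str:
    """Render a plain-text version of the entry card."""
    return "\n".join(
        [
            f"{entry['name'].title()} ({entry['genus']})",
            f"Type: {' / '.join(entry['types'])}",
            f"Height: {entry['height_m']} m   Weight: {entry['weight_kg']} kg",
            entry["flavor_text"],
        ]
    )
```

The resulting dictionary can be inserted into `pokedex_entries` as-is, and `format_card` gives a text-only stand-in for the inline visual cell.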

7. Schema Diagram

A class diagram was created using Mermaid, representing:

- Collections (species_clean, pokemon_clean, pokedex_entries)
- Their logical relationships (name / species_id links)

Technologies Used

| Category | Tools |
| --- | --- |
| Programming | Python 3.12 |
| Database | MongoDB (local instance) + Compass |
| Visualization | Matplotlib |
| Development | VS Code + Jupyter Notebook |
| API | PokéAPI |
| Class Diagram | Mermaid (via mermaidchart.com) |
