Author: Carlos Leon

Environment: Python 3.12 · Jupyter Notebook · MongoDB (local instance)
This project demonstrates a complete data pipeline and analytical workflow using MongoDB and Python.
It sources live data from the PokéAPI, stores it in MongoDB, performs modeling, transformation, and aggregation, and generates both analytical results and visualizations.
The goal is to build a Pokémon Data Warehouse, similar to a biological or ecological study database — enabling exploration of Pokémon species characteristics, habitats, and relationships.
The notebook for this project is available as `pokeapi_to_mongodb.py` and as a PDF version.
- Connected to PokéAPI endpoints:
  - `/pokemon-species/` – species-level biological details (habitat, color, growth rate, etc.)
  - `/pokemon/` – individual Pokémon attributes (stats, abilities, height, weight, types)
  - `/pokemon-habitat/` – contextual grouping of species
- Data fetched via Python using `requests` and stored raw in MongoDB for cleaning and transformation.
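The fetch-and-store step can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper names (`http_get_json`, `iter_resources`, `store_raw`) and the database and collection names in the usage comment are hypothetical. It relies on PokéAPI list endpoints returning a `results` array plus a `next` URL, and on pymongo's `replace_one` with `upsert=True`.

```python
BASE_URL = "https://pokeapi.co/api/v2"


def http_get_json(url):
    """Fetch one URL and decode its JSON body (needs the requests package)."""
    import requests  # imported lazily so the other helpers work offline

    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.json()


def iter_resources(endpoint, get_json=http_get_json, page_size=100):
    """Yield the full document of every resource listed under an endpoint."""
    url = f"{BASE_URL}/{endpoint}/?limit={page_size}"
    while url:
        page = get_json(url)
        for item in page["results"]:
            # Each list item only carries a name and a URL to the full record.
            yield get_json(item["url"])
        url = page.get("next")  # None on the last page


def store_raw(collection, documents):
    """Upsert raw documents keyed on the PokéAPI id so re-runs do not duplicate."""
    stored = 0
    for doc in documents:
        collection.replace_one({"id": doc["id"]}, doc, upsert=True)
        stored += 1
    return stored


# Hypothetical usage against a local MongoDB instance:
#   from pymongo import MongoClient
#   db = MongoClient("mongodb://localhost:27017")["pokemon_dw"]
#   store_raw(db["species_raw"], iter_resources("pokemon-species"))
```

Passing the HTTP getter in as a parameter keeps the pagination logic easy to exercise without network access.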
Three main MongoDB collections were created:
| Collection | Description |
|---|---|
| `species_clean` | Cleaned and standardized Pokémon species data |
| `pokemon_clean` | Cleaned Pokémon-specific data (stats, types, abilities) |
| `pokedex_entries` | Simplified dataset emulating a real Pokédex (name, type, height, flavor text, sprite) |
These are linked logically via common fields like `name` and `dex_number`.
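One way to follow such a logical link at query time is a `$lookup` stage. The sketch below is an assumption about how the join could be written, not the project's actual pipeline: the function name is hypothetical, and the `types` and `habitat` fields are assumed to exist in the cleaned collections.

```python
def pokemon_with_species_pipeline(name):
    """Build a pipeline that joins one pokemon_clean document to its species."""
    return [
        {"$match": {"name": name}},
        {
            "$lookup": {
                "from": "species_clean",
                "localField": "name",
                "foreignField": "name",
                "as": "species",
            }
        },
        # $lookup returns an array; unwind it to a single embedded document.
        {"$unwind": "$species"},
        {
            "$project": {
                "_id": 0,
                "name": 1,
                "types": 1,  # assumed field in pokemon_clean
                "habitat": "$species.habitat",  # assumed field in species_clean
            }
        },
    ]


# Hypothetical usage with pymongo:
#   rows = list(db["pokemon_clean"].aggregate(pokemon_with_species_pipeline("pikachu")))
```

Joining on `name` assumes the Pokémon name matches the species name; in PokéAPI this may not hold for alternate forms (for example `deoxys-normal` under the species `deoxys`), where a numeric key would be a safer link.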
- Extracted relevant fields from nested JSON objects.
- Removed duplicates and missing records.
- Created derived attributes (`height_m`, `weight_kg`, etc.).
- Enforced consistent naming conventions and standardized field structures.
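The cleaning steps above might look something like the following. It is a sketch under stated assumptions: `clean_pokemon` and `deduplicate` are hypothetical helpers, and the output layout (including mapping the API `id` to `dex_number`) is a guess at the cleaned schema. The unit conversion relies on PokéAPI reporting height in decimetres and weight in hectograms.

```python
def clean_pokemon(raw):
    """Flatten a raw /pokemon/ document into a cleaned record."""
    # Order types by slot so the primary type always comes first.
    types = [t["type"]["name"] for t in sorted(raw["types"], key=lambda t: t["slot"])]
    # Turn the nested stats list into {"hp": 35, "special_attack": 50, ...}.
    stats = {s["stat"]["name"].replace("-", "_"): s["base_stat"] for s in raw["stats"]}
    return {
        "dex_number": raw["id"],  # assumption: API id used as the dex number
        "name": raw["name"],
        "height_m": raw["height"] / 10,  # decimetres -> metres
        "weight_kg": raw["weight"] / 10,  # hectograms -> kilograms
        "types": types,
        "primary_type": types[0] if types else None,
        "abilities": [a["ability"]["name"] for a in raw["abilities"]],
        "stats": stats,
    }


def deduplicate(docs, key="name"):
    """Keep the first document per key and drop documents missing the key."""
    seen = {}
    for doc in docs:
        if doc.get(key) is None:
            continue
        seen.setdefault(doc[key], doc)
    return list(seen.values())
```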
Key visuals included:
- Pokémon species by habitat and status (regular, legendary, mythical)
- Dual-type co-occurrence analysis
- Average base stats by primary type
- Evolution chain sizes
- Egg group × habitat heatmap
- Interactive Pokédex entry card visual for a selected Pokémon
Each visualization was generated directly from MongoDB aggregation results using matplotlib and pandas.
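As an illustration of going from aggregation output to a chart, the sketch below flattens `$group` results (whose keys sit under `_id`) into rows and draws a stacked bar chart. The function names and the `habitat`, `status`, and `count` fields are assumptions chosen for the example, not the notebook's actual code.

```python
def flatten_rows(rows):
    """Lift the fields of a compound _id up into each result row."""
    flat = []
    for row in rows:
        key = row["_id"]
        record = dict(key) if isinstance(key, dict) else {"group": key}
        record.update({k: v for k, v in row.items() if k != "_id"})
        flat.append(record)
    return flat


def plot_habitat_status(rows):
    """Stacked bar chart of species counts per habitat, split by status."""
    import pandas as pd
    import matplotlib.pyplot as plt

    table = pd.DataFrame(flatten_rows(rows)).pivot_table(
        index="habitat", columns="status", values="count",
        aggfunc="sum", fill_value=0,
    )
    fig, ax = plt.subplots(figsize=(8, 4))
    table.plot(kind="bar", stacked=True, ax=ax)
    ax.set_xlabel("Habitat")
    ax.set_ylabel("Species count")
    fig.tight_layout()
    return table, ax


# Hypothetical usage in a notebook cell:
#   rows = list(db["species_clean"].aggregate(pipeline))
#   plot_habitat_status(rows)
```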
- Average weight (kg) by habitat
- Habitat distribution by legendary/mythical status
- Type co-occurrence network
- Average base stats per type
- Evolution chain sizes
- Egg-group × habitat frequency analysis
Each pipeline demonstrates different aggregation operators (e.g., `$match`, `$group`, `$project`, `$unwind`, `$sort`, `$cond`, `$arrayToObject`).
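For instance, the habitat-by-status breakdown could be expressed along these lines. This is a hedged sketch rather than the notebook's pipeline: it assumes `species_clean` keeps a string `habitat` field and the `is_legendary` / `is_mythical` booleans that PokéAPI exposes on species, and the function name is hypothetical.

```python
def habitat_status_pipeline():
    """Count species per habitat and status (regular, legendary, mythical)."""
    status = {
        "$cond": [
            "$is_mythical",
            "mythical",
            {"$cond": ["$is_legendary", "legendary", "regular"]},
        ]
    }
    return [
        # Skip species that have no recorded habitat.
        {"$match": {"habitat": {"$ne": None}}},
        {"$project": {"habitat": 1, "status": status}},
        {
            "$group": {
                "_id": {"habitat": "$habitat", "status": "$status"},
                "count": {"$sum": 1},
            }
        },
        {"$sort": {"_id.habitat": 1, "count": -1}},
    ]


# Hypothetical usage with pymongo:
#   rows = list(db["species_clean"].aggregate(habitat_status_pipeline()))
```

Nesting `$cond` keeps the classification inside the database, so the result arrives already grouped and sorted.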
A lightweight “Pokédex” collection was built to visualize Pokémon entries with:
- Species name and genus
- Type and physical data
- Flavor text
- Sprite image and artwork
An inline visual cell mimics an actual Pokédex entry.
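A Pokédex entry of this shape could be assembled from the raw species and Pokémon documents roughly as shown below. The helper names and output field names are hypothetical. The input paths (`genera`, `flavor_text_entries`, `sprites.front_default`, `sprites.other["official-artwork"]`) follow the PokéAPI response layout.

```python
def first_english(entries, field):
    """Return the first English value of a field, with whitespace normalized."""
    for entry in entries:
        if entry["language"]["name"] == "en":
            # Flavor text contains newlines and form feeds; collapse them.
            return " ".join(entry[field].split())
    return None


def build_pokedex_entry(species, pokemon):
    """Combine raw /pokemon-species/ and /pokemon/ documents into one entry."""
    sprites = pokemon.get("sprites", {})
    artwork = sprites.get("other", {}).get("official-artwork", {})
    types = sorted(pokemon.get("types", []), key=lambda t: t["slot"])
    return {
        "dex_number": species["id"],
        "name": species["name"],
        "genus": first_english(species.get("genera", []), "genus"),
        "types": [t["type"]["name"] for t in types],
        "height_m": pokemon["height"] / 10,  # decimetres -> metres
        "weight_kg": pokemon["weight"] / 10,  # hectograms -> kilograms
        "flavor_text": first_english(
            species.get("flavor_text_entries", []), "flavor_text"
        ),
        "sprite_url": sprites.get("front_default"),
        "artwork_url": artwork.get("front_default"),
    }
```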
A class diagram was created using Mermaid, representing:
- Collections (`species_clean`, `pokemon_clean`, `pokedex_entries`)
- Their logical relationships (`name`/`species_id` links)
| Category | Tools |
|---|---|
| Programming | Python 3.12 |
| Database | MongoDB (local instance) + Compass |
| Visualization | Matplotlib |
| Development | VS Code + Jupyter Notebook |
| API | PokéAPI |
| Class Diagram | Via mermaidchart.com |