A Data Science project to explore demographics of Barcelona city.
- Project Task List
- Data Source
- Data Analysis with Jupyter Lab
- FastAPI
- Data Visualization
- MongoDB
- Running this project
- Sentry tasks
- Contributing
- Semantic Release
- Resources
- L1: Create an API with FastAPI
- L1: Create a dashboard with Streamlit
- L1: Database in MongoDB or PostgreSQL
- L2: Use geospatial data and geoqueries in MongoDB or Postgres (using PostGIS)*
- L2: Host the database in the cloud (there are free tiers on MongoDB Atlas, Heroku Postgres, among others)
- L2: Generate a PDF report of the data visible in Streamlit, downloadable via a button
- L2: A multi-page dashboard in Streamlit
- L3: Have the dashboard send the PDF report by e-mail
- L3: Be able to upload new data to the database via the API (user and password as request headers)
- L4: Be able to update the database via Streamlit (with user and password, on a separate page; the dashboard must make the previous request that adds data via the API)
- L4: Create a Docker container and deploy the services in the cloud (Heroku; the two services must be deployed separately)
- L5: Control the pipeline with Apache Airflow
This project is based on a dataset of Barcelona city that contains information about demographics and population statistics.
For this project, we will be using the accidents dataset, which contains useful information about dates, places, and accident details.
To perform the Exploratory Data Analysis, I've used Jupyter Lab.
During the analysis, I've used several Python libraries; the full list lives in the requirements.txt file.
For this stage of the project, the main concern has been to understand the data well enough to ask interesting questions about it (and then build visualizations).
Furthermore, I've been especially interested in reducing the dataset's weight, shrinking it from a 6 MB file to a 240 KB file. This was achieved by dropping unnecessary columns, cleaning the data, and assigning the correct data type to each column.
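As a minimal sketch of that reduction (the column names here are illustrative, not the dataset's real ones):

```python
import pandas as pd

# Load only the columns we actually need (names are hypothetical)
df = pd.read_csv("data/accidents.csv", usecols=["date", "district", "victims"])

# Parse dates and downcast numerics to smaller dtypes
df["date"] = pd.to_datetime(df["date"])
df["victims"] = pd.to_numeric(df["victims"], downcast="integer")

# Low-cardinality strings compress well as categoricals
df["district"] = df["district"].astype("category")

print(df.memory_usage(deep=True).sum())  # compare before/after the conversions
```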
Another interesting aspect is that requesting location information through the geopy API is heavily rate-limited, so I've investigated Python's multithreading capabilities and implemented parallel requests using high-performance methods.
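A rough sketch of that pattern, assuming the geocoder is Nominatim via geopy (coordinates and worker count are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

from geopy.geocoders import Nominatim

geolocator = Nominatim(user_agent="barcelona-demographics")  # user_agent is a placeholder

def reverse_geocode(coords):
    # Each call is I/O-bound, so threads overlap the network wait
    return geolocator.reverse(coords)

points = [(41.3851, 2.1734), (41.4036, 2.1744)]  # sample Barcelona coordinates
with ThreadPoolExecutor(max_workers=8) as pool:
    locations = list(pool.map(reverse_geocode, points))
```

Mind the provider's usage policy when raising the number of workers.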
You can run a notebook in the data folder and repeat my steps; when you're done, export the DataFrame to a JSON file that will feed the local MongoDB database, or upload it using your own database URL in an .env file.
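The export step can be as simple as this (the path and orientation are assumptions):

```python
import pandas as pd

# Stand-in for the cleaned accidents DataFrame
df = pd.DataFrame({"date": ["2020-01-01"], "district": ["Eixample"]})

# orient="records" writes one JSON document per row, which MongoDB ingests directly
df.to_json("data/accidents.json", orient="records")
```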
To get the basic environment ready, install the packages listed in the requirements.txt file.
This project uses FastAPI as its backend framework to build the API.
In this project I've leveraged the following features:
- Asynchronous API
- Asynchronous testing of the main route
- Asynchronous MongoDB queries using motor (see the sketch after this list)
- Routing with regex
- OpenAPI documentation
- Sentry integration
- Python type hinting with pydantic
- Type annotations with typing
- Pydantic validation through models
- Docker deployment using gunicorn with ASGI asynchronous workers
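A minimal sketch of how the asynchronous pieces fit together (the route, database, and collection names are illustrative; the real connection URL comes from the .env file described below):

```python
from fastapi import FastAPI
from motor.motor_asyncio import AsyncIOMotorClient

app = FastAPI()
client = AsyncIOMotorClient("mongodb://localhost:27017")  # placeholder URL
db = client["barcelona"]

@app.get("/accidents")
async def list_accidents(limit: int = 10):
    # Motor cursors are awaitable, so the event loop stays free while MongoDB works
    cursor = db["accidents"].find({}, {"_id": 0})
    return await cursor.to_list(length=limit)
```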
This project's API is deployed in a Docker container. To run it locally, you must have an .env file with the following variables:
- DATABASE_URL
- DATABASE_NAME
- DEBUG
- ENVIRONMENT
- SENTRY_DSN
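One common way to load these in FastAPI is a pydantic settings object cached with functools.lru_cache; the defaults below are assumptions, not the project's actual values:

```python
from functools import lru_cache

from pydantic import BaseSettings  # pydantic v1; v2 moved this to the pydantic-settings package

class Settings(BaseSettings):
    database_url: str
    database_name: str
    debug: bool = False
    environment: str = "development"
    sentry_dsn: str = ""

    class Config:
        env_file = ".env"  # values are read from the .env file or the environment

@lru_cache  # parse the environment only once per process
def get_settings() -> Settings:
    return Settings()
```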
To be able to query FastAPI, I've used Streamlit as a frontend, wrapped with the Hydralit library to get some nice frontend features.
Within the frontend you will be able to make a couple of queries to the API, and you will be able to see the results in a nice dashboard.
The dashboard also implements a few other features, such as a search bar, a map, a table, and a graph.
There's a user panel in which you will be able to interact with the database, and you can also see the documentation of the API.
A user can also upload a new dataset to the database, and the dashboard will be updated with the new data. This is done through the API.
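As an illustration of that flow (the endpoint path and header names are hypothetical):

```python
import requests

# Credentials travel as request headers, as described in the task list
response = requests.post(
    "http://localhost:8000/data",
    headers={"username": "admin", "password": "secret"},
    json=[{"date": "2020-01-01", "district": "Eixample"}],
)
response.raise_for_status()
```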
Another cool feature is that you can download a report of the data, which I've achieved using Beautiful Soup.
This project uses Streamlit's native secrets management, so you must provide an app/dashboard/.streamlit/secrets.toml file with the following variables:

- url
- api_key
- api_secret
- sender_email
- sender_name

And, under a [mongo] section:

- host
- port
- username
- password
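Streamlit exposes these values through st.secrets, with the [mongo] table becoming a nested mapping:

```python
import streamlit as st

api_key = st.secrets["api_key"]           # top-level keys map directly
mongo_host = st.secrets["mongo"]["host"]  # [mongo] section keys are nested
mongo_port = st.secrets["mongo"]["port"]
```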
When you export the DataFrame with Jupyter, the data is stored in a folder that contains a MongoDB Dockerfile, which is used to create a MongoDB container automatically.
Authentication is enabled through the entrypoint.sh script, which is executed when the container starts. The script also creates a non-root user and grants it the readWrite role on a non-admin database.
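What the script does is roughly equivalent to this pymongo call (user, password, and database names are placeholders):

```python
from pymongo import MongoClient

# Connect with the root credentials defined in database/.env
client = MongoClient("mongodb://root:rootpass@localhost:27017")

# Create a non-root user scoped to the application database
client["barcelona"].command(
    "createUser",
    "app_user",
    pwd="app_password",
    roles=[{"role": "readWrite", "db": "barcelona"}],
)
```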
In order to deploy the MongoDB Docker container, a database/.env file must be created with the following variables:
- MONGO_INITDB_ROOT_USERNAME
- MONGO_INITDB_ROOT_PASSWORD
- MONGO_NON_ROOT_USERNAME
- MONGO_NON_ROOT_PASSWORD
- MONGO_NON_ROOT_ROLE
- MONGO_INITDB_DATABASE
- MONGO_NON_ROOT_DB
This project uses a Makefile. The following targets are available:

- help: Show the help
- build-docker-api: Build the Docker image for the API
- lint-docker-api: Lint the Docker image for the API
- run-docker-api: Run the Docker image for the API
- build-docker-db: Build the Docker image for MongoDB
- lint-docker-db: Lint the Docker image for MongoDB
- run-docker-db: Run the Docker image for MongoDB
- build-docker-streamlit: Build the Docker image for Streamlit
- lint-docker-streamlit: Lint the Docker image for Streamlit
- run-docker-streamlit: Run the Docker image for Streamlit
- run-app: Run the application using docker-compose
- rm-app: Remove the docker-compose stack
This project uses pre-commit to check the repository files before each commit. Install it within the repository with pip install pre-commit, then run pre-commit install to set up the git hooks.
This project uses Sentry to report errors to the developers. It is configured through the SENTRY_DSN environment variable in FastAPI.
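A minimal initialization sketch, assuming the DSN is read from the environment:

```python
import os

import sentry_sdk

# Initialize once at application startup; unhandled exceptions are then reported
sentry_sdk.init(
    dsn=os.environ["SENTRY_DSN"],
    environment=os.getenv("ENVIRONMENT", "production"),
)
```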
Pull Requests are welcome! Feel free to contribute to the project.
This project uses Semantic Release and every push to the main
branch will trigger a workflow that generates a CHANGELOG.md
file.
- https://pre-commit.com/index.html
- https://towardsdatascience.com/pre-commit-hooks-you-must-know-ff247f5feb7e
- https://github.com/zricethezav/gitleaks
- https://pythonspeed.com/articles/pandas-load-less-data/
- https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#indexing-view-versus-copy
- https://www.dataquest.io/blog/settingwithcopywarning/
- https://jira.mongodb.org/browse/MOTOR-822
- https://testdriven.io/blog/fastapi-mongo/
- https://docs.gunicorn.org/en/stable/configure.html#configuration-file
- https://fastapi.tiangolo.com/deployment/docker/#replication-number-of-processes
- https://github.com/tiangolo/uvicorn-gunicorn-docker
- https://fastapi.tiangolo.com/advanced/async-tests/
- https://www.python-httpx.org/advanced/
- https://pypi.org/project/asgi-lifespan/
- https://medium.com/codex/automated-github-actions-deployment-with-semantic-release-d8f8ae9c6252
- https://semantic-release.gitbook.io/semantic-release/extending/plugins-list#community-plugins
- https://semantic-release.gitbook.io/semantic-release/extending/plugins-list#official-plugins
- https://github.com/juliuscc/semantic-release-slack-bot
- https://github.com/pmowrer/semantic-release-monorepo/blob/master/package.json
- https://motor.readthedocs.io/en/stable/differences.html
- https://github.com/encode/starlette/blob/master/starlette/status.py
- https://docs.python.org/3/library/functools.html#functools.lru_cache
- https://betterprogramming.pub/metadata-and-additional-responses-in-fastapi-ea90a321d477
- https://stackoverflow.com/questions/41584243/runtimeerror-task-attached-to-a-different-loop
- https://deckgl.readthedocs.io/en/latest/gallery/hexagon_layer.html
- https://docs.streamlit.io/library/api-reference/charts/st.pydeck_chart
- https://discuss.streamlit.io/t/creating-a-pdf-file-generator/7613
- fastapi/fastapi#1515
- https://www.geeksforgeeks.org/python-functools-lru_cache/
- https://www.modelingonlineauctions.com/datasets
- https://jeande.medium.com/how-to-do-exploratory-data-analysis-effectively-b530d0e4de
- https://medium.com/analytics-vidhya/data-visualization-and-exploratory-data-analysis-eda-in-data-science-984e84942fda
- https://towardsdatascience.com/an-extensive-guide-to-exploratory-data-analysis-ddd99a03199e
- https://github.com/JazzCore/python-pdfkit/wiki/Installing-wkhtmltopdf
- https://github.com/abjmorrison/MediumTutorials/blob/main/GeoPyExamples/GeoPy_NominatimExample.py
- https://wiki.openstreetmap.org/wiki/Nominatim
- https://mappinggis.com/2018/11/geocodificacion-con-geopy/
- https://curc.readthedocs.io/en/latest/gateways/parallel-programming-jupyter.html
- https://pypi.org/project/multiprocess/
- https://stackoverflow.com/questions/16982569/making-multiple-api-calls-in-parallel-using-python-ipython
- https://nbviewer.org/gist/minrk/5732094
- https://pretagteam.com/question/multiprocessing-vs-threading-in-jupyter-notebook
- https://discuss.streamlit.io/t/creating-a-pdf-file-generator/7613/2
- https://discuss.streamlit.io/t/deploying-an-app-using-using-wkhtmltopdf-on-herokou/12029/8
- https://devcenter.heroku.com/articles/container-registry-and-runtime
- https://devcenter.heroku.com/articles/build-docker-images-heroku-yml
- https://github.com/marketplace/actions/deploy-to-heroku#environment-variables
- https://github.com/SFDO-Tooling/Metecho/blob/main/heroku.yml
- https://www.linode.com/docs/guides/documenting-a-fastapi-app-with-openapi/
- https://discuss.streamlit.io/t/ann-streamlit-folium-a-component-for-rendering-folium-maps/4367
- https://mypy.readthedocs.io/en/stable/
- https://www.uvicorn.org/settings/
- https://pydantic-docs.helpmanual.io/
- https://github.com/codecov/codecov-action
- https://app.codecov.io/gh/aacecandev/core-project-one
- https://github.com/codecov/example-python
- https://github.com/nedbat/coveragepy