A fullstack platform to scrape app reviews from Google Play and the Apple App Store, perform sentiment analysis using fine-tuned BERT models, compare applications, track model performance, and monitor infrastructure, all scalable and production-ready.

The platform allows users to search for apps, analyze sentiment (positive, negative, neutral), compare reviews between platforms, and share results via unique links.
- Search Apps: Enter an app name and select Google Play Store or Apple App Store.
- Sentiment Analysis: Reviews are analyzed and classified as positive, negative, or neutral.
- Comparison: Compare reviews between platforms and share analysis results via unique links.
- Notifications: Stay updated when reviews are fetched or new data is available.
- Multithreaded scrapers keep the backend non-blocking (see the scraper sketch after this list).
- Rotating proxies are used to avoid IP bans.
- The platform uses BERT for sentiment classification, which can be fine-tuned further (see the inference sketch after this list).
- The schema is optimized with indexes on app IDs and deduplication constraints (see the migration sketch after this list).
- It scales efficiently under high query volume.
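As a minimal sketch of the scraping pattern, the snippet below combines a thread pool with a rotating proxy cycle; the proxy URLs, `fetch_reviews_page`, and the worker count are illustrative, and the real scrapers live in `backend/services/`:

```python
# Sketch: non-blocking, multithreaded scraping with rotating proxies.
import itertools
from concurrent.futures import ThreadPoolExecutor, as_completed

import requests

PROXIES = itertools.cycle([
    "http://proxy-1.example.com:8080",  # hypothetical proxy pool
    "http://proxy-2.example.com:8080",
])


def fetch_reviews_page(url: str) -> dict:
    proxy = next(PROXIES)  # rotate proxies to spread requests across IPs
    resp = requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=10)
    resp.raise_for_status()
    return resp.json()


def scrape(urls: list[str]) -> list[dict]:
    # Threads keep this I/O-bound work from blocking the API process when
    # dispatched from FastAPI via an executor or a background task.
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(fetch_reviews_page, u) for u in urls]
        return [f.result() for f in as_completed(futures)]
```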
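Inference with a fine-tuned checkpoint can be sketched with the Hugging Face `pipeline` API; the model path mirrors the repo layout (`backend/models/sentiment_bert/`), and the exact labels depend on how the model was fine-tuned:

```python
# Sketch: sentiment inference with a fine-tuned BERT checkpoint.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="backend/models/sentiment_bert",  # fine-tuned checkpoint directory
)

reviews = ["Love the new update!", "Crashes every time I open it."]
for review, pred in zip(reviews, classifier(reviews)):
    print(review, "->", pred["label"], round(pred["score"], 3))
```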
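And a sketch of what one class-based migration could look like (in the spirit of `backend/database/migrations/002_create_reviews_table.py`), showing an index on app IDs plus a uniqueness constraint for deduplication; the `BaseMigration` interface and all column names are assumptions:

```python
# Sketch of a manual class-based migration. Only the app_id index and the
# deduplication constraint are the point; everything else is illustrative.
class BaseMigration:
    """Stub of the assumed base class: each migration implements up/down."""

    def up(self, cursor) -> None: ...
    def down(self, cursor) -> None: ...


class CreateReviewsTable(BaseMigration):
    def up(self, cursor) -> None:
        cursor.execute("""
            CREATE TABLE IF NOT EXISTS reviews (
                id         BIGSERIAL PRIMARY KEY,
                app_id     TEXT NOT NULL,
                platform   TEXT NOT NULL,
                review_id  TEXT NOT NULL,
                content    TEXT,
                sentiment  TEXT,
                fetched_at TIMESTAMPTZ DEFAULT now(),
                UNIQUE (platform, review_id)  -- deduplicates re-scraped reviews
            );
        """)
        # Index for the hot lookup path: all reviews for a given app.
        cursor.execute(
            "CREATE INDEX IF NOT EXISTS idx_reviews_app_id ON reviews (app_id);"
        )

    def down(self, cursor) -> None:
        cursor.execute("DROP TABLE IF EXISTS reviews CASCADE;")
```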
```
├── backend/
│   ├── api/
│   │   ├── __init__.py
│   │   ├── lifespan.py
│   │   ├── routes/
│   │   │   ├── scraper.py
│   │   │   ├── sentiment.py
│   │   │   └── comparison.py
│   │   └── utils/
│   │       ├── notification.py
│   │       └── threading.py
│   ├── models/
│   │   └── sentiment_bert/              # Fine-tuned BERT models
│   ├── schemas/
│   │   ├── app_schema.py
│   │   └── review_schema.py
│   ├── services/
│   │   ├── base_service.py
│   │   ├── review_service.py
│   │   ├── google_review_scraper.py
│   │   └── apple_review_scraper.py
│   ├── tests/
│   │   └── services/
│   │       └── test_review_service.py
│   ├── database/
│   │   ├── connection.py                # Connect to external PostgreSQL
│   │   ├── queries.py
│   │   ├── migration_runner.py          # Manual class-based migration runner
│   │   └── migrations/
│   │       ├── __init__.py
│   │       ├── base_migration.py
│   │       ├── 001_create_apps_table.py
│   │       ├── 002_create_reviews_table.py
│   │       └── 003_create_search_history_table.py
│   ├── airflow/
│   │   ├── dags/
│   │   │   ├── scrape_reviews_dag.py
│   │   │   └── retrain_model_dag.py
│   │   └── plugins/
│   │       └── telegram_alert.py
│   ├── mlflow_server/
│   │   └── config/
│   │       └── mlflow.cfg
│   ├── monitoring/
│   │   ├── grafana/
│   │   │   ├── grafana.ini
│   │   │   └── dashboards/
│   │   │       ├── system_metrics.json
│   │   │       └── scraping_metrics.json
│   │   └── prometheus/
│   │       └── prometheus.yml
│   ├── docker/
│   │   ├── Dockerfile.backend
│   │   ├── Dockerfile.airflow
│   │   ├── Dockerfile.mlflow
│   │   ├── Dockerfile.prometheus
│   │   └── Dockerfile.grafana
│   ├── cron.py
│   └── main.py                          # FastAPI application entry
├── frontend/
│   ├── components/
│   │   ├── SearchBar.jsx
│   │   └── AppComparison.jsx
│   ├── pages/
│   │   ├── index.js
│   │   └── compare/[id].js
│   ├── redux/
│   │   ├── store.js
│   │   └── slices/
│   │       └── appSlice.js
│   ├── utils/
│   │   └── api.js
│   └── Dockerfile.frontend
├── datasets/
│   ├── raw/
│   │   ├── initial_reviews_google.csv
│   │   └── initial_reviews_apple.csv
│   ├── processed/
│   │   └── labeled_reviews.csv
│   └── README.md
├── notebooks/
│   ├── sentiment_experiment.ipynb
│   └── scraper_experiment.ipynb
├── infrastructure/
│   ├── airflow/
│   │   └── airflow.cfg
│   ├── grafana/
│   │   ├── grafana.ini
│   │   └── dashboards/
│   │       ├── system_metrics.json
│   │       └── scraping_metrics.json
│   ├── mlflow/
│   │   └── mlflow.cfg
│   └── deployment/
│       ├── docker-compose.prod.yml      # Production Swarm Compose
│       └── traefik.yml                  # Traefik reverse proxy config
├── .env.development                     # Development environment variables
├── .env.production                      # Production environment variables
├── docker-compose.yml                   # Local dev docker-compose
├── Makefile                             # Automation commands
└── README.md                            # Project Documentation
```
- `backend/`: FastAPI app, scrapers, BERT fine-tuning, business logic (see the entry-point sketch after this list)
- `frontend/`: Next.js (with Redux Toolkit) web frontend
- `datasets/`: Initial datasets (raw, processed)
- `notebooks/`: Experimental Jupyter notebooks
- `infrastructure/`: Configurations and deployments (Airflow, Grafana, MLflow, Traefik)
- `.env.*`: Environment-specific variables
- `docker-compose.yml`: Development stack definition
- `README.md`: Documentation
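For orientation, here is a minimal sketch of what `backend/main.py` might look like with the lifespan hook wired in (cf. `backend/api/lifespan.py`); the startup/shutdown steps and the `/health` route are illustrative assumptions, not the repo's actual code.

```python
# Sketch: FastAPI entry point with a lifespan hook.
from contextlib import asynccontextmanager

from fastapi import FastAPI


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup: e.g. open the DB engine, load the fine-tuned BERT model once.
    yield
    # Shutdown: dispose of connections and free the model.


app = FastAPI(lifespan=lifespan)


@app.get("/health")
async def health() -> dict:
    return {"status": "ok"}
```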
| Layer | Technology |
|---|---|
| Frontend | Next.js, Redux Toolkit |
| Backend | FastAPI, Starlette, Pydantic |
| Scraping | Custom Python scrapers, orchestrated by Airflow |
| ML Models | BERT fine-tuning, HuggingFace, MLflow |
| Database | External PostgreSQL (cloud-managed or separate server) |
| Monitoring | Prometheus + Grafana |
| Infrastructure | Docker Swarm, Traefik, Spot VMs (GCP) |
| Notifications | Email Alerts + Telegram API |
The system uses different `.env` files for development and production:

| Environment | File | Purpose |
|---|---|---|
| Development | `.env.development` | Connects to a local or test database |
| Production | `.env.production` | Connects to a cloud-hosted or external database |
Before running docker-compose or make commands, set the correct environment:

- For development:

  ```bash
  make set_env_dev
  make up
  ```

- For production:

  ```bash
  make set_env_prod
  docker stack deploy -c infrastructure/deployment/docker-compose.prod.yml your_stack_name
  ```

Note: the backend connects to the database via the `DATABASE_URL` environment variable only.
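Under that contract, `backend/database/connection.py` might look roughly like this, assuming SQLAlchemy is the driver layer (the repo does not confirm this):

```python
# Sketch: the connection string comes exclusively from DATABASE_URL.
import os

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

DATABASE_URL = os.environ["DATABASE_URL"]  # fail fast if the variable is unset

engine = create_engine(DATABASE_URL, pool_pre_ping=True)
SessionLocal = sessionmaker(bind=engine, autoflush=False)
```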
The Makefile automates all essential tasks:

| Command | Description |
|---|---|
| `make build` | Build all backend Docker images |
| `make start_backend` | Start the FastAPI backend service |
| `make start_monitoring` | Start Grafana and Prometheus |
| `make start_airflow` | Start Airflow scheduler, webserver, worker |
| `make migrate` | Run manual class-based database migrations |
| `make up` | Bring up all services |
| `make down` | Bring down all services |
| `make logs` | View service logs |
| `make clean` | Clean up containers and volumes |
| `make restart` | Restart all services |
| `make build_and_up` | Build images and start services |
| `make set_env_dev` | Copy `.env.development` to `.env` |
| `make set_env_prod` | Copy `.env.production` to `.env` |
This project uses a Makefile to simplify common tasks.

| Command | Description |
|---|---|
| `make install` | Install Python dependencies into the virtual environment |
| `make migrate` | Run database migrations |
| `make backend-dev` | Start the FastAPI backend server (with hot reload) |
| `make docker-build` | Build Docker images |
| `make docker-up` | Start all Docker containers (backend, airflow, mlflow, etc.) |
| `make docker-down` | Stop all Docker containers |
| `make docker-restart` | Restart Docker containers cleanly |
| `make test` | Run backend unit tests |
| Command | Description |
|---|---|
| `make install` | Install Python dependencies using pip in the virtualenv |
| `make check-env` | Ensure the `.env` file exists |
| `make check-venv` | Ensure the Python virtualenv exists |
| `make load-env` | Export all variables from `.env` into the shell |
| Command | Description |
|---|---|
| `make up` | Smart startup: starts Postgres, waits, initializes Airflow if needed, then starts everything |
| `make backend-dev` | Run the FastAPI dev server (uvicorn) |
| `make bootstrap-airflow` | Initialize the Airflow DB if needed and restart services |
| Command | Description |
|---|---|
| `make docker-up` | Start all Docker containers |
| `make docker-down` | Stop all containers |
| `make docker-restart` | Restart all containers |
| `make docker-build` | Rebuild all Docker images |
| `make docker-status` | Show Docker container statuses |
| `make docker-logs` | Show container logs |
| `make docker-prune` | Remove unused Docker resources |
| Command | Description |
|---|---|
| `make migrate` | Run database migrations |
| `make backup-db` | Back up PostgreSQL to a timestamped `.sql` file |
| Command | Description |
|---|---|
| `make airflow-init` | Initialize the Airflow DB |
| `make airflow-version` | Show the Airflow version |
| `make health-check` | Ping services: FastAPI, Airflow, Postgres (see the sketch below) |
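As an illustration, `make health-check` could wrap a small script like the one below; the URLs reuse the ports from the Access Services list at the end of this README, and the Postgres check (which would go through `DATABASE_URL`) is omitted for brevity. This is an assumption about the target, not its actual implementation.

```python
# Sketch: ping HTTP services and exit non-zero if any are down.
import sys

import requests

SERVICES = {
    "FastAPI": "http://localhost:8000/docs",
    "Airflow": "http://localhost:8080/health",
}


def main() -> int:
    exit_code = 0
    for name, url in SERVICES.items():
        try:
            resp = requests.get(url, timeout=5)
            status = "OK" if resp.ok else f"HTTP {resp.status_code}"
        except requests.RequestException as exc:
            status = f"DOWN ({exc.__class__.__name__})"
            exit_code = 1
        print(f"{name:8s} {status}")
    return exit_code


if __name__ == "__main__":
    sys.exit(main())
```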
| Command | Description |
|---|---|
| `make kill-all` | Kill all running services |
```bash
# 1. Clone the repo
git clone https://github.com/your-username/sentiment-analysis-platform.git
cd sentiment-analysis-platform

# 2. Create a virtual environment manually if it doesn't exist
python3 -m venv env_sent

# 3. Activate your virtual environment
source env_sent/bin/activate

# 4. Install all Python dependencies
make install

# 5. Create a .env file (copy from .env.example if available)
cp .env.example .env

# 6. Run database migrations
make migrate

# 7. Start the backend server for development
make backend-dev
```

Key features:

- App Review Scraping from Google Play Store and Apple App Store
- Sentiment Analysis (Positive, Neutral, Negative) using fine-tuned BERT models
- App Comparison between platforms
- Scheduled Retraining via Airflow DAGs
- Scraper Orchestration using Airflow (see the DAG sketch after this list)
- Model Tracking using MLflow
- Monitoring and Alerting with Prometheus, Grafana, Email, Telegram
- Dockerized Services, Swarm Deployment ready
- External PostgreSQL Database for persistence and scaling
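The scraping and retraining pipelines are driven by Airflow DAGs (see `backend/airflow/dags/`). Below is a sketch of what a scraping DAG could look like; the `@daily` schedule, task layout, and scraper call are assumptions, not the repo's actual DAG.

```python
# Sketch of a scraping DAG in the spirit of scrape_reviews_dag.py.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def scrape_all_platforms() -> None:
    # Placeholder: invoke the Google Play and App Store scraper services here.
    ...


with DAG(
    dag_id="scrape_reviews",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",  # hypothetical cadence
    catchup=False,
) as dag:
    PythonOperator(
        task_id="scrape_reviews",
        python_callable=scrape_all_platforms,
    )
```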
Initial data is stored inside:

```
datasets/
├── raw/
│   ├── initial_reviews_google.csv
│   └── initial_reviews_apple.csv
└── processed/
    └── labeled_reviews.csv
```

- `raw/`: Raw scraped data (unlabeled)
- `processed/`: Auto-labeled or human-labeled sentiment data for model fine-tuning
- Database: No internal Postgres container is provided. Use a separate managed PostgreSQL server.
- Production Readiness: Use GCP Spot VMs for heavy model training; a Hetzner CPU server for inference.
- Security: Set strong `.env` secrets for production.
- Scaling: Ready for Docker Swarm clustering and Traefik load balancing.
```bash
# Setup local env
make set_env_dev

# Build images
make build

# Run services
make up
```

Access services:

- Backend API: http://localhost:8000
- Airflow Web UI: http://localhost:8080
- Grafana Dashboards: http://localhost:3000
- MLflow Tracking: http://localhost:5000

MIT License. Feel free to use, modify, and contribute.