This project implements a production-minded Retrieval-Augmented Generation (RAG) system that combines dense vector search with graph-based context enrichment. The core focus is on building a scalable, maintainable, and observable system using robust software engineering patterns, rather than solely fine-tuning RAG quality metrics.
This repository represents a foundational slice of a larger, personal application framework. The primary goal here was to prototype and validate a hybrid Graph+Vector RAG architecture using production-grade patterns (observability, dependency injection, lifecycle management).
The advanced RAG evaluation and metric optimization aspects are intentionally separated into a dedicated project to manage complexity. The patterns and services validated here are intended to be merged back into the main framework upon completion.
The system is built upon a set of engineering principles inspired by the "Fortify" pattern, which prioritizes production readiness from day one. The key architectural decisions are documented in the Architectural Decision Records (ADRs).
- Protocol-Oriented & Vendor Agnostic: Core components like data stores are defined by abstract protocols (`GraphStoreProtocol`, `VectorStoreProtocol`). This decouples the business logic from concrete implementations (e.g., Neo4j, Weaviate), making it easy to swap technologies without major refactoring.
- Managed Service Lifecycle: All stateful services (databases, caches, etc.) adhere to a `BaseService` contract with `start()` and `stop()` methods. A central `lifespan` manager orchestrates application startup and shutdown gracefully.
- Full-Stack Observability: The system is instrumented for deep visibility out-of-the-box, with distributed tracing, structured logging, and a complete monitoring stack.
- Developer Experience First: A streamlined setup and automated quality gates ensure that developers can contribute safely and efficiently.
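The first two principles can be illustrated with a minimal sketch. Only the names `BaseService`, `VectorStoreProtocol`, `start()`, `stop()`, and `lifespan` come from this README; the `search` signature, the `InMemoryVectorStore` class, and the shape of the `lifespan` helper (which takes a list of services here rather than a FastAPI app) are assumptions made for illustration and may not match the repository's code.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import Protocol, runtime_checkable


@runtime_checkable
class BaseService(Protocol):
    """Lifecycle contract: every stateful service can be started and stopped."""

    async def start(self) -> None: ...
    async def stop(self) -> None: ...


class VectorStoreProtocol(Protocol):
    """Assumed search signature; the real protocol may differ."""

    async def search(self, vector: list[float], top_k: int) -> list[str]: ...


class InMemoryVectorStore:
    """Hypothetical toy backend that satisfies both contracts without a vendor SDK."""

    def __init__(self, docs: dict[str, list[float]]) -> None:
        self._docs = docs
        self.running = False

    async def start(self) -> None:
        self.running = True

    async def stop(self) -> None:
        self.running = False

    async def search(self, vector: list[float], top_k: int) -> list[str]:
        # Rank documents by dot product with the query vector.
        def score(doc_id: str) -> float:
            return sum(a * b for a, b in zip(self._docs[doc_id], vector))

        return sorted(self._docs, key=score, reverse=True)[:top_k]


@asynccontextmanager
async def lifespan(services: list[BaseService]):
    """Start services in order; stop the started ones in reverse, even on error."""
    started: list[BaseService] = []
    try:
        for service in services:
            await service.start()
            started.append(service)
        yield
    finally:
        for service in reversed(started):
            await service.stop()


async def demo() -> tuple[list[str], bool, bool]:
    store = InMemoryVectorStore({"doc-a": [1.0, 0.0], "doc-b": [0.0, 1.0]})
    async with lifespan([store]):
        running_inside = store.running
        hits = await store.search([0.9, 0.1], top_k=1)
    return hits, running_inside, store.running
```

Because callers depend only on the protocol, a Weaviate-backed class with the same methods could replace `InMemoryVectorStore` without touching `demo()`.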
The system follows a hybrid retrieval strategy to provide rich, contextual answers for complex queries.
Query Flow:
- A user query is received by the FastAPI endpoint.
- The `RetrievalService` embeds the query into a vector.
- Dense Retrieval: The vector is used to perform a similarity search in Weaviate to find the top-k relevant documents.
- Entity Extraction: Entities are extracted from the retrieved documents.
- Graph Expansion: These entities are used as seeds to traverse the Neo4j graph, discovering related entities and concepts within a configured number of hops.
- Fusion & Re-ranking: The initial documents are re-ranked, boosting the scores of those related to the expanded graph context.
- The enriched and re-ranked context is returned with citations.
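The query flow above can be sketched end to end with in-memory stand-ins for Weaviate and Neo4j. Everything in this block is hypothetical: the toy data, the function names, and in particular the fusion rule, which boosts a document once for every other retrieved document whose graph neighborhood contains one of its entities. The actual scoring used by the project is not described in this README.

```python
from collections import deque
from math import sqrt

# Toy corpus: doc_id -> (embedding, entities already extracted from the text).
DOCS = {
    "d1": ([1.0, 0.0], {"weaviate"}),
    "d2": ([0.8, 0.6], {"neo4j"}),
    "d3": ([0.6, 0.8], {"cypher"}),
    "d4": ([0.0, 1.0], {"redis"}),
}

# Toy undirected entity graph as an adjacency map.
GRAPH = {
    "neo4j": {"cypher"},
    "cypher": {"neo4j"},
    "weaviate": {"hnsw"},
    "hnsw": {"weaviate"},
}


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def dense_retrieve(query_vec: list[float], top_k: int) -> list[tuple[str, float]]:
    """Stand-in for the vector store: top-k documents by cosine similarity."""
    scored = [(doc_id, cosine(vec, query_vec)) for doc_id, (vec, _) in DOCS.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]


def expand_entities(seeds: set[str], max_hops: int) -> set[str]:
    """Stand-in for graph traversal: breadth-first search up to max_hops."""
    seen = set(seeds)
    frontier = deque((seed, 0) for seed in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for neighbor in GRAPH.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen


def hybrid_retrieve(query_vec, top_k=3, max_hops=1, boost=0.3):
    hits = dense_retrieve(query_vec, top_k)
    entities = {doc_id: DOCS[doc_id][1] for doc_id, _ in hits}
    hoods = {doc_id: expand_entities(ents, max_hops) for doc_id, ents in entities.items()}
    fused = []
    for doc_id, score in hits:
        # Count the other retrieved documents whose expanded context
        # reaches at least one entity of this document.
        support = sum(
            1 for other, hood in hoods.items()
            if other != doc_id and entities[doc_id] & hood
        )
        fused.append((doc_id, score + boost * support))
    return sorted(fused, key=lambda item: item[1], reverse=True)
```

With the query vector `[1.0, 0.0]`, `d1` is the best dense hit, but `d2` and `d3` support each other through the `neo4j`/`cypher` edge, so `d2` moves to the top after fusion.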
For detailed architectural decisions, please see:
- ADR-001: Adoption of Hybrid RAG Architecture
- ADR-002: Adoption of Design Patterns
- ADR-003: Selection of Vector and Graph Databases
- Poetry
- Docker and Docker Compose
The setup.sh script will create a virtual environment, install dependencies, and set up pre-commit hooks.
# Give execution permission to scripts
chmod +x bin/*.sh
# Run the setup script
./bin/setup.sh
This command starts the Neo4j and Weaviate containers in the background.
# The '-d' flag runs the containers in detached mode
docker compose -f infra/docker-compose.all.yml up -d
Run the ingestion script to populate the databases with synthetic data.
# Activate the virtual environment
source .venv/bin/activate
# Run the ingestion pipeline
python -m src.graph.ingestion.executor
Start the FastAPI application using Uvicorn.
uvicorn src.graph.api.main:app --host 0.0.0.0 --port 8000 --reload
The API will be available at http://localhost:8000.
This script sends a series of predefined queries to the API to calculate baseline performance and quality metrics.
python -m src.graph.app.eval
The project includes a complete, containerized monitoring stack.
- Prometheus: http://localhost:9090 (Collects metrics)
- Grafana: http://localhost:3000 (For dashboards; user/pass: admin/admin)
- Alertmanager: http://localhost:9093 (Manages alerts)
To start the monitoring stack:
docker compose -f monitoring/docker-compose.yml up -d
The architecture was designed with scalability in mind.
- Database Scaling: Both Weaviate and Neo4j run in Docker here but can be swapped for managed, horizontally scalable cloud clusters. Because the application depends only on the storage protocols, such a swap should require configuration and adapter changes rather than changes to business logic.
- Multi-Tenancy: The `ExecutionContext` system is the foundation for tenant isolation. It can be used to automatically filter queries and data at the storage layer.
- Caching: A distributed cache such as Redis could be placed in front of the `RetrievalService` to cache responses for common queries, reducing latency for repeated requests.
- Stateless API: The API is stateless, allowing it to be scaled horizontally behind a load balancer.
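The multi-tenancy and caching ideas above can be combined in a small sketch. This is an assumption-heavy illustration, not the project's code: the `ExecutionContext` fields, the `current_context` variable, and the `CachedRetriever` wrapper are hypothetical, and a plain dictionary with expiry times stands in for a distributed cache such as Redis.

```python
import time
from contextvars import ContextVar
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ExecutionContext:
    """Assumed shape: the real ExecutionContext likely carries more fields."""

    tenant_id: str


# Hypothetical holder for the context of the request being handled.
current_context: ContextVar[ExecutionContext] = ContextVar("execution_context")


class CachedRetriever:
    """Wraps a retrieval callable with a per-tenant cache that expires entries."""

    def __init__(
        self,
        retrieve: Callable[[str, str], list[str]],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._retrieve = retrieve
        self._ttl = ttl_seconds
        self._clock = clock
        # (tenant_id, query) -> (expiry time, cached result)
        self._entries: dict[tuple[str, str], tuple[float, list[str]]] = {}

    def __call__(self, query: str) -> list[str]:
        tenant_id = current_context.get().tenant_id
        # Including the tenant in the key keeps one tenant's results
        # from being served to another.
        key = (tenant_id, query)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        result = self._retrieve(tenant_id, query)
        self._entries[key] = (now + self._ttl, result)
        return result
```

Passing the tenant identifier down to the wrapped callable is where a storage-layer filter would be applied. A shared cache would also be needed once the API runs as several replicas, since this dictionary lives in a single process.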
Contributions are welcome! Please read our CONTRIBUTING.md guide to get started. All interactions with the project are governed by our Code of Conduct.
This project is licensed under the Apache-2.0 License. See the LICENSE file for details.
© Guandaline 2025