NeuroLearn is a full-stack course recommendation system that uses a production-grade hybrid RAG pipeline — not basic cosine similarity. When a user describes what they want to learn, the system runs their query through ChromaDB dense vector search and BM25 sparse keyword search simultaneously, fuses the two result sets using Reciprocal Rank Fusion (RRF), reranks the top candidates with a cross-encoder model, and passes the best match to Qwen 2.5-14B to generate a personalised explanation. The result is a recommender that handles both semantic queries ("I want to understand how attention mechanisms work") and keyword queries ("React hooks tutorial") correctly — something neither dense-only nor sparse-only retrieval achieves alone.
                      User Query
                           │
                           ▼
              React Frontend (port 3000)
                           │  POST /output/  {text: query}
                           ▼
              FastAPI Backend (port 8000)
                           │
                           ▼
    ┌─────────────────────────────────────────────┐
    │            Hybrid RAG Retriever             │
    │                                             │
    │  ┌──────────────────┐  ┌─────────────────┐  │
    │  │ ChromaDB         │  │ BM25Okapi       │  │
    │  │ Dense search     │  │ Sparse search   │  │
    │  │ (BGE-small,top20)│  │ (rank_bm25,top20│  │
    │  └────────┬─────────┘  └────────┬────────┘  │
    │           │                     │           │
    │           └──────────┬──────────┘           │
    │                      ▼                      │
    │      RRF Fusion  score = Σ 1/(rank+60)      │
    │                      │ top-10               │
    │                      ▼                      │
    │           Cross-Encoder Reranker            │
    │          (ms-marco-MiniLM-L-6-v2)           │
    │                      │ top-5                │
    └──────────────────────┼──────────────────────┘
                           │
                           ▼
             Qwen 2.5-14B LLM (generation)
                           │
                           ▼
    {recommended_course, difficulty_level, llm_reasoning}

Dense retrieval (cosine similarity) captures semantic meaning but misses exact keyword matches — a query for "AWS EC2" may not retrieve courses titled "Amazon Elastic Compute Cloud" if the embeddings diverge. BM25 catches exact term matches but has zero semantic understanding, so "neural network tutorial" won't find a course titled "Deep Learning Fundamentals." RRF fusion gives each document credit from both lists, and the cross-encoder reranker then reads the full (query, course description) pair to make a final relevance decision — something neither single-vector distance nor keyword overlap can do.
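The fusion step can be sketched in a few lines of Python. This is a minimal illustration, not the code in `hybrid_retriever.py`: the function name `rrf_fuse`, the course IDs, and the choice of 1-based ranks are assumptions. Only the constant 60 and the top-10 cut-off come from the pipeline diagram.

```python
from collections import defaultdict

RRF_K = 60  # constant from the diagram: score = sum of 1 / (rank + 60)


def rrf_fuse(ranked_lists, k=RRF_K, top_n=10):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    Each list is ordered best-first. Ranks are assumed to be 1-based.
    Returns up to top_n (doc_id, fused_score) pairs, best first.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rank + k)
    # Sort by fused score, descending; break ties by ID so output is stable.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:top_n]


# Hypothetical course IDs, for illustration only.
dense_hits = ["c12", "c07", "c33"]   # e.g. results of the dense search
sparse_hits = ["c07", "c41", "c12"]  # e.g. results of the sparse search

fused = rrf_fuse([dense_hits, sparse_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

A document that appears in both lists (here `c07` and `c12`) accumulates two terms and so outranks documents found by only one retriever, which is the behaviour the paragraph above relies on.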
Evaluated on a 20-question test set of realistic user learning goals (e.g. "I want to learn machine learning from scratch", "I need a course on cybersecurity") against the 999-course Coursera corpus.
| Metric | Hybrid RAG | Keyword Baseline | Improvement |
|---|---|---|---|
| context_precision | 0.9400 | 0.4100 | +129% |
| context_recall | 0.7375 | 0.4000 | +84% |
| answer_relevancy | 0.1606 | 0.0740 | +117% |
| faithfulness | 1.0000 | 1.0000 | — |
Note: `answer_relevancy` and `faithfulness` are computed locally using statistical overlap methods. LLM-dependent variants (requiring GPT-4 as judge) need `OPENAI_API_KEY`. The retrieval metrics (`context_precision`, `context_recall`) are fully reliable and show the key result: 94% of retrieved courses are relevant vs 41% for keyword matching.
See backend/evaluation/ragas_results.json for per-question breakdown.
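The Improvement column in the results table is the relative change of the hybrid score over the keyword baseline. The short check below reproduces it from the published scores; the helper name `relative_improvement` is made up for this example and is not part of the repository.

```python
def relative_improvement(new, baseline):
    """Relative change of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# Scores copied from the results table: (Hybrid RAG, Keyword Baseline).
results = {
    "context_precision": (0.9400, 0.4100),
    "context_recall": (0.7375, 0.4000),
    "answer_relevancy": (0.1606, 0.0740),
}

improvements = {
    metric: round(relative_improvement(hybrid, baseline))
    for metric, (hybrid, baseline) in results.items()
}

for metric, pct in improvements.items():
    print(f"{metric}: +{pct}%")
```

`faithfulness` is left out because both systems score 1.0000, so there is no change to report.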
Embedding model: BAAI/bge-small-en-v1.5 (384-dim), benchmarked on 100 course titles, 5-run average, CPU-only.
| Runtime | Avg Latency | Std Dev | Speedup |
|---|---|---|---|
| PyTorch (sentence-transformers) | 563.8ms | ±24.9ms | baseline |
| ONNX Runtime | 495.2ms | ±20.4ms | 1.14× |
Cosine deviation between the PyTorch and ONNX embeddings: 0.0857, caused by a pooling strategy difference (sentence-transformers uses CLS-token pooling; the ONNX wrapper uses mean pooling). Both produce valid semantic embeddings for retrieval. Matching the pooling strategies should bring the deviation below 0.001.
The ONNX model eliminates the PyTorch framework dependency entirely, enabling deployment on lightweight inference servers.
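The pooling difference behind the cosine deviation can be illustrated with toy vectors. This sketch uses plain Python lists rather than real model outputs, and it assumes "cosine deviation" means 1 minus cosine similarity. The function names and the 3-dimensional token embeddings are invented for the example.

```python
import math


def cls_pool(token_embeddings):
    """CLS pooling: take the first token's vector as the sentence embedding."""
    return list(token_embeddings[0])


def mean_pool(token_embeddings, attention_mask):
    """Mean pooling: average the vectors of all non-padding tokens."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy token embeddings for one sentence: [CLS], two word tokens, one padding.
tokens = [
    [1.0, 0.0, 0.0],  # [CLS]
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [9.0, 9.0, 9.0],  # padding token, excluded by the mask
]
mask = [1, 1, 1, 0]

cls_vec = cls_pool(tokens)
mean_vec = mean_pool(tokens, mask)

# Assumed definition: deviation = 1 - cosine similarity.
deviation = 1.0 - cosine_similarity(cls_vec, mean_vec)
same_pooling_deviation = 1.0 - cosine_similarity(cls_vec, cls_pool(tokens))

print(f"CLS vs mean pooling deviation: {deviation:.4f}")
print(f"Same pooling on both sides:    {same_pooling_deviation:.4f}")
```

The toy numbers are far larger than the measured 0.0857, since real token embeddings from the same sentence are much more alike. The point is only that two pooling strategies over identical token outputs give different sentence vectors, while identical pooling gives none.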
    git clone https://github.com/Brijesh-Thakkar/NeuroLearn-Project.git
    cd NeuroLearn-Project
    cp .env.example .env
    docker-compose up --build

| Service | URL |
|---|---|
| React Frontend | http://localhost:3000 |
| FastAPI Backend | http://localhost:8000 |
| MLflow UI | http://localhost:5000 |
First run: ChromaDB will embed all 999 courses on startup (~2 minutes). Subsequent starts use the persisted `chroma_data` volume.
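Once the stack is running, the backend can be queried without the frontend. The sketch below uses only the Python standard library. It assumes the `/output/` endpoint accepts a JSON body of the form `{"text": query}`, as the architecture diagram suggests; the actual request schema in `app.py` may differ. The names `build_request` and `recommend` are illustrative.

```python
import json
import urllib.request

# Port and route taken from the architecture diagram.
BACKEND_URL = "http://localhost:8000/output/"


def build_request(query, url=BACKEND_URL):
    """Build a POST request whose JSON body is {"text": query}."""
    body = json.dumps({"text": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def recommend(query, url=BACKEND_URL, timeout=120):
    """Send the query to the backend and return the decoded JSON response."""
    request = build_request(query, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `recommend("I want to learn machine learning from scratch")` against a running backend should return a dictionary with the keys shown at the bottom of the diagram: `recommended_course`, `difficulty_level`, and `llm_reasoning`. The generous timeout is a guess to allow for LLM generation time.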
    cd backend
    python3 -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt
    # Download Qwen2.5-14B model (see README note below)
    uvicorn app:app --reload --host 0.0.0.0 --port 8000

Model downloads: On first startup, `BAAI/bge-small-en-v1.5` (133 MB) and `cross-encoder/ms-marco-MiniLM-L-6-v2` are auto-downloaded from Hugging Face. Qwen2.5-14B requires manual download: https://huggingface.co/mohitdeharkar/warp_Qwen2.5-14B-Instruct
    cd frontend
    npm install
    npm start
    # Opens http://localhost:3000

| Notebook | Location | What it shows |
|---|---|---|
| RAGAS Evaluation | notebooks/ragas_evaluation.ipynb | 20-question retrieval quality benchmark with bar chart comparison vs keyword baseline |
| MLflow Tracking | notebooks/mlflow_tracking.ipynb | Logs experiment params + metrics to SQLite; compares BGE-small vs BGE-base runs |
| ONNX Benchmark | notebooks/onnx_benchmark.ipynb | Timing + memory benchmark for PyTorch vs ONNX Runtime embedding inference |
| Course EDA | notebooks/eda_courses.ipynb | Full EDA: missing values, rating distributions, UMAP embedding visualization, retrieval quality comparison (precision@5 for BM25/dense/hybrid) |
| SQL Analytics | analytics/eda_queries.ipynb | 12 SQL query results visualized with matplotlib; mock data for offline rendering |

The analytics/ folder contains:
- `queries.sql`: 12 production-ready SQL queries covering `RANK() OVER`, cohort retention with `date_trunc + LAG`, funnel drop-off with `LEFT JOIN + IS NULL`, rolling 7-day volume, `PERCENTILE_CONT` for P50/P90/P99 latency, month-over-month growth, and keyword frequency analysis on user queries.
- `schema_overview.md`: Full ERD (ASCII) documenting all 3 tables with column types, foreign keys, and recommended extensions for analytics columns (`created_at`, `response_ms`, `clicked`).

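For readers checking the latency percentiles outside the database, the interpolation that `PERCENTILE_CONT` performs can be reproduced in Python. This is a sketch of the standard continuous-percentile definition, not a translation of anything in `queries.sql`. The sample latencies are made up, and the function name `percentile_cont` is chosen only to mirror the SQL aggregate.

```python
def percentile_cont(values, fraction):
    """Continuous percentile with linear interpolation.

    The target position is fraction * (n - 1) in the sorted values
    (0-based), interpolating between the two neighbouring items.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    ordered = sorted(values)
    position = fraction * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight


# Made-up response times in milliseconds, for illustration only.
response_ms = [120, 95, 310, 150, 101, 480, 133, 99, 205, 1250, 140]

p50 = percentile_cont(response_ms, 0.50)
p90 = percentile_cont(response_ms, 0.90)
p99 = percentile_cont(response_ms, 0.99)

print(f"P50={p50:.1f} ms  P90={p90:.1f} ms  P99={p99:.1f} ms")
```

A single slow request (1250 ms here) barely moves P50 but dominates P99, which is why the three percentiles are usually reported together.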
    Neurolearn-Proj/
    │
    ├── backend/
    │   ├── app.py                    Main FastAPI application (lifespan startup, /output/ endpoint)
    │   ├── requirements.txt          All Python dependencies
    │   ├── Dockerfile                Python 3.11-slim container
    │   ├── mlflow_config.py          MLflow tracking URI and experiment name constants
    │   │
    │   ├── rag/                      Hybrid RAG pipeline
    │   │   ├── embedder.py           BAAI/bge-small-en-v1.5 via sentence-transformers
    │   │   ├── onnx_embedder.py      Same model exported to ONNX Runtime
    │   │   ├── vector_store.py       ChromaDB persistent collection + dense_search()
    │   │   ├── bm25_retriever.py     BM25Okapi sparse index + sparse_search()
    │   │   ├── reranker.py           cross-encoder/ms-marco-MiniLM-L-6-v2 reranker
    │   │   └── hybrid_retriever.py   RRF fusion + full pipeline orchestration
    │   │
    │   ├── evaluation/
    │   │   ├── ragas_runner.py       Standalone evaluation script (20 queries, saves JSON)
    │   │   └── ragas_results.json    Actual evaluation results (context_precision=0.94)
    │   │
    │   ├── tests/
    │   │   └── test_rag.py           pytest: result count, metadata keys, score ordering
    │   │
    │   └── app/                      Original app (DB models, auth; unchanged)
    │       ├── database/             PostgreSQL connection setup
    │       └── models/               SQLAlchemy models (sqlusers, sqlcourses, userhistory)
    │
    ├── frontend/                     React frontend (unchanged from original)
    │   ├── Dockerfile                node:20-alpine container
    │   └── src/                      Components, pages, API calls
    │
    ├── notebooks/                    Jupyter notebooks
    │   ├── ragas_evaluation.ipynb    RAGAS benchmark results
    │   ├── mlflow_tracking.ipynb     MLflow experiment comparison
    │   ├── onnx_benchmark.ipynb      Inference timing benchmark
    │   └── eda_courses.ipynb         Full course dataset EDA
    │
    ├── analytics/                    SQL analytics layer
    │   ├── queries.sql               12 complex PostgreSQL queries
    │   ├── schema_overview.md        ERD + column documentation
    │   └── eda_queries.ipynb         Visualizations of query results
    │
    ├── mlflow/
    │   └── README.md                 MLflow UI setup instructions
    │
    ├── docker-compose.yml            4-service stack (backend, frontend, postgres, mlflow)
    ├── .env.example                  Required environment variables
    └── README.md                     This file

| Component | Technology | Purpose |
|---|---|---|
| Frontend | React 18, React Router | User interface, search input, result display |
| Backend | FastAPI, Python 3.11, Uvicorn | REST API, lifespan startup, CORS |
| Vector DB | ChromaDB (persistent) | Dense embedding storage and ANN search |
| Dense Retrieval | BAAI/bge-small-en-v1.5 | 384-dim semantic embeddings |
| Sparse Retrieval | BM25Okapi (rank-bm25) | Exact term frequency matching |
| Reranking | cross-encoder/ms-marco-MiniLM-L-6-v2 | Pairwise query-document scoring |
| LLM Generation | Qwen 2.5-14B-Instruct (8-bit) | Natural language recommendation explanation |
| Experiment Tracking | MLflow + SQLite | Parameter/metric logging per retrieval experiment |
| Inference Optimization | ONNX Runtime (optimum) | 1.14× embedding speedup, no PyTorch dependency |
| Database | PostgreSQL 15, SQLAlchemy | User auth, course catalogue, query history |
| Containerization | Docker, docker-compose | One-command full-stack deployment |