Project: Text-to-SQL Platform (FastAPI + Groq)
Codebase root: /Users/a/Documents/DataScience_World/LLM_project/TextToSQLapp
The Text-to-SQL platform converts natural language questions into SQL, executes the SQL against an SQLite database (school.db), and exposes results via a REST API.
- Backend: FastAPI
- NL→SQL: Groq (OpenAI-compatible API)
- DB: SQLite (demo), path configurable via .env or the SQLITE_PATH env var
- Reliable API endpoints for health, data access, direct SQL execution, and NL→SQL.
- Reproducible local environment via Conda.
- Clear docs and demo script.
- Maintainable code with tests and useful coverage (≥75%).
- Full production-grade UI (future).
- RBAC/SSO and multi-tenant auth (future).
- Non-SQLite backends (future).
- Data analysts, educators, and developers needing quick NL→SQL.
- P0: Endpoints return correct shapes and sensible error messages.
- P0: Local tests pass with coverage ≥ 75%.
- P1: NL2SQL produces valid SQL for common demo questions.
- P1: p95 NL2SQL latency < 3s (network and LLM dependent).
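The p95 latency target can be checked against a list of measured request times. The following is a minimal sketch using the nearest-rank percentile method; the function name `percentile_nearest_rank` and the sample latencies are illustrative and not taken from the codebase.

```python
def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile (integer 1-100) using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(samples)
    # Nearest rank is ceil(pct / 100 * n), computed with integer arithmetic.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds for 20 NL2SQL requests.
latencies_ms = list(range(100, 2100, 100))
p95_ms = percentile_nearest_rank(latencies_ms, 95)
meets_target = p95_ms < 3000  # P1 target: p95 below 3 s
```

With real measurements, the sample list would come from timing calls to /api/v1/nl2sql; because the target is network and LLM dependent, results are likely to vary between runs.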
- App entry: backend/app/main.py
- Routes: backend/app/api/v1/routes.py
- Config: backend/app/core/config.py (Pydantic Settings)
- Services: backend/app/services/ with db.py (SQLite operations) and nl2sql.py (Groq calls)
- Utils: backend/app/utils/sql_cleaner.py
- Models/schemas: backend/app/models/schemas.py
- Tests: backend/tests/
- Demo: backend/api_demo.py
- GET /api/v1/health → { "status": "ok" }
- GET /api/v1/students → returns rows from the STUDENT table
- POST /api/v1/sql with { "sql": "..." } → returns rows or rowcount
- POST /api/v1/nl2sql with { "question": "..." } → returns { "sql": "..." }
- Reliability: robust error handling for DB and LLM failures.
- Security: .env ignored; no secrets in logs; basic SQL cleaning.
- Observability: structured logs.
- Performance: appropriate for SQLite; minimal overhead.
- Maintainability: typed Python, tests, organized modules.
Base: /api/v1
- GET /health → 200 { "status": "ok" }
- GET /students → 200 { "rows": [[...], ...] } | 200 { "rows": [] }
- POST /sql → 200 { "rows": [[...]] } or { "rowcount": N }, 400 on invalid SQL
- POST /nl2sql → 200 { "sql": "SELECT ...;" }, 502 on LLM error
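The two response shapes of POST /sql (rows for queries, rowcount for other statements) can be illustrated with the standard sqlite3 module. This is a sketch under assumptions, not the contents of backend/app/services/db.py: the function name `run_sql` and the choice of `ValueError` as the signal for a 400 response are hypothetical.

```python
import sqlite3


def run_sql(conn, sql):
    """Execute one statement; return rows for queries, rowcount otherwise."""
    try:
        cur = conn.execute(sql)
    except (sqlite3.Error, sqlite3.Warning) as exc:
        # A route handler could translate this into an HTTP 400 response.
        raise ValueError(f"invalid SQL: {exc}") from exc
    if cur.description is not None:
        # Statements that produce a result set (e.g. SELECT) expose column metadata.
        return {"rows": [list(row) for row in cur.fetchall()]}
    conn.commit()
    return {"rowcount": cur.rowcount}


# Usage against an in-memory database with the STUDENT schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE STUDENT ("
    "NAME VARCHAR(25), CLASS VARCHAR(25), SECTION VARCHAR(25), MARKS INT)"
)
inserted = run_sql(conn, "INSERT INTO STUDENT VALUES ('Alice', 'Data Science', 'A', 85)")
selected = run_sql(conn, "SELECT NAME, MARKS FROM STUDENT")
```

Distinguishing the two cases by `cursor.description` avoids parsing the SQL text to decide whether a statement is a query.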
SQLite file: school.db
- Table: STUDENT(NAME VARCHAR(25), CLASS VARCHAR(25), SECTION VARCHAR(25), MARKS INT)
- backend/.env.example → copy to backend/.env
- Vars: GROQ_API_KEY (required for NL2SQL), GROQ_MODEL (default llama-3.1-70b-versatile), SQLITE_PATH (default school.db)
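The variables and defaults listed here can be sketched as a small settings loader. The project itself uses Pydantic Settings in backend/app/core/config.py; the version below is a standard-library stand-in for illustration, does not parse a .env file, and the names `Settings`, `load_settings`, and `require_api_key` are assumptions.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    groq_api_key: str
    groq_model: str
    sqlite_path: str


def load_settings(env=None):
    """Build Settings from a mapping of environment variables (defaults to os.environ)."""
    env = os.environ if env is None else env
    return Settings(
        groq_api_key=env.get("GROQ_API_KEY", ""),
        groq_model=env.get("GROQ_MODEL", "llama-3.1-70b-versatile"),
        sqlite_path=env.get("SQLITE_PATH", "school.db"),
    )


def require_api_key(settings):
    """Raise with an actionable message when NL2SQL is used without a key."""
    if not settings.groq_api_key:
        raise RuntimeError(
            "GROQ_API_KEY is not set; copy backend/.env.example to backend/.env "
            "and add your key."
        )
    return settings.groq_api_key
```

Passing the environment as a mapping keeps the loader easy to test without touching the real process environment.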
- Declared in backend/environment.yml and backend/pyproject.toml.
- Key: fastapi, uvicorn, groq, pydantic, pydantic-settings, python-dotenv, pytest, pytest-cov, httpx, ruff.
- No secret commits.
- Clean LLM outputs to SQL (strip markdown, ensure semicolon).
- Future: schema allow-listing and stricter SQL validation.
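The cleaning step (strip markdown, ensure semicolon) might look like the sketch below. It is not the actual backend/app/utils/sql_cleaner.py, whose contents are not shown here; the name `clean_sql` and the exact rules are assumptions.

```python
import re

FENCE = "`" * 3  # markdown code-fence marker
_FENCED_BLOCK = re.compile(
    re.escape(FENCE) + r"(?:sql)?\s*(.*?)" + re.escape(FENCE),
    re.IGNORECASE | re.DOTALL,
)


def clean_sql(raw):
    """Strip markdown code fences and ensure the statement ends with one semicolon."""
    match = _FENCED_BLOCK.search(raw)
    text = match.group(1) if match else raw
    # Remove stray inline backticks, whitespace, and any trailing semicolons.
    text = text.strip().strip("`").strip()
    text = text.rstrip(";").rstrip()
    if not text:
        raise ValueError("no SQL found in model output")
    return text + ";"


fenced = FENCE + "sql\nSELECT * FROM STUDENT\n" + FENCE
cleaned = clean_sql(fenced)
```

This only normalizes formatting; it does not validate that the result is safe or even valid SQL.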
- Unit/integration tests in backend/tests/.
- Temporary DB per test where needed; avoid mutating the real school.db.
- Goal: keep coverage ≥ 75% (current ~79%).
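One way to get a temporary database per test is a context manager that builds a throwaway SQLite file with the STUDENT schema. The sketch below uses only the standard library; the helper name `temp_student_db` is hypothetical, and the project's real fixtures in backend/tests/ may differ (pytest's built-in `tmp_path` fixture is a common alternative).

```python
import contextlib
import os
import sqlite3
import tempfile


@contextlib.contextmanager
def temp_student_db():
    """Yield the path of a throwaway SQLite file containing an empty STUDENT table."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "test_school.db")
        conn = sqlite3.connect(path)
        try:
            conn.execute(
                "CREATE TABLE STUDENT ("
                "NAME VARCHAR(25), CLASS VARCHAR(25), SECTION VARCHAR(25), MARKS INT)"
            )
            conn.commit()
        finally:
            conn.close()
        yield path  # the directory and file are removed when the block exits


def test_insert_is_isolated():
    with temp_student_db() as path:
        conn = sqlite3.connect(path)
        try:
            conn.execute("INSERT INTO STUDENT VALUES ('Eve', 'AI', 'A', 90)")
            conn.commit()
            count = conn.execute("SELECT COUNT(*) FROM STUDENT").fetchone()[0]
        finally:
            conn.close()
        assert count == 1
```

Because each test gets its own file, inserts and deletes never reach the demo database.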
- Logging set in backend/app/core/logging.py.
- Future: request IDs and tracing hooks.
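Structured logs are often produced by attaching a JSON formatter to the standard logging module. The sketch below shows that general approach; it is not the contents of backend/app/core/logging.py, and the names `JsonFormatter` and `configure_logging` as well as the chosen fields are assumptions.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exc_info"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def configure_logging(level=logging.INFO):
    """Send all log output through the JSON formatter on the root logger."""
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers[:] = [handler]
    root.setLevel(level)
```

A request ID, listed above as future work, could later be added as one more key in the payload.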
- Local: uvicorn app.main:app --reload --port 8000 --app-dir backend
- Future: containerization and CI deploys.
- LLM hallucination → cleaning and guardrails; schema awareness later.
- Missing DB → seed script and clear errors.
- API key missing → error with actionable message.
- README divergence → maintain single source on main and use PRs.
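As an illustration of what a guardrail for LLM-generated SQL could look like, the sketch below accepts only a single SELECT statement. It is a deliberately simple heuristic, not the project's implementation: the name `is_single_select` is hypothetical, and it will reject valid queries that contain a semicolon inside a string literal.

```python
import re

_SELECT_START = re.compile(r"^select\b", re.IGNORECASE)


def is_single_select(sql):
    """Heuristic check: exactly one statement, and it starts with SELECT."""
    body = sql.strip().rstrip(";").strip()
    if not body or ";" in body:
        # Empty input, or more than one statement separated by semicolons.
        return False
    return bool(_SELECT_START.match(body))
```

A check like this would apply to model output only; the direct POST /sql endpoint also returns rowcount for non-query statements, so it would need a different policy.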
- Phase 1: Backend stable (current)
- Phase 2: Frontend SPA
- Phase 3: Security hardening, schema introspection, Docker/CI
- W1: Stabilize backend, DB seeding, CI
- W2: Frontend prototype
- W3: Improve guardrails and prompts
- W4: Docker & release
- Endpoints function as spec’d
- Demo script returns rows and valid NL→SQL
- Tests pass locally/CI; coverage ≥ 75%
- README documents setup and usage
- Create Conda env (first time only):
conda env create -f backend/environment.yml
# if env exists: conda env update -n text2sql-backend -f backend/environment.yml --prune
conda activate text2sql-backend
- Configure env vars:
cp backend/.env.example backend/.env
# Edit backend/.env to set GROQ_API_KEY and optionally GROQ_MODEL and SQLITE_PATH
From repo root:
sqlite3 school.db <<'SQL'
CREATE TABLE IF NOT EXISTS STUDENT (
NAME VARCHAR(25),
CLASS VARCHAR(25),
SECTION VARCHAR(25),
MARKS INT
);
DELETE FROM STUDENT;
INSERT INTO STUDENT (NAME, CLASS, SECTION, MARKS) VALUES
('Alice','Data Science','A',85),
('Bob','Data Science','B',78),
('Charlie','AI','A',92),
('Diana','AI','B',88);
SQL
If you want a different path:
- Set SQLITE_PATH=/absolute/path/to/school.db in backend/.env
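The same seeding can be done from Python when the sqlite3 command-line tool is not installed. This is a sketch equivalent to the shell heredoc, not a script that ships with the repository; the function name `seed` is an assumption.

```python
import sqlite3

ROWS = [
    ("Alice", "Data Science", "A", 85),
    ("Bob", "Data Science", "B", 78),
    ("Charlie", "AI", "A", 92),
    ("Diana", "AI", "B", 88),
]


def seed(conn):
    """Create the STUDENT table if needed and reset it to the four demo rows."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS STUDENT ("
            "NAME VARCHAR(25), CLASS VARCHAR(25), SECTION VARCHAR(25), MARKS INT)"
        )
        conn.execute("DELETE FROM STUDENT")
        conn.executemany(
            "INSERT INTO STUDENT (NAME, CLASS, SECTION, MARKS) VALUES (?, ?, ?, ?)",
            ROWS,
        )
```

To seed the demo file, open it with `sqlite3.connect("school.db")` (or the path set in SQLITE_PATH), call `seed(conn)`, and close the connection. Running it twice leaves the same four rows, matching the DELETE in the shell version.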
uvicorn app.main:app --reload --port 8000 --app-dir backend
# Docs: http://127.0.0.1:8000/docs
curl http://127.0.0.1:8000/api/v1/health
curl http://127.0.0.1:8000/api/v1/students
curl -X POST http://127.0.0.1:8000/api/v1/sql \
-H 'Content-Type: application/json' \
-d '{"sql":"SELECT COUNT(*) FROM STUDENT;"}'
python backend/api_demo.py
Requires GROQ_API_KEY to be set in backend/.env.
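The POST /sql call shown with curl can also be made from Python using only the standard library. This sketch is not backend/api_demo.py, whose contents are not shown here; the helper names `build_sql_request` and `post_sql` are hypothetical, and the base URL assumes the server is running locally on port 8000.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000/api/v1"


def build_sql_request(sql, base_url=BASE_URL):
    """Build the POST /sql request without sending it."""
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_sql(sql, base_url=BASE_URL, timeout=10):
    """Send the request and decode the JSON response (the server must be running)."""
    request = build_sql_request(sql, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `post_sql("SELECT COUNT(*) FROM STUDENT;")` against a running server should return a body with the rows key described in the API spec; a 400 response surfaces as `urllib.error.HTTPError`.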
cd backend
pytest -q --cov=app --cov-report=term-missing
Expected: All tests pass, coverage ~79%.
- Branch from main: git switch -c feature/<name>
- Make changes; keep commits focused.
- Run tests locally.
- Push branch and open PR.
- Address review, squash/rebase as appropriate.
- Frontend SPA (React/Vue/Svelte) consuming /api/v1
- AuthN/Z, rate limiting
- Schema introspection and allow-listed SQL
- Dockerfile + GitHub Actions CI
- Prompt tuning and fallback strategies for NL2SQL
- App: backend/app/main.py
- Routes: backend/app/api/v1/routes.py
- Services: backend/app/services/
- Config: backend/app/core/config.py
- Tests: backend/tests/
- Demo: backend/api_demo.py
- Environment: backend/environment.yml
- "no such table: STUDENT": seed the DB (see section B) or set SQLITE_PATH.
- 502 from /nl2sql: ensure GROQ_API_KEY is set and the network is available.
- Port in use: run lsof -ti:8000 | xargs -r kill -9, then restart.