KlarText turns dense German or English text into easy-to-understand language and can read it aloud. It’s built for people who get overwhelmed by long sentences, legal/bureaucratic phrasing, or technical language.
Important: KlarText produces “easy language” / plain-language simplifications. It is not certified “Leichte Sprache” and does not guarantee legal/medical accuracy.
Phase 0: ✅ Complete (January 10, 2026)
| Component | Status | Details |
|---|---|---|
| API Core | ✅ Working | /v1/simplify endpoint with Groq LLM |
| PDF Extraction | ✅ Working | /v1/ingest/pdf with PyMuPDF |
| Gradio Demo | ✅ Working | Direct LLM & Via API modes |
| Prompt Templates | ✅ Working | German & English from prompts/templates/ |
| Quality Scoring | ✅ Working | LIX, sentence length, readability |
| Swagger Docs | ✅ Available | http://localhost:8000/docs |
| Batch Endpoint | 📝 Defined | Logic pending |
| TTS | 📝 Defined | Logic pending |
| URL Extraction | 📝 Defined | Logic pending |
| Frontend (Next.js) | ⏳ Planned | Will replace Gradio |
Quick Start:
# First time setup:
cd services/api
cp env.example .env
# Edit .env and add your GROQ_API_KEY
# Terminal 1: API
cd services/api && uvicorn app.main:app --reload --port 8000
# Terminal 2: Demo
cd demo && python app.py
# Open: http://localhost:7860See docs/phase_0_testing_guide.md for detailed setup instructions.
- Paste text → get an easy-to-read version (DE/EN)
- Upload a PDF → extract text → simplify it
- Optional Text-to-Speech (TTS) for the simplified text
- No login required (MVP)
Stretch (optional):
- Paste a URL → extract main article text → simplify
- Chrome extension: simplify selected text or the current page
flowchart LR
U[User] -->|Paste text<br/>Upload PDF<br/>Enter URL| WEB[Web UI<br/>Next.js]
WEB -->|REST| API[API<br/>FastAPI]
API --> INGEST[Ingestion]
INGEST --> CHUNK[Chunking]
CHUNK --> LLM[LLM Adapter]
LLM --> SIMP[Simplified Text]
SIMP -->|optional| TTS[TTS Provider]
API --> WEB
TTS --> API
Components:
- Ingestion — Extract text from PDF, pasted text, or URL
- Chunking — Split long texts, ensure no hallucinations
- LLM Adapter — Configurable provider (OpenAI, Azure, Groq, Google)
- TTS Provider — Text-to-speech for audio output
- Frontend: Next.js (React), TypeScript, Tailwind (or vanilla CSS), accessible UI patterns
- Backend: Python + FastAPI + Uvicorn
- LLM: configurable provider via an adapter layer
- PDF extraction: PyMuPDF or pdfplumber
- URL extraction (stretch): trafilatura or readability-lxml
- Rate limiting / caching (optional): Redis
- Storage (optional): Postgres (metrics/logging) + local file storage in dev
Base path: /v1
-
POST /v1/simplify- Input:
{ text, target_lang: "de"|"en", level: "very_easy"|"easy"|"medium" } - Output:
{ simplified_text, warnings?: string[] }
- Input:
-
POST /v1/ingest/pdf- Input:
multipart/form-datawithfile - Output:
{ extracted_text, pages, warnings?: string[] }
- Input:
-
POST /v1/ingest/url(stretch)- Input:
{ url } - Output:
{ extracted_text, title?, warnings?: string[] }
- Input:
-
POST /v1/tts- Input:
{ text, lang: "de"|"en" } - Output:
{ audio_url | audio_base64 }
- Input:
-
GET /healthz
Swagger UI: http://localhost:8000/docs (when API is running)
ReDoc: http://localhost:8000/redoc
OpenAPI JSON: http://localhost:8000/openapi.json
- Docker + Docker Compose
Copy the example env file:
cp .env.example .envThen set your LLM/TTS provider keys (see comments inside .env.example).
docker compose up --build- Web: http://localhost:3000
- API: http://localhost:8000
- API docs: http://localhost:8000/docs
apps/web– Web UIservices/api– API serviceapps/extension– Chrome extension (optional track)prompts– prompt templates + eval fixturesdocs– architecture notes + screenshots
Lovable prototype: https://lovable.dev/projects/7d82e6e7-3919-4389-9b79-30543806c5e0
Usage & Licensing Notice (Non-Commercial) This project, including all code, models, datasets, documentation, and related materials, is provided for personal, educational, and non-commercial research use only.
Allowed Personal use Academic and non-commercial research Forking and modifying for non-commercial purposes Contributing via pull requests Not Allowed Any commercial use Selling or monetizing the software, models, datasets, or outputs Using the project in products or services that generate revenue Use within commercial organizations without permission Attribution If you share or publish derivative work, you must provide credit to:
Based on work from the KlarText Team (2025).
Commercial Licensing To request commercial usage rights, contact a repository administrator.