| 1 | +# Noteflow Backend (FastAPI) |
| 2 | + |
| 3 | +## Overview |
| 4 | +- FastAPI backend for Noteflow |
| 5 | +- OCR pipeline supports images, PDF, DOC/DOCX, HWP (via utilities and system tools) |
| 6 | + |
| 7 | +## Run (local) |
| 8 | +``` |
| 9 | +python -m venv .venv |
| 10 | +source .venv/bin/activate |
| 11 | +pip install -r requirements.txt |
| 12 | +uvicorn main:app --host 0.0.0.0 --port 8080 --reload |
| 13 | +``` |
| 14 | + |
| 15 | +Env (optional): |
| 16 | +- `SECRET_KEY`, `ACCESS_TOKEN_EXPIRE_MINUTES` |
| 17 | +- Database URLs if you connect a DB (current code uses provided models) |
| 18 | + |
| 19 | +## OCR system tools (optional but recommended) |
| 20 | +- PyMuPDF (Python) is used by default for PDF text extraction |
| 21 | +- Optional fallbacks/tools: |
| 22 | + - Poppler (`pdftoppm`) for `pdf2image` |
| 23 | + - LibreOffice (`soffice`) for .doc → .pdf |
| 24 | + - `hwp5txt` for .hwp text extraction |
| 25 | +- If a tool is missing, the API still returns 200, with `warnings` explaining the limitations. |
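
The missing-tool behavior described above could be sketched roughly as follows. This is a minimal illustration, not the backend's actual code: `OPTIONAL_TOOLS`, `collect_tool_warnings`, and the warning texts are hypothetical names and wording; only the executable names (`pdftoppm`, `soffice`, `hwp5txt`) come from the list.

```python
import shutil
from typing import Callable, Optional

# Executable names come from the README list; the messages are illustrative wording.
OPTIONAL_TOOLS = {
    "pdftoppm": "Poppler is missing, so pdf2image cannot rasterize PDFs",
    "soffice": "LibreOffice is missing, so .doc files cannot be converted to .pdf",
    "hwp5txt": "hwp5txt is missing, so .hwp text extraction is unavailable",
}


def collect_tool_warnings(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return one warning per optional tool that is not found on PATH.

    `which` is injectable so the check can be exercised without touching the system.
    """
    return [msg for exe, msg in OPTIONAL_TOOLS.items() if which(exe) is None]


if __name__ == "__main__":
    for warning in collect_tool_warnings():
        print("warning:", warning)
```

A handler could attach the returned list to its response body instead of failing, which would match the "200 with `warnings`" behavior, assuming that is how the service reports it.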
| 26 | + |
| 27 | +## API Highlights |
| 28 | +- `POST /api/v1/files/ocr` — OCR and create note (accepts file + optional `folder_id`, `langs`, `max_pages`) |
| 29 | +- `POST /api/v1/files/upload` — Upload files to folder |
| 30 | +- `POST /api/v1/files/audio` — STT from audio, create/append to note |
| 31 | + |
| 32 | +## CI (GitHub Actions) |
| 33 | +- This folder includes `.github/workflows/ci.yml` to lint/smoke-test on push/PR. |
| 34 | +- Runs on Python 3.11: `pip install -r requirements.txt`, then a syntax check and an import smoke test. |
| 35 | + |
| 36 | +## Docker (optional; for later) |
| 37 | +- Dockerfile included. Build & run locally: |
| 38 | +``` |
| 39 | +docker build -t noteflow-backend . |
| 40 | +docker run --rm -p 8080:8080 noteflow-backend |
| 41 | +``` |
| 42 | +- GitHub Actions container build: |
| 43 | + - `.github/workflows/docker.yml` pushes to GHCR: |
| 44 | + - `ghcr.io/<owner>/<repo>:backend-latest` |
| 45 | + - `ghcr.io/<owner>/<repo>:backend-<sha>` |
| 46 | +- Deployment example (SSH) once you’re ready: |
| 47 | +``` |
| 48 | +docker login ghcr.io -u <USER> -p <TOKEN> |
| 49 | +docker pull ghcr.io/<owner>/<repo>:backend-latest |
| 50 | +docker run -d --name backend --restart=always -p 8080:8080 ghcr.io/<owner>/<repo>:backend-latest |
| 51 | +``` |
| 52 | + |
| 53 | +## Notes |
| 54 | +- If you split this folder into its own repository root, the included `.github/workflows/*.yml` files will work as-is. |
| 55 | +- OCR uses a model-first path (EasyOCR + TrOCR) and falls back to Tesseract when available. |
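
The model-first, then-fallback ordering mentioned in the last note can be sketched generically. This is not the project's implementation: `run_with_fallback` is a hypothetical helper, and each engine is a plain callable standing in for the real EasyOCR/TrOCR or Tesseract wrappers, whose actual interfaces are not shown in this README.

```python
from typing import Callable, Iterable, Optional

Engine = Callable[[bytes], str]  # hypothetical engine interface: image bytes in, text out


def run_with_fallback(
    image: bytes, engines: Iterable[tuple[str, Engine]]
) -> tuple[Optional[str], str, list[str]]:
    """Try engines in order; return (engine name, text, warnings).

    An engine that raises or returns only whitespace is recorded as a warning
    and the next engine is tried. If none succeeds, the name is None.
    """
    warnings: list[str] = []
    for name, engine in engines:
        try:
            text = engine(image)
        except Exception as exc:  # e.g. a model or binary that is not installed
            warnings.append(f"{name} failed: {exc}")
            continue
        if text.strip():
            return name, text, warnings
        warnings.append(f"{name} returned no text")
    return None, "", warnings
```

Passing the model-based engine first and the Tesseract wrapper second reproduces the ordering in the note, and the collected warnings could feed the `warnings` mentioned in the OCR tools section.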