Krixna-Kant/Mini-Doc-Validator

Mini Document Validator — Genoshi Backend + AI Challenge

A FastAPI microservice that automates insurance document validation using Google’s Gemini API for structured information extraction.
This challenge simulates a real-world backend + AI integration task involving document parsing, data validation, and API design.


Features

AI-Powered Extraction — Uses Gemini’s JSON mode for precise, structured data parsing.
Business Rule Validation — Implements validation for:

  • Date Consistency
  • Value Check
  • Vessel Name Match
  • Completeness Check

Typed Data Models — Strict Pydantic models for input/output validation.
Comprehensive Testing — Includes pytest unit tests for validation logic and API routes.
Clean Code Quality — Follows best practices with black formatting and ruff linting.
Dockerized Setup — Production-ready Dockerfile for containerized deployment.
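The extraction step can be sketched roughly as follows. This is a minimal illustration, not the code in ai_extractor.py: the field names come from the sample response in this README, while `build_prompt`, `extract_fields`, and the injected `generate` callable (a stand-in for the real Gemini call) are hypothetical names chosen for the example.

```python
import json
from typing import Callable

# Field names taken from the sample response in this README.
EXPECTED_FIELDS = (
    "policy_number",
    "vessel_name",
    "policy_start_date",
    "policy_end_date",
    "insured_value",
)


def build_prompt(document_text: str) -> str:
    """Ask the model for a single JSON object with the expected keys."""
    keys = ", ".join(EXPECTED_FIELDS)
    return (
        "Extract the following fields from the insurance document and reply "
        f"with one JSON object using exactly these keys: {keys}. "
        "Use null for any field that is missing.\n\n"
        f"Document:\n{document_text}"
    )


def extract_fields(document_text: str, generate: Callable[[str], str]) -> dict:
    """Run the injected model call and normalise its JSON reply.

    `generate` is a hypothetical stand-in for the Gemini call: it takes a
    prompt string and returns the model's raw text.
    """
    raw = generate(build_prompt(document_text))
    data = json.loads(raw)  # raises json.JSONDecodeError on non-JSON replies
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object from the model")
    # Keep only the expected keys; absent keys become None.
    return {field: data.get(field) for field in EXPECTED_FIELDS}
```

Injecting the model call as a plain callable keeps the parsing logic testable without network access or an API key.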


Tech Stack

| Component | Description |
|---|---|
| Backend Framework | FastAPI |
| AI Integration | Google Gemini API (gemini-1.5-flash-latest) |
| Data Models | Pydantic |
| Environment Management | python-dotenv |
| Testing | Pytest |
| Linting/Formatting | Ruff + Black |
| Containerization | Docker |

Setup Instructions

1️⃣ Clone the Repository

git clone https://github.com/Krixna-Kant/Mini-Doc-Validator.git
cd Mini-Doc-Validator

2️⃣ Create Virtual Environment

python -m venv venv
venv\Scripts\activate  # Windows
# or
source venv/bin/activate  # Mac/Linux

3️⃣ Install Dependencies

pip install -r requirements.txt

4️⃣ Configure Environment Variables

Create a .env file in the root directory:

GEMINI_API_KEY=your_gemini_api_key_here
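A small startup check can make a missing key fail fast with a readable error. The sketch below assumes the environment has already been populated (for example by python-dotenv's `load_dotenv()`); `require_api_key` is a hypothetical helper name, and treating the placeholder value as unset is an assumption of this example rather than something the project is known to do.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return GEMINI_API_KEY, or raise a clear error if it is unset or blank."""
    # Default to the process environment; a mapping can be passed in for tests.
    source = os.environ if env is None else env
    key = source.get("GEMINI_API_KEY", "").strip()
    # Assumption: the README placeholder should count as "not configured".
    if not key or key == "your_gemini_api_key_here":
        raise RuntimeError("GEMINI_API_KEY is not set; add it to your .env file")
    return key
```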

Run the Application Locally

Start the FastAPI app:

uvicorn main:app --reload

The API will be available at http://127.0.0.1:8000, with interactive documentation at http://127.0.0.1:8000/docs

Use the built-in Swagger UI to test the /validate endpoint.

Example API Usage

Endpoint

POST /validate

Request Body

{
  "text": "This is the raw text of an insurance document..."
}
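The request above can also be sent from Python using only the standard library. This is a sketch that assumes the server is running locally on port 8000; `build_validate_request` and `validate_document` are hypothetical helper names, not part of the project.

```python
import json
import urllib.request


def build_validate_request(
    text: str, base_url: str = "http://127.0.0.1:8000"
) -> urllib.request.Request:
    """Build a POST /validate request carrying the document text as JSON."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/validate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def validate_document(text: str, base_url: str = "http://127.0.0.1:8000") -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(build_validate_request(text, base_url)) as resp:
        return json.load(resp)
```

Usage, once the app is running: `validate_document(open("sample_document_pass.txt").read())`.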

Sample Successful Response

{
  "extracted_data": {
    "policy_number": "HM-2025-10-A4B",
    "vessel_name": "MV Neptune",
    "policy_start_date": "2025-11-01",
    "policy_end_date": "2026-10-31",
    "insured_value": 5000000
  },
  "validation_results": [
    {"rule": "Date Consistency", "status": "PASS", "message": "Policy end date is after start date."},
    {"rule": "Value Check", "status": "PASS", "message": "Insured value is valid."},
    {"rule": "Vessel Name Match", "status": "PASS", "message": "Vessel 'MV Neptune' is on the approved list."},
    {"rule": "Completeness Check", "status": "PASS", "message": "Policy number is present."}
  ]
}
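A caller will usually want to reduce this response to a single verdict. The sketch below works on the response shape shown above; the helper names are hypothetical, and because the README only shows the "PASS" status, any other status value is treated as a failure here.

```python
from typing import Any, Dict, List


def failed_rules(response: Dict[str, Any]) -> List[str]:
    """Return the names of rules whose status is anything other than PASS."""
    return [
        result["rule"]
        for result in response.get("validation_results", [])
        if result.get("status") != "PASS"
    ]


def is_valid(response: Dict[str, Any]) -> bool:
    """A document counts as valid only if results exist and none failed."""
    results = response.get("validation_results", [])
    return bool(results) and not failed_rules(response)
```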

🐳 Run with Docker

Build the Image

docker build -t mini-doc-validator .

Run the Container

docker run -p 8000:8000 --env-file .env mini-doc-validator

Access the interactive API docs at: 👉 http://localhost:8000/docs

Testing

Run Unit Tests

pytest

Run Linter & Formatter

ruff check .
black .

Project Structure

mini_doc_validator/
│
├── main.py                  # FastAPI entry point
├── ai_extractor.py          # Gemini API integration
├── validation.py            # Business rule validation logic
├── models.py                # Pydantic data models
│
├── assets/
│   └── valid_vessels.json   # Reference list for vessel names
├── sample_document_pass.txt # Should pass all validations
├── sample_document_fail.txt # Should fail multiple validations
│
├── tests/                   # Unit tests
│   ├── test_validation.py
│   └── test_api.py
│
├── requirements.txt
├── Dockerfile
├── .env (not committed)
└── README.md

Example Dockerfile

FROM python:3.11.9

WORKDIR /app
COPY requirements.txt .

RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8000

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

Validation Rules Summary

| Rule | Logic | PASS Condition |
|---|---|---|
| Date Consistency | policy_end_date > policy_start_date | End date is later |
| Value Check | insured_value > 0 | Positive value |
| Vessel Name Match | Vessel in valid_vessels.json | Exists in list |
| Completeness Check | policy_number not null or empty | Present |
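The four rules in the table can be expressed in a few lines of Python. This is an illustrative sketch under stated assumptions, not the contents of validation.py: the function names, the "FAIL" status string, the failure messages, ISO-formatted dates, and exact (case-sensitive) vessel matching are all assumptions made for the example.

```python
from datetime import date
from typing import Dict, Iterable, List, Optional


def _result(rule: str, passed: bool, message: str) -> Dict[str, str]:
    # "FAIL" is an assumed status name; the README only shows "PASS".
    return {"rule": rule, "status": "PASS" if passed else "FAIL", "message": message}


def _parse_date(value: object) -> Optional[date]:
    """Parse an ISO date (YYYY-MM-DD); return None if missing or malformed."""
    try:
        return date.fromisoformat(value) if value else None
    except (TypeError, ValueError):
        return None


def validate(data: Dict[str, object], approved_vessels: Iterable[str]) -> List[Dict[str, str]]:
    """Apply the four business rules to extracted data and return one result each."""
    start = _parse_date(data.get("policy_start_date"))
    end = _parse_date(data.get("policy_end_date"))
    value = data.get("insured_value")
    vessel = data.get("vessel_name")
    number = data.get("policy_number")
    approved = set(approved_vessels)

    dates_ok = start is not None and end is not None and end > start
    # bool is a subclass of int, so exclude it explicitly.
    value_ok = isinstance(value, (int, float)) and not isinstance(value, bool) and value > 0
    vessel_ok = isinstance(vessel, str) and vessel in approved
    number_ok = isinstance(number, str) and number.strip() != ""

    return [
        _result("Date Consistency", dates_ok,
                "Policy end date is after start date." if dates_ok
                else "Policy end date must be after start date."),
        _result("Value Check", value_ok,
                "Insured value is valid." if value_ok
                else "Insured value must be a positive number."),
        _result("Vessel Name Match", vessel_ok,
                f"Vessel '{vessel}' is on the approved list." if vessel_ok
                else f"Vessel '{vessel}' is not on the approved list."),
        _result("Completeness Check", number_ok,
                "Policy number is present." if number_ok
                else "Policy number is missing."),
    ]
```

Returning one result per rule, rather than stopping at the first failure, matches the shape of the sample response, where every rule reports its own status and message.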

Example Documents

| File | Description |
|---|---|
| sample_document_pass.txt | Passes all validation rules |
| sample_document_fail.txt | Fails multiple validation checks |
| valid_vessels.json | Contains approved vessel list |

Live Demo

Live API Link

Live Validate API Link

Final Submission Link

GitHub Repository: https://github.com/Krixna-Kant/Mini-Doc-Validator

Author

Krishna

Backend + AI Intern Candidate | Passionate about AI-driven automation

🌍 www.linkedin.com/in/krishna-kant19

Acknowledgements

Built with ❤️ using FastAPI and Google Gemini AI