zieen/chonkie-container

Chonkie API 🦛✨

Chonkie API is a lightweight REST API wrapper for Chonkie, a text chunking library for RAG. It exposes Chonkie's chunking strategies through a simple HTTP interface, so you can add text splitting to a RAG pipeline without importing the library directly.

🚀 Features

Supports all major Chonkie chunkers:

  • TokenChunker: Split by token count.
  • RecursiveChunker: Split recursively by delimiters (no overlap).
  • SentenceChunker: Split by sentences.
  • SemanticChunker: Split based on semantic similarity (embeddings).
  • LateChunker: Late chunking (embeds the full document first, then derives chunk embeddings).
  • CodeChunker: Syntax-aware chunking for code (Python, etc.).
  • NeuralChunker: Transformer-based chunking.
  • SlumberChunker: LLM-based chunking (requires API key).

🛠️ Installation & Setup

Docker (Recommended)

The easiest way to run Chonkie API is using Docker.

  1. Build the Image

    docker build -t chonkie-api .

  2. Run the Container

    docker run -p 7859:7859 chonkie-api

    The API will be available at http://localhost:7859.

Local Development

  1. Install Dependencies

    pip install -r requirements.txt

  2. Run the Server

    uvicorn app.main:app --host 0.0.0.0 --port 7859 --reload

📚 API Usage

Full interactive documentation (Swagger UI) is available at http://localhost:7859/docs when the server is running.
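
When scripting against a freshly started container, it can help to wait until the server answers before sending requests. The following is a minimal sketch that polls the /docs URL mentioned above. The function name `wait_until_ready`, its defaults, and the injectable `opener` argument are illustrative assumptions, not part of this repository.

```python
import time
import urllib.request


def wait_until_ready(url="http://localhost:7859/docs", timeout=30.0,
                     interval=0.5, request_timeout=2.0,
                     opener=urllib.request.urlopen):
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass.

    Hypothetical helper: `opener` defaults to urllib's urlopen and can be
    swapped for a stub in tests.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener(url, timeout=request_timeout) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Refused connections, timeouts, and HTTP errors all land here,
            # because urllib.error.URLError is a subclass of OSError.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_until_ready()` right after `docker run` returns `True` once the Swagger UI page is reachable, or `False` if the server has not answered within the timeout.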

Example: Token Chunking

Input

curl -X POST "http://localhost:7859/chunk/token" \
     -H "Content-Type: application/json" \
     -d '{
           "text": "Chonkie is the goodest boi! My favorite chunking hippo hehe.",
           "chunk_size": 10,
           "chunk_overlap": 2
         }'

Output

{
  "chunks": [
    {
      "text": "Chonkie is the goodest boi! My favorite",
      "start_index": 0,
      "end_index": 39,
      "token_count": 10
    },
    {
      "text": "My favorite chunking hippo hehe.",
      "start_index": 28,
      "end_index": 60,
      "token_count": 7
    }
  ],
  "total_chunks": 2
}
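
In the example output, `start_index` and `end_index` appear to be character offsets into the input text, with `end_index` exclusive. That reading is an assumption drawn from this one example rather than documented behavior. The sketch below checks it against the response shown above; `check_offsets` and `overlap_text` are hypothetical helper names.

```python
import json

text = "Chonkie is the goodest boi! My favorite chunking hippo hehe."

# Response body copied from the example output above.
response = json.loads("""
{
  "chunks": [
    {"text": "Chonkie is the goodest boi! My favorite",
     "start_index": 0, "end_index": 39, "token_count": 10},
    {"text": "My favorite chunking hippo hehe.",
     "start_index": 28, "end_index": 60, "token_count": 7}
  ],
  "total_chunks": 2
}
""")


def check_offsets(text, chunks):
    """Return True when every chunk equals text[start_index:end_index]."""
    return all(
        text[c["start_index"]:c["end_index"]] == c["text"] for c in chunks
    )


def overlap_text(text, first, second):
    """Return the characters shared by two consecutive chunks."""
    return text[second["start_index"]:first["end_index"]]


first, second = response["chunks"]
print(check_offsets(text, response["chunks"]))  # True
print(repr(overlap_text(text, first, second)))  # 'My favorite'
```

The shared span "My favorite" is where the second chunk overlaps the first, which is the effect of the `chunk_overlap` setting in the request.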

Supported Endpoints

| Endpoint | Description | Key Parameters |
|---|---|---|
| /chunk/token | Fixed-size token chunking | chunk_size, chunk_overlap, tokenizer |
| /chunk/recursive | Recursive character splitting | chunk_size, rules |
| /chunk/sentence | Sentence-based splitting | chunk_size, min_sentences_per_chunk |
| /chunk/semantic | Semantic similarity splitting | chunk_size, embedding_model, similarity_threshold |
| /chunk/late | Late chunking | chunk_size |
| /chunk/code | Code syntax splitting | chunk_size, language |
| /chunk/neural | Neural network splitting | min_characters_per_chunk, model |
| /chunk/slumber | LLM-based splitting | chunk_size, api_key (via env) |
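
The endpoints above can be called from Python with the standard library alone. The sketch below builds a POST request for any of them and rejects parameters the table does not list. It assumes every endpoint accepts a JSON body with a `text` field, as /chunk/token does in the example. The allow-list covers only the key parameters from the table, so the real API may accept more. `build_request` and `chunk` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7859"

# Key parameters per endpoint, copied from the table above.
ENDPOINT_PARAMS = {
    "token": {"chunk_size", "chunk_overlap", "tokenizer"},
    "recursive": {"chunk_size", "rules"},
    "sentence": {"chunk_size", "min_sentences_per_chunk"},
    "semantic": {"chunk_size", "embedding_model", "similarity_threshold"},
    "late": {"chunk_size"},
    "code": {"chunk_size", "language"},
    "neural": {"min_characters_per_chunk", "model"},
    "slumber": {"chunk_size"},  # the API key is read from the server's environment
}


def build_request(endpoint, text, **params):
    """Build a POST request for /chunk/<endpoint> without sending it."""
    if endpoint not in ENDPOINT_PARAMS:
        raise ValueError(f"unknown endpoint: {endpoint!r}")
    unknown = set(params) - ENDPOINT_PARAMS[endpoint]
    if unknown:
        raise ValueError(f"parameters not listed for {endpoint}: {sorted(unknown)}")
    body = json.dumps({"text": text, **params}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chunk/{endpoint}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chunk(endpoint, text, **params):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(endpoint, text, **params)) as response:
        return json.load(response)
```

With the server running, `chunk("token", "Chonkie is the goodest boi!", chunk_size=10, chunk_overlap=2)` sends the same kind of request as the curl example. Validating parameter names on the client catches typos before the request leaves the machine.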

✅ Testing

Verification Script

Run the automated verification script to test all endpoints:

python test_api.py

Docker Integration Test

To verify the running Docker container:

python test_docker_full.py

📄 License

MIT License. See LICENSE for details.
