Chonkie API 🦛✨

Chonkie API is a lightweight, high-performance REST API wrapper for Chonkie, the efficient RAG chunking library. It exposes multiple chunking strategies via a simple HTTP interface, making it easy to integrate robust text splitting into your RAG pipelines.

🚀 Features

Supports all major Chonkie chunkers:

TokenChunker: Split by token count.
RecursiveChunker: Split recursively by delimiters (no overlap).
SentenceChunker: Split by sentences.
SemanticChunker: Split based on semantic similarity (embeddings).
LateChunker: Late chunking (embeds the full text first, then splits it, so chunk embeddings keep document context).
CodeChunker: Syntax-aware chunking for code (Python, etc.).
NeuralChunker: Transformer-based chunking.
SlumberChunker: LLM-based chunking (requires API key).

🛠️ Installation & Setup

Docker (Recommended)

The easiest way to run Chonkie API is using Docker.

Build the Image
```
docker build -t chonkie-api .
```
Run the Container
```
docker run -p 7859:7859 chonkie-api
```
The API will be available at http://localhost:7859.

Local Development

Install Dependencies
```
pip install -r requirements.txt
```

Run the Server

```
uvicorn app.main:app --host 0.0.0.0 --port 7859 --reload
```

📚 API Usage

Full interactive documentation (Swagger UI) is available at http://localhost:7859/docs when the server is running.
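Before sending requests from a script, it can help to wait until the server answers. The sketch below polls the `/docs` URL mentioned above using only the Python standard library. The helper names `is_up` and `wait_until_up`, the retry counts, and the timeouts are illustrative assumptions, not part of Chonkie API.

```python
import time
import urllib.request

DOCS_URL = "http://localhost:7859/docs"  # Swagger UI address from this README


def is_up(url: str = DOCS_URL, timeout: float = 2.0) -> bool:
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # urllib.error.URLError, refused connections, and timeouts
        # are all subclasses of OSError.
        return False


def wait_until_up(probe=is_up, attempts: int = 10, delay: float = 1.0) -> bool:
    """Call `probe` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_until_up()` after `docker run` or `uvicorn` returns `True` once the docs page responds, or `False` after the last attempt. The `probe` argument exists so the retry loop can be exercised without a running server.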

Example: Token Chunking

Input

```
curl -X POST "http://localhost:7859/chunk/token" \
     -H "Content-Type: application/json" \
     -d '{
           "text": "Chonkie is the goodest boi! My favorite chunking hippo hehe.",
           "chunk_size": 10,
           "chunk_overlap": 2
         }'
```

Output

```
{
  "chunks": [
    {
      "text": "Chonkie is the goodest boi! My favorite",
      "start_index": 0,
      "end_index": 39,
      "token_count": 10
    },
    {
      "text": "My favorite chunking hippo hehe.",
      "start_index": 28,
      "end_index": 60,
      "token_count": 7
    }
  ],
  "total_chunks": 2
}
```
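The same call can be made from Python. This is a minimal sketch using only the standard library. It sends the JSON body from the curl example and reads the response shape shown in the output. The function names are hypothetical. `offsets_match` assumes `start_index` and `end_index` are character offsets with an exclusive end, which is what the sample output suggests but is not stated by the API.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7859"  # address used throughout this README


def build_token_request(text: str, chunk_size: int, chunk_overlap: int) -> urllib.request.Request:
    """Build the POST request for /chunk/token with the same body as the curl example."""
    payload = {"text": text, "chunk_size": chunk_size, "chunk_overlap": chunk_overlap}
    return urllib.request.Request(
        f"{BASE_URL}/chunk/token",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chunk_tokens(text: str, chunk_size: int, chunk_overlap: int, timeout: float = 30.0) -> dict:
    """Send the request to a running server and return the decoded JSON response."""
    request = build_token_request(text, chunk_size, chunk_overlap)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def offsets_match(text: str, result: dict) -> bool:
    """Check that each chunk's text equals text[start_index:end_index].

    Assumption: offsets are character positions and end_index is exclusive.
    """
    return all(
        text[chunk["start_index"]:chunk["end_index"]] == chunk["text"]
        for chunk in result["chunks"]
    )
```

With the server running, `chunk_tokens(text, 10, 2)` performs the request from the example, and `offsets_match(text, result)` is a quick sanity check that the returned offsets point back into the original string. The sample output above passes this check.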

Supported Endpoints

| Endpoint | Description | Key Parameters |
| --- | --- | --- |
| `/chunk/token` | Fixed-size token chunking | `chunk_size`, `chunk_overlap`, `tokenizer` |
| `/chunk/recursive` | Recursive character splitting | `chunk_size`, `rules` |
| `/chunk/sentence` | Sentence-based splitting | `chunk_size`, `min_sentences_per_chunk` |
| `/chunk/semantic` | Semantic similarity splitting | `chunk_size`, `embedding_model`, `similarity_threshold` |
| `/chunk/late` | Late chunking | `chunk_size` |
| `/chunk/code` | Code syntax splitting | `chunk_size`, `language` |
| `/chunk/neural` | Neural network splitting | `min_characters_per_chunk`, `model` |
| `/chunk/slumber` | LLM-based splitting | `chunk_size`, `api_key` (via env) |
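The table above can be turned into a small client-side helper that picks the URL and rejects misspelled parameters before a request is sent. This is a sketch under two assumptions: every endpoint accepts a `text` field like `/chunk/token` does, and the table's key parameters are treated as the full allowed set (the server may accept more). `build_call` and `ENDPOINT_PARAMS` are hypothetical names.

```python
import json
from typing import Tuple

BASE_URL = "http://localhost:7859"

# Key parameters per endpoint, copied from the table above.
ENDPOINT_PARAMS = {
    "token": {"chunk_size", "chunk_overlap", "tokenizer"},
    "recursive": {"chunk_size", "rules"},
    "sentence": {"chunk_size", "min_sentences_per_chunk"},
    "semantic": {"chunk_size", "embedding_model", "similarity_threshold"},
    "late": {"chunk_size"},
    "code": {"chunk_size", "language"},
    "neural": {"min_characters_per_chunk", "model"},
    # api_key is listed as "via env", so it is not sent in the request body here.
    "slumber": {"chunk_size"},
}


def build_call(strategy: str, text: str, **params) -> Tuple[str, bytes]:
    """Return (url, JSON body) for a chunking strategy, validating parameter names."""
    if strategy not in ENDPOINT_PARAMS:
        raise ValueError(f"unknown strategy: {strategy!r}")
    unknown = set(params) - ENDPOINT_PARAMS[strategy]
    if unknown:
        raise ValueError(f"unsupported parameters for {strategy!r}: {sorted(unknown)}")
    body = {"text": text, **params}  # assumption: all endpoints take "text"
    return f"{BASE_URL}/chunk/{strategy}", json.dumps(body).encode("utf-8")
```

For example, `build_call("code", source, chunk_size=512, language="python")` yields the URL `http://localhost:7859/chunk/code` and a body ready to POST with `Content-Type: application/json`, while `build_call("token", text, language="python")` raises `ValueError`.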

✅ Testing

Verification Script

Run the automated verification script to test all endpoints:

```
python test_api.py
```

Docker Integration Test

To verify the running Docker container:

```
python test_docker_full.py
```

📄 License

MIT License. See LICENSE for details.
