Python command-line tool for interacting with AI models through the OpenRouter API, Cloudflare AI Gateway, or a local self-hosted Ollama instance. Optionally supports Microsoft LLMLingua prompt token compression.
Updated Mar 31, 2025
Compress LLM Prompts and save 80%+ on GPT-4 in Python
This repository is the official implementation of Generative Context Distillation.
Enhance the performance and cost-efficiency of large-scale Retrieval Augmented Generation (RAG) applications. Learn to integrate vector search with traditional database operations and apply techniques like prefiltering, postfiltering, projection, and prompt compression.