- 🔍 Overview
- ✨ Advantages of CAG
- ⚠️ Limitations of CAG
- ⚙️ Setup Instructions
- 💻 Running the Application
- 📚 References
## 🔍 Overview

Retrieval-Augmented Generation (RAG) enhances language models by integrating external knowledge, but it faces challenges such as retrieval latency, retrieval errors, and added system complexity. Cache-Augmented Generation (CAG) addresses these by preloading all relevant data into the model's context and caching the resulting runtime state (the key-value cache), leveraging the extended context windows of modern LLMs. This eliminates real-time retrieval during inference: the model answers queries directly from the preloaded context.
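To make the mechanism concrete, here is a minimal sketch of KV-cache preloading with Hugging Face `transformers`. This is not this repository's code: the model name, the `knowledge.txt` file, and the `answer` helper are all illustrative assumptions.

```python
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "mistralai/Mistral-7B-Instruct-v0.2"  # illustrative; any HF causal LM
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.float16, device_map="auto"
)

# Preload the knowledge once: a single forward pass builds the KV cache.
knowledge = open("knowledge.txt").read()  # hypothetical knowledge file
knowledge_ids = tokenizer(knowledge, return_tensors="pt").input_ids.to(model.device)
with torch.no_grad():
    cache = model(knowledge_ids, use_cache=True).past_key_values

def answer(question: str) -> str:
    """Answer a query against the preloaded context, without retrieval."""
    q_ids = tokenizer(question, return_tensors="pt").input_ids.to(model.device)
    input_ids = torch.cat([knowledge_ids, q_ids], dim=-1)
    # Pass a copy of the cache so every query starts from the clean preload;
    # generation then only has to process the question tokens.
    output = model.generate(
        input_ids,
        past_key_values=copy.deepcopy(cache),
        max_new_tokens=128,
    )
    return tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True)
```

The knowledge document is encoded exactly once; each subsequent query only pays for its own tokens plus generation.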
## ✨ Advantages of CAG

- **Reduced Latency**: Faster inference by removing real-time retrieval.
- **Improved Reliability**: Avoids retrieval errors and ensures context relevance.
- **Simplified Design**: Offers a streamlined, low-complexity alternative to RAG with comparable or better performance.
## ⚠️ Limitations of CAG

- **Knowledge Size Limits**: All relevant data must fit into the model's context window, which makes CAG unsuitable for very large datasets (see the pre-flight check below).
- **Context Length Issues**: Performance may degrade with very long contexts.
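As a guard against the first limitation, a simple pre-flight check can verify that the corpus fits before preloading. The model name and `knowledge.txt` are illustrative assumptions, not taken from this repository.

```python
# Hypothetical pre-flight check: confirm the knowledge corpus fits the
# model's context window before committing to CAG-style preloading.
from transformers import AutoConfig, AutoTokenizer

MODEL = "mistralai/Mistral-7B-Instruct-v0.2"  # illustrative model choice
tokenizer = AutoTokenizer.from_pretrained(MODEL)
config = AutoConfig.from_pretrained(MODEL)

knowledge = open("knowledge.txt").read()  # hypothetical knowledge file
n_tokens = len(tokenizer(knowledge).input_ids)
window = config.max_position_embeddings
print(f"{n_tokens} knowledge tokens vs. a {window}-token context window")
# Leave headroom for the query and the generated answer as well.
if n_tokens >= window:
    raise ValueError("Corpus does not fit; consider chunking or falling back to RAG.")
```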
## ⚙️ Setup Instructions

### Prerequisites

- Python 3.9 or higher
- pip (Python package installer)
### Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/genieincodebottle/genaicodelab.git
   cd genaicodelab/cache_augumeted_generation
   ```
2. Create and activate a virtual environment:

   ```bash
   python -m venv venv
   venv\Scripts\activate       # On Windows
   source venv/bin/activate    # On Linux/macOS
   ```
3. Install dependencies:

   ```bash
   pip install torch --index-url https://download.pytorch.org/whl/cu118
   pip install -r requirements.txt
   ```
4. Rename `.env.example` to `.env`.
5. Get your Hugging Face token:
   - Visit the [Hugging Face Tokens page](https://huggingface.co/settings/tokens)
   - Create a new token with read access
6. Copy the token to `HF_TOKEN` in your `.env` file (see the example below).
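For reference, the resulting `.env` should contain a line like the following; the value shown is a placeholder, not a real token.

```bash
# .env — replace the placeholder with your own Hugging Face token
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxx
```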
## 💻 Running the Application

To start the application, run:

```bash
streamlit run app.py
```
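Streamlit will print a local URL (by default http://localhost:8501); open it in your browser to use the app.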