
Cache-Augmented Generation

Open In Colab

🔍 Overview

Retrieval-Augmented Generation (RAG) enhances language models by integrating external knowledge, but it introduces retrieval latency, retrieval errors, and extra system complexity. Cache-Augmented Generation (CAG) sidesteps these issues by preloading all relevant documents into the model's context ahead of time and caching the resulting runtime state (the key-value cache), taking advantage of modern LLMs' extended context windows. No retrieval happens at inference time: the model generates responses directly from the preloaded context.
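
The sketch below illustrates the core idea with the Hugging Face `transformers` API: encode the knowledge once, keep the resulting key-value cache, and reuse it for every query so that only the new question tokens are processed. This is a minimal sketch, not this repo's code; the model name and texts are illustrative placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; any causal LM with a sufficiently long context works.
model_name = "HuggingFaceTB/SmolLM2-360M-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# 1) Preload the knowledge once; this forward pass builds the KV cache.
knowledge = "Acme Corp was founded in 1999. Its headquarters are in Oslo."
knowledge_ids = tokenizer(knowledge, return_tensors="pt").input_ids
with torch.no_grad():
    kv_cache = model(knowledge_ids, use_cache=True).past_key_values

# 2) Answer a query by reusing the cache; only the question tokens are
#    encoded here. No retrieval, no re-reading of the knowledge.
question = "\nQuestion: Where is Acme Corp headquartered?\nAnswer:"
question_ids = tokenizer(
    question, return_tensors="pt", add_special_tokens=False
).input_ids
full_ids = torch.cat([knowledge_ids, question_ids], dim=-1)
output_ids = model.generate(full_ids, past_key_values=kv_cache, max_new_tokens=20)
print(tokenizer.decode(output_ids[0][full_ids.shape[-1]:], skip_special_tokens=True))
```

Note that `generate` extends the cache in place, so to serve multiple independent queries you would keep a copy of the original knowledge-only cache and reuse that for each call.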


✨ Advantages of CAG

  • Reduced Latency: Faster inference by removing real-time retrieval.
  • Improved Reliability: Avoids retrieval errors and ensures context relevance.
  • Simplified Design: Offers a streamlined, low-complexity alternative to RAG with comparable or better performance.

⚠️ Limitations of CAG

  • Knowledge Size Limits: Requires fitting all relevant data into the context window, unsuitable for extremely large datasets.
  • Context Length Issues: Performance may degrade with very long contexts.


⚙️ Setup Instructions

  • Prerequisites

    • Python 3.9 or higher
    • pip (Python package installer)
  • Installation

    1. Clone the repository:

      git clone https://github.com/genieincodebottle/genaicodelab.git
      cd genaicodelab/cache_augumeted_generation
    2. Create a virtual environment:

      python -m venv venv
      venv\Scripts\activate       # Windows
      source venv/bin/activate    # Linux/macOS
    3. Install dependencies:

      # CUDA 11.8 build of PyTorch; CPU-only users can run `pip install torch` instead
      pip install torch --index-url https://download.pytorch.org/whl/cu118
      pip install -r requirements.txt
    4. Rename .env.example to .env

    5. Get your Hugging Face token from https://huggingface.co/settings/tokens

    6. Copy the token to HF_TOKEN in your .env file (loaded as sketched below)
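
For context, this is roughly how an app can pick up the token at runtime. A minimal sketch assuming python-dotenv is available (e.g. via requirements.txt); the repo's actual loading code in app.py may differ.

```python
# Read HF_TOKEN from the .env file created in steps 4-6.
import os
from dotenv import load_dotenv

load_dotenv()                      # parses key=value pairs from .env
hf_token = os.getenv("HF_TOKEN")   # the token pasted in step 6
```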


💻 Running the Application

To start the application, run:

streamlit run app.py