diff --git a/notebooks/llm-rag-ov-langchain/README.md b/notebooks/llm-rag-ov-langchain/README.md new file mode 100644 index 00000000000..5888563fb79 --- /dev/null +++ b/notebooks/llm-rag-ov-langchain/README.md @@ -0,0 +1,197 @@ +# RAG Performance & Fairness Evaluation Toolkit (OpenVINO + LangChain) + +This toolkit enables developers to build, evaluate, and optimize Retrieval-Augmented Generation (RAG) applications with comprehensive quality metrics including accuracy, bias detection, and perplexity analysis plus a racial-bias indicator. This uses RAG pipeline optimized with Intel OpenVINO for enhanced performance on CPU, GPU, and NPU. The pipeline leverages: +- Optimum-Intel’s `OVModelForCausalLM` with the OpenVINO backend for efficient inference. +- LangChain for orchestration of document loading, chunking, embedding, retrieval, reranking, and generation. + +> Goal: Provide a portable notebook-driven workflow for rapid experimentation, model comparison, and validation of RAG systems on custom/private corpora. + +--- + +## 1. What Is RAG? + +Retrieval-Augmented Generation combines: +1. Retrieval: Selecting the most relevant context snippets from a document store. +2. Generation: Supplying those snippets to an LLM to produce grounded answers. + +Benefits: +- Injects up-to-date and domain-specific knowledge without fine-tuning the LLM. +- Reduces hallucinations by constraining generation to retrieved evidence. +- Supports compliance and audit by exposing sources (metadata) for each answer. + +--- + +## 2. RAG Performance & Fairness Evaluation Toolkit Overview + +| Component | Role | +|--------------------------|------| +| Document Loaders | Ingest local files (.pdf, .txt, .docx, .json, .csv) or URLs/web pages. | +| Text Splitter | Chunk documents into semantically sized pieces for embedding. | +| Embedding Model | Converts chunks to vector representations for similarity search. | +| Vector Store / Index | Persists embeddings enabling fast approximate or exact nearest-neighbor retrieval. | +| (Optional) Reranker | Re-orders retrieved candidates for improved answer grounding. | +| Generator (OVModel) | Runs local accelerated LLM inference via OpenVINO. | +| Evaluator | Computes quality and bias metrics. | +| Notebook Orchestrator | Step-by-step cells show the entire flow and allow interactive parameter tuning. | + +--- + +## 3. Key Features + +- **OpenVINO Model Optimization**: + - Hardware-accelerated inference using OpenVINO for LLMs and embedding models +- **Flexible Model Support**: + - LLM: Microsoft Phi-3-mini-4k-instruct (easily swappable with other HuggingFace models) + - Embeddings: BGE-small-en-v1.5 (supports other embedding models) + - Evaluation: Llama-2-7B for perplexity scoring +- **Advanced Retrieval**: + - ChromaDB vector store with persistent storage + - FlashRank reranking for improved retrieval accuracy + - Batch embedding insertion for large document sets +- **Multiple Document Sources**: + - Web scraping from sitemaps and URLs + - Local file loading (.pdf, .txt, .docx, .csv, .json, .xlsx) + - Supports both single and bulk document processing +- **Comprehensive Evaluation Metrics**: + - BLEU Score: Translation quality metric + - ROUGE Score: Summary quality assessment + - BERT Score: Semantic similarity using BERT embeddings + - Perplexity: Language model confidence measurement + - Diversity Score: Response variety analysis + - Racial Bias Detection: Using hate-speech detection model + +--- + +## 4. Installation + +```bash +# Clone the repository +cd RAG-OV-Langchain +pip install -r requirements.txt +``` + +(If OpenVINO runtime prerequisites are not already satisfied, follow Intel’s OpenVINO setup instructions.) + +--- + +## 5. Running the Notebook + +1. Launch Jupyter: `jupyter notebook` +2. Open the provided notebook - `ov_rag_evaluator.ipynb` +3. Execute cells in order; each cell includes explanatory comments. +4. Provide input sources (file paths or URLs) when prompted. +5. Adjust parameters such as: + - Chunk size / overlap + - Embedding model name + - Retrieval top-k + - Reranker toggle + - Generation temperature / max tokens +6. Run evaluation cells to view metrics dashboard output. + +--- + +## 6. Input / Output Formats + +### Supported Input +- Textual documents: `.pdf`, `.txt`, `.docx`, `.json`, `.csv` +- Web content: Page URLs (scraped & cleaned) +- (Extendable) Additional loaders can be registered for other data types. + +### Output +- Generated answer grounded in retrieved context. +- List of source chunks with: + - Document identifier + - Chunk index + - Similarity / relevance score + - Optional rerank score +- Metrics report (per query or aggregate). + +--- + +## 7. Evaluation Metrics + +| Metric | Purpose | +|---------------|---------| +| BERTScore | Semantic similarity vs. reference answer(s). | +| BLEU | n-gram precision (machine translation heritage; still indicative for overlap). | +| ROUGE | Recall-oriented overlap (useful for summarization-style references). | +| Perplexity | Fluency measure of generated text under a language model. | +| Racial Bias Indicator | Heuristic or embedding-based measure identifying disproportionate associations or skewed outputs. | + +Notes: +- Provide one or more reference answers (gold annotations) for BLEU/ROUGE/BERTScore. +- Perplexity may rely on a reference language model distinct from the generator. +- Bias indicator may leverage word association tests or sentiment differentials; interpret conservatively. + +--- + +## 8. Racial Bias Indicator (Concept) + +The notebook computes a racial bias signal that can highlight when generated answers: +- Over-index on certain demographic terms. +- Exhibit asymmetric sentiment or descriptors. +- Associate professions or attributes disproportionately. + +Recommended usage: +- Treat as a screening heuristic. +- Follow up with manual review. +- Do not treat a single numeric score as definitive. + +--- + +## 9. Customization + +You can modify: +- Embedding backend (e.g., `sentence-transformers`, `text-embedding-*` models). +- Retrieval strategy (FAISS, chroma, or other vector stores). +- Reranking (e.g., cross-encoder or LLM-based rerank). +- Generation model (swap Hugging Face model; ensure OpenVINO export or optimization). +- Metric thresholds for acceptance gating. + +--- + +## 10. Suggested Workflow + +1. Curate domain corpus. +2. Run baseline RAG with default parameters. +3. Collect queries & gold references (if available). +4. Evaluate metrics; record baseline. +5. Iterate: + - Tune chunking, top-k. + - Introduce reranker. + - Switch embedding model. + - Optimize LLM (quantization, OpenVINO optimizations). +6. Compare metric deltas; choose best configuration for deployment. + +--- + +## 11. Performance Considerations + +- OpenVINO accelerates inference on Intel hardware (CPU / GPU / NPU where supported). +- Smaller embedding models may trade slight recall for speed. +- Reranking adds latency; enable only if precision gains matter. +- Batch queries in evaluation phase to amortize setup costs. + +--- + +## 12. Limitations + +- Metrics may not fully capture factual grounding; consider human review. +- Bias indicator is heuristic; deeper audits require specialized tools. +- Long documents may need advanced chunking strategies (semantic splitting). +- URL ingestion quality depends on HTML cleanliness. + +--- + +## FAQs + +Q: Can I use a different LLM? +A: Yes, replace the checkpoint and ensure OpenVINO optimization/export steps are applied. + +Q: Do I need gold answers? +A: For BLEU/ROUGE/BERTScore, yes. For exploratory retrieval quality, you can still inspect sources without them. + +Q: How to reduce hallucinations? +A: Increase retrieval relevance (tune embeddings, use reranking) and constrain generation parameters (lower temperature). + +--- diff --git a/notebooks/llm-rag-ov-langchain/ov_rag_evaluator.ipynb b/notebooks/llm-rag-ov-langchain/ov_rag_evaluator.ipynb new file mode 100644 index 00000000000..a71e6ac6a89 --- /dev/null +++ b/notebooks/llm-rag-ov-langchain/ov_rag_evaluator.ipynb @@ -0,0 +1,763 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "7722a495", + "metadata": {}, + "source": [ + "# RAG Performance & Fairness Evaluation Toolkit (OpenVINO + LangChain)\n", + "\n", + "This notebook demonstrates how to build and evaluate a Retrieval-Augmented Generation (RAG) pipeline using OpenVINO™ for accelerated performance on Intel hardware. We will use Hugging Face and LangChain libraries to construct the pipeline.\n", + "\n", + "The process involves:\n", + "1. **Environment Setup**: Installing necessary libraries.\n", + "2. **LLM and Tokenizer Setup**: Loading a language model (Microsoft's Phi-3-mini) and its tokenizer, optimized with OpenVINO.\n", + "3. **Embedding Model Setup**: Preparing an embedding model to convert text into vector representations.\n", + "4. **Data Loading and Processing**: Fetching documents from a web source, splitting them into manageable chunks, and creating vector embeddings.\n", + "5. **Vector Store and Retriever Setup**: Storing the embeddings in a ChromaDB vector store and setting up a retriever with reranking for improved accuracy.\n", + "6. **Building the RAG Chain**: Creating a `RetrievalQA` chain that combines the retriever and the LLM.\n", + "7. **Running the RAG Pipeline**: Asking a question to get a response from the RAG system.\n", + "8. **Evaluation**: Using a comprehensive `OpenVINORAGEvaluator` to assess the quality of the generated response based on various metrics like BLEU, ROUGE, BERTScore, perplexity, and bias." + ] + }, + { + "cell_type": "markdown", + "id": "81a21a14", + "metadata": {}, + "source": [ + "## 1. Environment Setup\n", + "\n", + "First, let's ensure all the required Python packages are installed. The following commands handle the installation of essential libraries. These are typically only needed if you encounter version conflicts or issues with existing installations." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c4a2dc6a-3d3e-4da2-902f-30f3cbd24b39", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import requests\n", + "from pathlib import Path\n", + "\n", + "if not Path(\"notebook_utils.py\").exists():\n", + " r = requests.get(\n", + " url=\"https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/notebook_utils.py\",\n", + " )\n", + " with open(\"notebook_utils.py\", \"w\") as f:\n", + " f.write(r.text)\n", + "\n", + "if not Path(\"pip_helper.py\").exists():\n", + " r = requests.get(\n", + " url=\"https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/pip_helper.py\",\n", + " )\n", + " open(\"pip_helper.py\", \"w\").write(r.text)\n", + "\n", + "from pip_helper import pip_install\n", + "\n", + "os.environ[\"GIT_CLONE_PROTECTION_ACTIVE\"] = \"false\"\n", + "\n", + "pip_install(\"--pre\", \"-U\", \"openvino>=2025.3.0\", \"--extra-index-url\", \"https://storage.openvinotoolkit.org/simple/wheels/nightly\")\n", + "pip_install(\"--pre\", \"-U\", \"openvino-tokenizers\", \"--extra-index-url\", \"https://storage.openvinotoolkit.org/simple/wheels/nightly\")\n", + "pip_install(\n", + " \"--extra-index-url\",\n", + " \"https://download.pytorch.org/whl/cpu\",\n", + " \"--upgrade-strategy\",\n", + " \"eager\",\n", + " \"optimum[openvino,nncf,onnxruntime]\",\n", + " \"sacrebleu\",\n", + " \"rouge-score\",\n", + " \"nncf>=2.18.0\",\n", + " \"bert-score\",\n", + " \"transformers\",\n", + " \"onnx\",\n", + " \"nltk\",\n", + " \"numpy\",\n", + " \"textblob\",\n", + " \"dataset\",\n", + " \"langchain\",\n", + " \"langchain_community\",\n", + " \"chromadb\",\n", + " \"langchain-chroma\",\n", + " \"langchain-huggingface\",\n", + " \"sentence-transformers\",\n", + " \"Flashrank\",\n", + " \"msoffcrypto-tool\",\n", + " \"docx2txt\",\n", + " \"bs4\",\n", + " \"python-docx\",\n", + " \"huggingface-hub>=0.26.5\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "8a005fb2", + "metadata": {}, + "source": [ + "## 2. LLM and Tokenizer Setup\n", + "\n", + "Next, we load the Large Language Model (LLM) and its corresponding tokenizer. We use `optimum-intel` to convert and accelerate the model with OpenVINO. In this example, we use `microsoft/Phi-3-mini-4k-instruct`, but you can replace it with another compatible model.\n", + "\n", + "- **`OVModelForCausalLM`**: Loads a causal language model and automatically converts it to the OpenVINO format (`export=True`).\n", + "- **`device=\"GPU\"`**: Specifies that the model should run on the integrated GPU for acceleration. You can change this to `\"CPU\"`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "90e68d95-9a4e-4ba5-9040-4422c1333444", + "metadata": {}, + "outputs": [], + "source": [ + "from optimum.intel import OVModelForCausalLM\n", + "from transformers import AutoTokenizer, pipeline\n", + "from langchain_huggingface import HuggingFacePipeline\n", + "\n", + "# Load model with OpenVINO backend\n", + "model = OVModelForCausalLM.from_pretrained(\n", + " \"microsoft/Phi-3-mini-4k-instruct\", # You can plug in any other supported model\n", + " export=True, # Convert to OpenVINO format on the fly\n", + " device=\"GPU\" # Specify GPU for inference, can also be \"CPU\"\n", + ")\n", + "\n", + "tokenizer = AutoTokenizer.from_pretrained(\"microsoft/Phi-3-mini-4k-instruct\")\n", + "model.save_pretrained(\"ov_model\")" + ] + }, + { + "cell_type": "markdown", + "id": "27419145", + "metadata": {}, + "source": [ + "### Create a LangChain-compatible LLM Pipeline\n", + "\n", + "We now create a `text-generation` pipeline using the OpenVINO-optimized model and tokenizer. This pipeline is then wrapped in `HuggingFacePipeline` to make it compatible with the LangChain ecosystem. A quick test is run to confirm the pipeline is working correctly." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f985ca28-9e3d-490d-954c-71b24fc47eda", + "metadata": {}, + "outputs": [], + "source": [ + "# Create a text-generation pipeline with the OpenVINO model\n", + "llm_pipeline = pipeline(\n", + " \"text-generation\",\n", + " model=model,\n", + " tokenizer=tokenizer,\n", + " device=model.device,\n", + " max_new_tokens=100,\n", + " top_k=50,\n", + " temperature=0.1,\n", + " do_sample=True\n", + ")\n", + "\n", + "# Create a LangChain instance from the Hugging Face pipeline\n", + "llm = HuggingFacePipeline(pipeline=llm_pipeline)\n", + "\n", + "# Test the pipeline with a sample query\n", + "response = llm.invoke(\"What is an ocean?\")\n", + "print(response)" + ] + }, + { + "cell_type": "markdown", + "id": "05fee9cc", + "metadata": {}, + "source": [ + "## 3. Embedding Model Setup\n", + "\n", + "For the retrieval part of our RAG pipeline, we need an embedding model to convert text documents into numerical vectors. We use `OpenVINOBgeEmbeddings` from `langchain_community`, which provides OpenVINO-optimized embeddings for efficient performance. Here, we use the `bge-small-en-v1.5` model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "606ff70a-f797-42a5-a697-8cb5c13c0dae", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_community.embeddings import OpenVINOBgeEmbeddings\n", + "from sentence_transformers import SentenceTransformer\n", + "import os\n", + "\n", + "# First time: Download and save the model\n", + "embedding_model_name = \"BAAI/bge-small-en-v1.5\" # Full HF repo path\n", + "save_directory = \"./saved_bge_model\"\n", + "\n", + "# Download the model using SentenceTransformer directly\n", + "st_model = SentenceTransformer(embedding_model_name)\n", + "st_model.save(save_directory)\n", + "print(f\"Model saved to {save_directory}\")\n", + "\n", + "# Now create the OpenVINO embedding with the saved model\n", + "embedding = OpenVINOBgeEmbeddings(\n", + " model_name_or_path=save_directory, # Use saved path\n", + " model_kwargs={\"device\": \"CPU\"},\n", + " encode_kwargs={\"normalize_embeddings\": True},\n", + ")\n", + "\n", + "# Load the saved model from local directory\n", + "local_model_path = \"./saved_bge_model\"\n", + "\n", + "embedding = OpenVINOBgeEmbeddings(\n", + " model_name_or_path=local_model_path,\n", + " model_kwargs={\"device\": \"CPU\"},\n", + " encode_kwargs={\"normalize_embeddings\": True},\n", + ")\n", + "\n", + "# Test the loaded model\n", + "text = \"This is a test document.\"\n", + "embedding_result = embedding.embed_query(text)\n", + "print(\"Sample embedding (first 3 dimensions):\", embedding_result[:3])" + ] + }, + { + "cell_type": "markdown", + "id": "f8defdcb", + "metadata": {}, + "source": [ + "## 4. Data Loading and Processing\n", + "\n", + "Now we'll load the documents that will form the knowledge base for our RAG pipeline. This notebook includes two methods for loading documents:\n", + "\n", + "1. **Web Crawling (Enabled by default)**: Fetches content from a website's sitemap. We use `WebBaseLoader` to load content from URLs found in the sitemap of Zerodha Varsity.\n", + "2. **Local File Loading (Commented out)**: A robust `LangChainDocumentLoader` class is provided to load various file types (`.txt`, `.pdf`, `.docx`, etc.) from a local directory. You can uncomment and adapt this section if you want to use your own local files." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "808e9c2d-ab4a-4bc6-bb45-f3b2d4be3156", + "metadata": {}, + "outputs": [], + "source": [ + "import bs4\n", + "from urllib.request import Request, urlopen\n", + "from bs4 import BeautifulSoup\n", + "import ssl\n", + "from langchain_community.document_loaders import WebBaseLoader\n", + "'''\n", + "# --- Method 1: Load documents by crawling a web page (default) ---\n", + "def get_sitemap(url):\n", + " \"\"\"Fetches and parses an XML sitemap from a URL.\"\"\"\n", + " req = Request(url, headers={\"User-Agent\": \"Mozilla/5.0\"})\n", + " response = urlopen(req)\n", + " xml = BeautifulSoup(response, \"lxml-xml\", from_encoding=response.info().get_param(\"charset\"))\n", + " return xml\n", + "\n", + "def get_urls_from_sitemap(xml):\n", + " \"\"\"Extracts all URLs from a parsed sitemap XML.\"\"\"\n", + " urls = [loc.text for loc in xml.find_all(\"loc\")]\n", + " return urls\n", + "\n", + "# Bypass SSL verification issues if they arise\n", + "ssl._create_default_https_context = ssl._create_stdlib_context\n", + "\n", + "sitemap_url = \"https://zerodha.com/varsity/chapter-sitemap2.xml\"\n", + "sitemap_xml = get_sitemap(sitemap_url)\n", + "urls = get_urls_from_sitemap(sitemap_xml)\n", + "\n", + "# Load documents from the collected URLs\n", + "docs = []\n", + "for i, url in enumerate(urls):\n", + " try:\n", + " loader = WebBaseLoader(url)\n", + " docs.extend(loader.load())\n", + " if (i + 1) % 10 == 0:\n", + " print(f\"Loaded {i + 1}/{len(urls)} URLs\")\n", + " except Exception as e:\n", + " print(f\"Failed to load {url}: {e}\")\n", + "\n", + "print(f\"\\nTotal documents loaded: {len(docs)}\")\n", + "'''\n", + "# --- Method 2: Load documents locally from the system (commented out) ---\n", + "\n", + "import os\n", + "from langchain.document_loaders import (\n", + " TextLoader,\n", + " PyPDFLoader,\n", + " DirectoryLoader,\n", + ")\n", + "from langchain.schema import Document as LCDocument\n", + "from typing import List\n", + "\n", + "class LocalDocumentLoader:\n", + " \"\"\"Load documents from a local directory using LangChain loaders.\"\"\"\n", + " def __init__(self, directory_path: str):\n", + " self.directory_path = directory_path\n", + "\n", + " def load(self) -> List[LCDocument]:\n", + " \"\"\"Loads all supported documents from the directory.\"\"\"\n", + " if not self.directory_path:\n", + " raise ValueError(\"Directory path not set.\")\n", + "\n", + " # Define loaders for different file types\n", + " txt_loader = DirectoryLoader(\n", + " self.directory_path, glob=\"**/*.txt\", loader_cls=TextLoader,\n", + " loader_kwargs={\"encoding\": \"utf-8\"}, show_progress=True\n", + " )\n", + " pdf_loader = DirectoryLoader(\n", + " self.directory_path, glob=\"**/*.pdf\", loader_cls=PyPDFLoader, show_progress=True\n", + " )\n", + "\n", + " documents = []\n", + " documents.extend(txt_loader.load())\n", + " documents.extend(pdf_loader.load())\n", + " \n", + " return documents\n", + "\n", + "#Usage Example:\n", + "loader = LocalDocumentLoader(directory_path=\"content\")\n", + "docs = loader.load()\n", + "print(f\"Loaded {len(docs)} local documents.\")" + ] + }, + { + "cell_type": "markdown", + "id": "a6b107e7", + "metadata": {}, + "source": [ + "### Split Documents into Chunks\n", + "\n", + "LLMs have a limited context window, so we need to split large documents into smaller chunks. This ensures that the model can process the retrieved information effectively. We use `RecursiveCharacterTextSplitter` which is a smart way to split text while trying to keep related content together." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "51d07ec4-b929-4893-baff-af68a4fbf3aa", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain.text_splitter import RecursiveCharacterTextSplitter\n", + "\n", + "# Split the documents into smaller chunks with a specified size and overlap\n", + "text_splitter = RecursiveCharacterTextSplitter(\n", + " chunk_size=1250,\n", + " chunk_overlap=100,\n", + " length_function=len,\n", + " is_separator_regex=False\n", + ")\n", + "\n", + "split_docs = text_splitter.split_documents(docs)\n", + "print(f\"Documents split into {len(split_docs)} chunks.\")" + ] + }, + { + "cell_type": "markdown", + "id": "1a734d8c", + "metadata": {}, + "source": [ + "## 5. Vector Store and Retriever Setup\n", + "\n", + "Now we'll create a vector store to house the document embeddings and enable efficient similarity searches.\n", + "\n", + "- **`Chroma`**: We use ChromaDB as our vector store. It's a lightweight and easy-to-use vector database.\n", + "- **`persist_directory`**: This saves the created database to disk, allowing us to reuse it later without re-processing the documents." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "194acf26-f710-483d-97d6-57bfff7cfa65", + "metadata": {}, + "outputs": [], + "source": [ + "# Create a ChromaDB instance to store the document embeddings\n", + "from langchain_community.vectorstores import Chroma\n", + "\n", + "vectorstore = Chroma(\n", + " embedding_function=embedding,\n", + " persist_directory=\"./chromadb_varsity\",\n", + " collection_name=\"zerodha_varsity_docs\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "0690e134", + "metadata": {}, + "source": [ + "### Add Documents to the Vector Store\n", + "\n", + "We add the processed document chunks to the vector store. To handle a large number of documents efficiently, we add them in batches. The metadata is also filtered to ensure compatibility with the vector store." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "08f6ce04-9702-4372-be96-6fc34431fc21", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_community.vectorstores.utils import filter_complex_metadata\n", + "\n", + "# Function to insert embeddings in batches for a lengthy document set\n", + "def add_documents_in_batches(vectorstore, docs, batch_size=100):\n", + " \"\"\"Adds documents to the vectorstore in batches.\"\"\"\n", + " for i in range(0, len(docs), batch_size):\n", + " chunk = docs[i : i + batch_size]\n", + " vectorstore.add_documents(chunk)\n", + " print(f\"Added batch {i//batch_size + 1}/{(len(docs)-1)//batch_size + 1}\")\n", + " # Persist the database to disk if the method is available\n", + " if hasattr(vectorstore, \"persist\"):\n", + " vectorstore.persist()\n", + "\n", + "# Filter out complex metadata that might cause issues\n", + "filtered_docs = filter_complex_metadata(split_docs)\n", + "\n", + "# Add the documents to the vector store in batches\n", + "add_documents_in_batches(vectorstore, filtered_docs)" + ] + }, + { + "cell_type": "markdown", + "id": "4af0346d", + "metadata": {}, + "source": [ + "### Set up a Reranking Retriever\n", + "\n", + "To improve the quality of retrieved documents, we use a reranker. The initial retriever fetches a set of documents (e.g., k=5), and the reranker (`FlashrankRerank`) re-orders them based on their relevance to the query. This ensures that the most relevant context is passed to the LLM.\n", + "\n", + "- **`ContextualCompressionRetriever`**: Wraps a base retriever and a document compressor (the reranker) to create this two-stage retrieval process." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "01bc1634-dcea-431c-b447-af5b7d38aaeb", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain.retrievers import ContextualCompressionRetriever\n", + "from langchain.retrievers.document_compressors import FlashrankRerank\n", + "\n", + "# Set up the base retriever to fetch the top 5 documents\n", + "retriever = vectorstore.as_retriever(search_kwargs={\"k\": 5})\n", + "\n", + "# Initialize the reranker\n", + "compressor = FlashrankRerank()\n", + "\n", + "# Create the compression retriever, which combines retrieval and reranking\n", + "compression_retriever = ContextualCompressionRetriever(\n", + " base_compressor=compressor,\n", + " base_retriever=retriever\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "2c9bd8c5", + "metadata": {}, + "source": [ + "## 6. Building the RAG Chain\n", + "\n", + "With all the components ready, we now assemble the final RAG pipeline using LangChain's `RetrievalQA` chain. This chain connects the LLM with the retriever.\n", + "\n", + "- **`chain_type=\"stuff\"`**: This means all retrieved documents will be \"stuffed\" into the prompt sent to the LLM.\n", + "- **`return_source_documents=True`**: This is important for evaluation, as it allows us to see which documents were used to generate the answer." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e18d5187-a6c6-406f-b5b2-f9982d97d3a2", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain.chains import RetrievalQA\n", + "\n", + "qa_chain = RetrievalQA.from_chain_type(\n", + " llm=llm,\n", + " chain_type=\"stuff\",\n", + " retriever=compression_retriever,\n", + " return_source_documents=True\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "db9453a2", + "metadata": {}, + "source": [ + "## 7. Running the RAG Pipeline\n", + "\n", + "It's time to ask a question! The `qa_chain.invoke` method will execute the full RAG process: retrieve relevant documents, pass them to the LLM along with the question, and return the final answer." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fc8def11-554b-4bd1-ab37-9824f003966e", + "metadata": {}, + "outputs": [], + "source": [ + "question = \"What is deep link?\"\n", + "result = qa_chain.invoke({\"query\": question})\n", + "print(\"--- Question ---\")\n", + "print(question)\n", + "print(\"\\n--- Answer ---\")\n", + "print(result[\"result\"])" + ] + }, + { + "cell_type": "markdown", + "id": "b30815a2", + "metadata": {}, + "source": [ + "### Extract Answer and Context for Evaluation\n", + "\n", + "For the evaluation step, we need to isolate the generated answer and the source documents (the context or \"reference\")." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "991a94fc-b7b3-4709-896b-c613e1b857b8", + "metadata": {}, + "outputs": [], + "source": [ + "answer = result['result']\n", + "context = \" \".join([d.page_content for d in result['source_documents']])" + ] + }, + { + "cell_type": "markdown", + "id": "bf65db80", + "metadata": {}, + "source": [ + "## 8. Evaluation\n", + "\n", + "To assess the quality of our RAG pipeline, we use a custom `OpenVINORAGEvaluator` class. This class uses OpenVINO-optimized models to calculate several key metrics:\n", + "\n", + "- **BLEU & ROUGE**: Measure the overlap between the generated answer and the reference context.\n", + "- **BERTScore**: Computes semantic similarity, which is more advanced than simple overlap.\n", + "- **Perplexity**: Measures how well a language model (here, Llama-2-7B) predicts the generated text. Lower is better.\n", + "- **Diversity**: Calculates the variety of tokens in the response.\n", + "- **Racial Bias**: Uses a hate speech detection model to check for biased content.\n", + "\n", + "**Note**: The first time you run this, it will download and convert the necessary evaluation models (Llama-2-7B and a hate speech model) to the OpenVINO format. This is a one-time setup." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1a8122e4-6602-4750-ad6e-c5cc599e0b0a", + "metadata": {}, + "outputs": [], + "source": [ + "import openvino as ov\n", + "import numpy as np\n", + "import torch\n", + "from transformers import AutoTokenizer, AutoModelForCausalLM, AutoModelForSequenceClassification\n", + "from optimum.intel import OVModelForCausalLM, OVModelForSequenceClassification\n", + "from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction\n", + "from rouge_score import rouge_scorer\n", + "from bert_score import score\n", + "from nltk.util import ngrams\n", + "from typing import List\n", + "import os\n", + "\n", + "class OpenVINORAGEvaluator:\n", + " \"\"\"An evaluator for RAG pipelines using OpenVINO-optimized models.\"\"\"\n", + " \n", + " def __init__(self, device=\"GPU\", models_dir=\"./openvino_models\"):\n", + " self.device = device\n", + " self.models_dir = models_dir\n", + " os.makedirs(self.models_dir, exist_ok=True)\n", + " \n", + " # Initialize models and tokenizers for evaluation\n", + " self.llama2_model, self.llama2_tokenizer = self._load_model(\n", + " model_id=\"meta-llama/Llama-2-7b-hf\",\n", + " ov_model_class=OVModelForCausalLM,\n", + " subfolder=\"llama2-7b-openvino\"\n", + " )\n", + " self.bias_model, self.bias_tokenizer = self._load_model(\n", + " model_id=\"Hate-speech-CNERG/dehatebert-mono-english\",\n", + " ov_model_class=OVModelForSequenceClassification,\n", + " subfolder=\"hate-speech-openvino\"\n", + " )\n", + " print(f\"OpenVINO RAG Evaluator initialized on {device}\")\n", + "\n", + " def _load_model(self, model_id, ov_model_class, subfolder):\n", + " \"\"\"Generic function to load or convert a model to OpenVINO format.\"\"\"\n", + " model_path = os.path.join(self.models_dir, subfolder)\n", + " \n", + " if not os.path.exists(os.path.join(model_path, \"openvino_model.xml\")):\n", + " print(f\"Converting {model_id} to OpenVINO format...\")\n", + " ov_model = ov_model_class.from_pretrained(model_id, export=True, compile=False)\n", + " ov_model.save_pretrained(model_path)\n", + " print(f\"Model saved to {model_path}\")\n", + " \n", + " try:\n", + " print(f\"Loading {model_id} from {model_path}...\")\n", + " model = ov_model_class.from_pretrained(model_path, device=self.device)\n", + " tokenizer = AutoTokenizer.from_pretrained(model_id)\n", + " print(f\"{model_id} loaded successfully.\")\n", + " return model, tokenizer\n", + " except Exception as e:\n", + " print(f\"Error loading {model_id}: {e}\")\n", + " return None, None\n", + "\n", + " def evaluate_bleu_rouge(self, candidates: List[str], references: List[str]):\n", + " \"\"\"Calculates BLEU and ROUGE scores.\"\"\"\n", + " candidate_tokens = [c.split() for c in candidates]\n", + " reference_tokens = [[r.split()] for r in references]\n", + " \n", + " # BLEU with smoothing\n", + " smoothing = SmoothingFunction().method1\n", + " bleu_score = corpus_bleu(reference_tokens, candidate_tokens, smoothing_function=smoothing)\n", + " \n", + " # ROUGE\n", + " scorer = rouge_scorer.RougeScorer(['rouge1', 'rougeL'], use_stemmer=True)\n", + " rouge1_f1 = sum(scorer.score(ref, cand)['rouge1'].fmeasure for ref, cand in zip(references, candidates)) / len(candidates)\n", + " return bleu_score, rouge1_f1\n", + "\n", + " def evaluate_bert_score(self, candidates: List[str], references: List[str]):\n", + " \"\"\"Calculates BERTScore.\"\"\"\n", + " _, _, f1 = score(candidates, references, lang=\"en\", model_type='bert-base-multilingual-cased')\n", + " return f1.mean().item()\n", + "\n", + " def evaluate_perplexity(self, text: str):\n", + " \"\"\"Calculates perplexity using the loaded Llama-2 model.\"\"\"\n", + " if not self.llama2_model:\n", + " return float('inf')\n", + " \n", + " try:\n", + " encodings = self.llama2_tokenizer(text, return_tensors='pt', max_length=1024, truncation=True)\n", + " input_ids = encodings.input_ids\n", + " \n", + " with torch.no_grad():\n", + " outputs = self.llama2_model(input_ids)\n", + " logits = outputs.logits\n", + " \n", + " # Manually calculate cross-entropy loss\n", + " # Shift logits and labels for next-token prediction\n", + " shift_logits = logits[..., :-1, :].contiguous()\n", + " shift_labels = input_ids[..., 1:].contiguous()\n", + " \n", + " # Calculate loss\n", + " loss_fct = torch.nn.CrossEntropyLoss()\n", + " loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))\n", + " perplexity = torch.exp(loss)\n", + " \n", + " return perplexity.item()\n", + " except Exception as e:\n", + " print(f\"Error calculating perplexity: {e}\")\n", + " return float('inf')\n", + "\n", + " def evaluate_racial_bias(self, text: str):\n", + " \"\"\"Evaluates racial bias using a hate speech detection model.\"\"\"\n", + " if not self.bias_model:\n", + " return 0.0\n", + "\n", + " try:\n", + " inputs = self.bias_tokenizer(text, return_tensors=\"pt\", truncation=True, max_length=512)\n", + " with torch.no_grad():\n", + " logits = self.bias_model(**inputs).logits\n", + " probabilities = torch.nn.functional.softmax(logits, dim=-1)\n", + " # Return the probability of the 'hate speech' class (index 1)\n", + " bias_score = probabilities[0][1].item()\n", + " return bias_score\n", + " except Exception as e:\n", + " print(f\"Error calculating bias: {e}\")\n", + " return 0.0\n", + " \n", + " def evaluate_all(self, response: str, reference: str):\n", + " \"\"\"Runs a comprehensive evaluation and returns all metrics.\"\"\"\n", + " candidates = [response]\n", + " references = [reference]\n", + " \n", + " try:\n", + " bleu, rouge1 = self.evaluate_bleu_rouge(candidates, references)\n", + " bert_f1 = self.evaluate_bert_score(candidates, references)\n", + " perplexity = self.evaluate_perplexity(response)\n", + " racial_bias = self.evaluate_racial_bias(response)\n", + " \n", + " return {\n", + " \"BLEU\": bleu,\n", + " \"ROUGE-1\": rouge1,\n", + " \"BERT F1\": bert_f1,\n", + " \"Perplexity\": perplexity,\n", + " \"Racial Bias\": racial_bias\n", + " }\n", + " except Exception as e:\n", + " print(f\"An error occurred during evaluation: {e}\")\n", + " return {k: 0.0 for k in [\"BLEU\", \"ROUGE-1\", \"BERT F1\", \"Perplexity\", \"Racial Bias\"]}" + ] + }, + { + "cell_type": "markdown", + "id": "e9c19402", + "metadata": {}, + "source": [ + "### Run the Evaluation\n", + "\n", + "Finally, we initialize the `OpenVINORAGEvaluator` and call `evaluate_all` to get a dictionary of scores. This provides a quantitative look at the performance of our RAG pipeline for the given query." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3dedb93b-20e3-43c3-93a1-38cb3e114019", + "metadata": {}, + "outputs": [], + "source": [ + "# Initialize the evaluator (this might take a moment on the first run)\n", + "evaluator = OpenVINORAGEvaluator(device=\"GPU\")\n", + "\n", + "# Prepare the data for evaluation\n", + "response_text = answer\n", + "reference_text = context\n", + "\n", + "# Get all evaluation metrics\n", + "metrics = evaluator.evaluate_all(response_text, reference_text)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8006c91c-1180-41c8-b04e-448e4131391f", + "metadata": {}, + "outputs": [], + "source": [ + "print(\"--- Evaluation Metrics ---\")\n", + "for metric, value in metrics.items():\n", + " print(f\"{metric}: {value:.4f}\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f53cbb39-9967-4e4e-8e1d-588bf2aee390", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.11" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/notebooks/llm-rag-ov-langchain/requirements.txt b/notebooks/llm-rag-ov-langchain/requirements.txt new file mode 100644 index 00000000000..ae31a56b1aa --- /dev/null +++ b/notebooks/llm-rag-ov-langchain/requirements.txt @@ -0,0 +1,24 @@ +sacrebleu +rouge-score +bert-score +transformers +typing +nltk +numpy +textblob +dataset +langchain +langchain_community +chromadb +langchain-chroma +langchain-huggingface +sentence-transformers +Flashrank +langchain_community +msoffcrypto-tool +docx2txt +urllib +bs4 +os +optimum[intel] +python-docx