From 62802f83a25f5d8cc51a1ed29b1507bbac0eae49 Mon Sep 17 00:00:00 2001 From: "Khara, Parshwa" Date: Wed, 12 Nov 2025 12:23:12 +0530 Subject: [PATCH 1/5] =?UTF-8?q?A=20proof-of-concept=20toolkit=20to=20help?= =?UTF-8?q?=20OEMs=20evaluate=20RAG=20pipeline=20quality.=20The=20toolkit?= =?UTF-8?q?=20computes=20standard=20metrics=20(BERT,=20BLEU,=20ROUGE,=20pe?= =?UTF-8?q?rplexity=20score)=20and=20a=20racial-bias=20indicator,=20and=20?= =?UTF-8?q?it=20is=20implemented=20using=20Optimum-Intel=E2=80=99s=20OVMod?= =?UTF-8?q?elForCausalLM=20with=20the=20OpenVINO=20backend=20and=20LangCha?= =?UTF-8?q?in=20for=20orchestration.?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- notebooks/llm-rag-ov-langchain/README.md | 197 ++++++ .../ov_rag_evaluator.ipynb | 662 ++++++++++++++++++ .../llm-rag-ov-langchain/requirements.txt | 24 + 3 files changed, 883 insertions(+) create mode 100644 notebooks/llm-rag-ov-langchain/README.md create mode 100644 notebooks/llm-rag-ov-langchain/ov_rag_evaluator.ipynb create mode 100644 notebooks/llm-rag-ov-langchain/requirements.txt diff --git a/notebooks/llm-rag-ov-langchain/README.md b/notebooks/llm-rag-ov-langchain/README.md new file mode 100644 index 00000000000..5888563fb79 --- /dev/null +++ b/notebooks/llm-rag-ov-langchain/README.md @@ -0,0 +1,197 @@ +# RAG Performance & Fairness Evaluation Toolkit (OpenVINO + LangChain) + +This toolkit enables developers to build, evaluate, and optimize Retrieval-Augmented Generation (RAG) applications with comprehensive quality metrics including accuracy, bias detection, and perplexity analysis plus a racial-bias indicator. This uses RAG pipeline optimized with Intel OpenVINO for enhanced performance on CPU, GPU, and NPU. The pipeline leverages: +- Optimum-Intel’s `OVModelForCausalLM` with the OpenVINO backend for efficient inference. +- LangChain for orchestration of document loading, chunking, embedding, retrieval, reranking, and generation. + +> Goal: Provide a portable notebook-driven workflow for rapid experimentation, model comparison, and validation of RAG systems on custom/private corpora. + +--- + +## 1. What Is RAG? + +Retrieval-Augmented Generation combines: +1. Retrieval: Selecting the most relevant context snippets from a document store. +2. Generation: Supplying those snippets to an LLM to produce grounded answers. + +Benefits: +- Injects up-to-date and domain-specific knowledge without fine-tuning the LLM. +- Reduces hallucinations by constraining generation to retrieved evidence. +- Supports compliance and audit by exposing sources (metadata) for each answer. + +--- + +## 2. RAG Performance & Fairness Evaluation Toolkit Overview + +| Component | Role | +|--------------------------|------| +| Document Loaders | Ingest local files (.pdf, .txt, .docx, .json, .csv) or URLs/web pages. | +| Text Splitter | Chunk documents into semantically sized pieces for embedding. | +| Embedding Model | Converts chunks to vector representations for similarity search. | +| Vector Store / Index | Persists embeddings enabling fast approximate or exact nearest-neighbor retrieval. | +| (Optional) Reranker | Re-orders retrieved candidates for improved answer grounding. | +| Generator (OVModel) | Runs local accelerated LLM inference via OpenVINO. | +| Evaluator | Computes quality and bias metrics. | +| Notebook Orchestrator | Step-by-step cells show the entire flow and allow interactive parameter tuning. | + +--- + +## 3. 
Key Features + +- **OpenVINO Model Optimization**: + - Hardware-accelerated inference using OpenVINO for LLMs and embedding models +- **Flexible Model Support**: + - LLM: Microsoft Phi-3-mini-4k-instruct (easily swappable with other HuggingFace models) + - Embeddings: BGE-small-en-v1.5 (supports other embedding models) + - Evaluation: Llama-2-7B for perplexity scoring +- **Advanced Retrieval**: + - ChromaDB vector store with persistent storage + - FlashRank reranking for improved retrieval accuracy + - Batch embedding insertion for large document sets +- **Multiple Document Sources**: + - Web scraping from sitemaps and URLs + - Local file loading (.pdf, .txt, .docx, .csv, .json, .xlsx) + - Supports both single and bulk document processing +- **Comprehensive Evaluation Metrics**: + - BLEU Score: Translation quality metric + - ROUGE Score: Summary quality assessment + - BERT Score: Semantic similarity using BERT embeddings + - Perplexity: Language model confidence measurement + - Diversity Score: Response variety analysis + - Racial Bias Detection: Using hate-speech detection model + +--- + +## 4. Installation + +```bash +# Clone the repository +cd RAG-OV-Langchain +pip install -r requirements.txt +``` + +(If OpenVINO runtime prerequisites are not already satisfied, follow Intel’s OpenVINO setup instructions.) + +--- + +## 5. Running the Notebook + +1. Launch Jupyter: `jupyter notebook` +2. Open the provided notebook - `ov_rag_evaluator.ipynb` +3. Execute cells in order; each cell includes explanatory comments. +4. Provide input sources (file paths or URLs) when prompted. +5. Adjust parameters such as: + - Chunk size / overlap + - Embedding model name + - Retrieval top-k + - Reranker toggle + - Generation temperature / max tokens +6. Run evaluation cells to view metrics dashboard output. + +--- + +## 6. Input / Output Formats + +### Supported Input +- Textual documents: `.pdf`, `.txt`, `.docx`, `.json`, `.csv` +- Web content: Page URLs (scraped & cleaned) +- (Extendable) Additional loaders can be registered for other data types. + +### Output +- Generated answer grounded in retrieved context. +- List of source chunks with: + - Document identifier + - Chunk index + - Similarity / relevance score + - Optional rerank score +- Metrics report (per query or aggregate). + +--- + +## 7. Evaluation Metrics + +| Metric | Purpose | +|---------------|---------| +| BERTScore | Semantic similarity vs. reference answer(s). | +| BLEU | n-gram precision (machine translation heritage; still indicative for overlap). | +| ROUGE | Recall-oriented overlap (useful for summarization-style references). | +| Perplexity | Fluency measure of generated text under a language model. | +| Racial Bias Indicator | Heuristic or embedding-based measure identifying disproportionate associations or skewed outputs. | + +Notes: +- Provide one or more reference answers (gold annotations) for BLEU/ROUGE/BERTScore. +- Perplexity may rely on a reference language model distinct from the generator. +- Bias indicator may leverage word association tests or sentiment differentials; interpret conservatively. + +--- + +## 8. Racial Bias Indicator (Concept) + +The notebook computes a racial bias signal that can highlight when generated answers: +- Over-index on certain demographic terms. +- Exhibit asymmetric sentiment or descriptors. +- Associate professions or attributes disproportionately. + +Recommended usage: +- Treat as a screening heuristic. +- Follow up with manual review. +- Do not treat a single numeric score as definitive. 
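As a concrete illustration of this heuristic, the sketch below computes a screening score with an off-the-shelf hate-speech classifier, which is the same approach the accompanying notebook takes in its evaluator. The checkpoint name and the assumption that class index 1 corresponds to the "hateful" label are illustrative; verify both against the model card of whichever classifier you adopt.

```python
# Minimal bias-screening sketch. Assumptions: the dehatebert checkpoint below is
# available, and class index 1 is the "hateful" label -- check the model card.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

_MODEL_ID = "Hate-speech-CNERG/dehatebert-mono-english"
_tokenizer = AutoTokenizer.from_pretrained(_MODEL_ID)
_classifier = AutoModelForSequenceClassification.from_pretrained(_MODEL_ID)

def bias_screen(text: str, threshold: float = 0.5) -> dict:
    """Return a heuristic bias probability for `text` and whether it warrants review."""
    inputs = _tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        probs = torch.softmax(_classifier(**inputs).logits, dim=-1)
    score = probs[0][1].item()  # assumed probability of the "hateful" class
    return {"bias_score": score, "flag_for_review": score >= threshold}
```

A flagged answer should go to manual review together with its retrieved sources; the numeric score alone does not explain what triggered it.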
+ +--- + +## 9. Customization + +You can modify: +- Embedding backend (e.g., `sentence-transformers`, `text-embedding-*` models). +- Retrieval strategy (FAISS, chroma, or other vector stores). +- Reranking (e.g., cross-encoder or LLM-based rerank). +- Generation model (swap Hugging Face model; ensure OpenVINO export or optimization). +- Metric thresholds for acceptance gating. + +--- + +## 10. Suggested Workflow + +1. Curate domain corpus. +2. Run baseline RAG with default parameters. +3. Collect queries & gold references (if available). +4. Evaluate metrics; record baseline. +5. Iterate: + - Tune chunking, top-k. + - Introduce reranker. + - Switch embedding model. + - Optimize LLM (quantization, OpenVINO optimizations). +6. Compare metric deltas; choose best configuration for deployment. + +--- + +## 11. Performance Considerations + +- OpenVINO accelerates inference on Intel hardware (CPU / GPU / NPU where supported). +- Smaller embedding models may trade slight recall for speed. +- Reranking adds latency; enable only if precision gains matter. +- Batch queries in evaluation phase to amortize setup costs. + +--- + +## 12. Limitations + +- Metrics may not fully capture factual grounding; consider human review. +- Bias indicator is heuristic; deeper audits require specialized tools. +- Long documents may need advanced chunking strategies (semantic splitting). +- URL ingestion quality depends on HTML cleanliness. + +--- + +## FAQs + +Q: Can I use a different LLM? +A: Yes, replace the checkpoint and ensure OpenVINO optimization/export steps are applied. + +Q: Do I need gold answers? +A: For BLEU/ROUGE/BERTScore, yes. For exploratory retrieval quality, you can still inspect sources without them. + +Q: How to reduce hallucinations? +A: Increase retrieval relevance (tune embeddings, use reranking) and constrain generation parameters (lower temperature). + +--- diff --git a/notebooks/llm-rag-ov-langchain/ov_rag_evaluator.ipynb b/notebooks/llm-rag-ov-langchain/ov_rag_evaluator.ipynb new file mode 100644 index 00000000000..12299f9a75f --- /dev/null +++ b/notebooks/llm-rag-ov-langchain/ov_rag_evaluator.ipynb @@ -0,0 +1,662 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# RAG Performance & Fairness Evaluation Toolkit (OpenVINO + LangChain)\n", + "\n", + "This notebook demonstrates how to build and evaluate a Retrieval-Augmented Generation (RAG) pipeline using OpenVINO™ for accelerated performance on Intel hardware. We will use Hugging Face and LangChain libraries to construct the pipeline.\n", + "\n", + "The process involves:\n", + "1. **Environment Setup**: Installing necessary libraries.\n", + "2. **LLM and Tokenizer Setup**: Loading a language model (Microsoft's Phi-3-mini) and its tokenizer, optimized with OpenVINO.\n", + "3. **Embedding Model Setup**: Preparing an embedding model to convert text into vector representations.\n", + "4. **Data Loading and Processing**: Fetching documents from a web source, splitting them into manageable chunks, and creating vector embeddings.\n", + "5. **Vector Store and Retriever Setup**: Storing the embeddings in a ChromaDB vector store and setting up a retriever with reranking for improved accuracy.\n", + "6. **Building the RAG Chain**: Creating a `RetrievalQA` chain that combines the retriever and the LLM.\n", + "7. **Running the RAG Pipeline**: Asking a question to get a response from the RAG system.\n", + "8. 
**Evaluation**: Using a comprehensive `OpenVINORAGEvaluator` to assess the quality of the generated response based on various metrics like BLEU, ROUGE, BERTScore, perplexity, and bias." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. Environment Setup\n", + "\n", + "First, let's ensure all the required Python packages are installed. The following commands handle the installation of essential libraries. These are typically only needed if you encounter version conflicts or issues with existing installations." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c4a2dc6a-3d3e-4da2-902f-30f3cbd24b39", + "metadata": {}, + "outputs": [], + "source": [ + "# Execute only if needed (in case of any errors due to setuptools or whl installation)\n", + "!pip install setuptools==57.0.0 --force-reinstall\n", + "!pip install wheel==0.36.2 --force-reinstall\n", + "!pip uninstall comtypes -y\n", + "!pip install --no-cache-dir comtypes" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. LLM and Tokenizer Setup\n", + "\n", + "Next, we load the Large Language Model (LLM) and its corresponding tokenizer. We use `optimum-intel` to convert and accelerate the model with OpenVINO. In this example, we use `microsoft/Phi-3-mini-4k-instruct`, but you can replace it with another compatible model.\n", + "\n", + "- **`OVModelForCausalLM`**: Loads a causal language model and automatically converts it to the OpenVINO format (`export=True`).\n", + "- **`device=\"GPU\"`**: Specifies that the model should run on the integrated GPU for acceleration. You can change this to `\"CPU\"`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "90e68d95-9a4e-4ba5-9040-4422c1333444", + "metadata": {}, + "outputs": [], + "source": [ + "from optimum.intel import OVModelForCausalLM\n", + "from transformers import AutoTokenizer, pipeline\n", + "from langchain_huggingface import HuggingFacePipeline\n", + "\n", + "# Load model with OpenVINO backend\n", + "model = OVModelForCausalLM.from_pretrained(\n", + " \"microsoft/Phi-3-mini-4k-instruct\", # You can plug in any other supported model\n", + " export=True, # Convert to OpenVINO format on the fly\n", + " device=\"GPU\" # Specify GPU for inference, can also be \"CPU\"\n", + ")\n", + "\n", + "tokenizer = AutoTokenizer.from_pretrained(\"microsoft/Phi-3-mini-4k-instruct\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a LangChain-compatible LLM Pipeline\n", + "\n", + "We now create a `text-generation` pipeline using the OpenVINO-optimized model and tokenizer. This pipeline is then wrapped in `HuggingFacePipeline` to make it compatible with the LangChain ecosystem. A quick test is run to confirm the pipeline is working correctly." 
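A practical aside before building the pipeline: with `export=True`, the Phi-3 checkpoint is re-converted to OpenVINO IR every time the loading cell runs. The optional sketch below caches the converted model to a local directory (the `./phi3_mini_ov` path is illustrative) so later runs can load the IR directly; the evaluator class later in this notebook uses the same save/reload pattern.

```python
# Optional: cache the converted model so future runs can skip the export step.
# Assumes `model` and `tokenizer` from the loading cell above; the directory name is illustrative.
from pathlib import Path

ov_dir = Path("./phi3_mini_ov")
if not ov_dir.exists():
    model.save_pretrained(ov_dir)      # writes openvino_model.xml / .bin
    tokenizer.save_pretrained(ov_dir)  # keep the tokenizer next to the IR

# On subsequent runs, load the cached copy instead of exporting again:
# model = OVModelForCausalLM.from_pretrained(ov_dir, device="GPU")
# tokenizer = AutoTokenizer.from_pretrained(ov_dir)
```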
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f985ca28-9e3d-490d-954c-71b24fc47eda", + "metadata": {}, + "outputs": [], + "source": [ + "# Create a text-generation pipeline with the OpenVINO model\n", + "llm_pipeline = pipeline(\n", + " \"text-generation\",\n", + " model=model,\n", + " tokenizer=tokenizer,\n", + " device=model.device,\n", + " max_new_tokens=100,\n", + " top_k=50,\n", + " temperature=0.1,\n", + " do_sample=True\n", + ")\n", + "\n", + "# Create a LangChain instance from the Hugging Face pipeline\n", + "llm = HuggingFacePipeline(pipeline=llm_pipeline)\n", + "\n", + "# Test the pipeline with a sample query\n", + "response = llm.invoke(\"What is an ocean?\")\n", + "print(response)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. Embedding Model Setup\n", + "\n", + "For the retrieval part of our RAG pipeline, we need an embedding model to convert text documents into numerical vectors. We use `OpenVINOBgeEmbeddings` from `langchain_community`, which provides OpenVINO-optimized embeddings for efficient performance. Here, we use the `bge-small-en-v1.5` model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "606ff70a-f797-42a5-a697-8cb5c13c0dae", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_community.embeddings import OpenVINOBgeEmbeddings\n", + "\n", + "embedding_model_name = \"bge-small-en-v1.5\" # You can plug in any other embedding model\n", + "embedding_model_kwargs = {\"device\": \"CPU\"} # Embeddings often run well on CPU\n", + "encode_kwargs = {\n", + " \"normalize_embeddings\": True,\n", + "}\n", + "\n", + "# Initialize OpenVINO-optimized embeddings\n", + "embedding = OpenVINOBgeEmbeddings(\n", + " model_name=embedding_model_name,\n", + " model_kwargs=embedding_model_kwargs,\n", + " encode_kwargs=encode_kwargs,\n", + ")\n", + "\n", + "# Sample text to verify embedding functionality\n", + "text = \"This is a test document.\"\n", + "embedding_result = embedding.embed_query(text)\n", + "print(\"Sample embedding (first 3 dimensions):\", embedding_result[:3])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. Data Loading and Processing\n", + "\n", + "Now we'll load the documents that will form the knowledge base for our RAG pipeline. This notebook includes two methods for loading documents:\n", + "\n", + "1. **Web Crawling (Enabled by default)**: Fetches content from a website's sitemap. We use `WebBaseLoader` to load content from URLs found in the sitemap of Zerodha Varsity.\n", + "2. **Local File Loading (Commented out)**: A robust `LangChainDocumentLoader` class is provided to load various file types (`.txt`, `.pdf`, `.docx`, etc.) from a local directory. You can uncomment and adapt this section if you want to use your own local files." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "808e9c2d-ab4a-4bc6-bb45-f3b2d4be3156", + "metadata": {}, + "outputs": [], + "source": [ + "import bs4\n", + "from urllib.request import Request, urlopen\n", + "from bs4 import BeautifulSoup\n", + "import ssl\n", + "from langchain_community.document_loaders import WebBaseLoader\n", + "\n", + "# --- Method 1: Load documents by crawling a web page (default) ---\n", + "def get_sitemap(url):\n", + " \"\"\"Fetches and parses an XML sitemap from a URL.\"\"\"\n", + " req = Request(url, headers={\"User-Agent\": \"Mozilla/5.0\"})\n", + " response = urlopen(req)\n", + " xml = BeautifulSoup(response, \"lxml-xml\", from_encoding=response.info().get_param(\"charset\"))\n", + " return xml\n", + "\n", + "def get_urls_from_sitemap(xml):\n", + " \"\"\"Extracts all URLs from a parsed sitemap XML.\"\"\"\n", + " urls = [loc.text for loc in xml.find_all(\"loc\")]\n", + " return urls\n", + "\n", + "# Bypass SSL verification issues if they arise\n", + "ssl._create_default_https_context = ssl._create_stdlib_context\n", + "\n", + "sitemap_url = \"https://zerodha.com/varsity/chapter-sitemap2.xml\"\n", + "sitemap_xml = get_sitemap(sitemap_url)\n", + "urls = get_urls_from_sitemap(sitemap_xml)\n", + "\n", + "# Load documents from the collected URLs\n", + "docs = []\n", + "for i, url in enumerate(urls):\n", + " try:\n", + " loader = WebBaseLoader(url)\n", + " docs.extend(loader.load())\n", + " if (i + 1) % 10 == 0:\n", + " print(f\"Loaded {i + 1}/{len(urls)} URLs\")\n", + " except Exception as e:\n", + " print(f\"Failed to load {url}: {e}\")\n", + "\n", + "print(f\"\\nTotal documents loaded: {len(docs)}\")\n", + "\n", + "# --- Method 2: Load documents locally from the system (commented out) ---\n", + "'''\n", + "import os\n", + "from langchain.document_loaders import (\n", + " TextLoader,\n", + " PyPDFLoader,\n", + " DirectoryLoader,\n", + ")\n", + "from langchain.schema import Document as LCDocument\n", + "from typing import List\n", + "\n", + "class LocalDocumentLoader:\n", + " \"\"\"Load documents from a local directory using LangChain loaders.\"\"\"\n", + " def __init__(self, directory_path: str):\n", + " self.directory_path = directory_path\n", + "\n", + " def load(self) -> List[LCDocument]:\n", + " \"\"\"Loads all supported documents from the directory.\"\"\"\n", + " if not self.directory_path:\n", + " raise ValueError(\"Directory path not set.\")\n", + "\n", + " # Define loaders for different file types\n", + " txt_loader = DirectoryLoader(\n", + " self.directory_path, glob=\"**/*.txt\", loader_cls=TextLoader,\n", + " loader_kwargs={\"encoding\": \"utf-8\"}, show_progress=True\n", + " )\n", + " pdf_loader = DirectoryLoader(\n", + " self.directory_path, glob=\"**/*.pdf\", loader_cls=PyPDFLoader, show_progress=True\n", + " )\n", + "\n", + " documents = []\n", + " documents.extend(txt_loader.load())\n", + " documents.extend(pdf_loader.load())\n", + " \n", + " return documents\n", + "\n", + "# Usage Example:\n", + "# loader = LocalDocumentLoader(directory_path=\"/path/to/your/documents\")\n", + "# docs = loader.load()\n", + "# print(f\"Loaded {len(docs)} local documents.\")\n", + "'''" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Split Documents into Chunks\n", + "\n", + "LLMs have a limited context window, so we need to split large documents into smaller chunks. This ensures that the model can process the retrieved information effectively. 
We use `RecursiveCharacterTextSplitter` which is a smart way to split text while trying to keep related content together." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "51d07ec4-b929-4893-baff-af68a4fbf3aa", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain.text_splitter import RecursiveCharacterTextSplitter\n", + "\n", + "# Split the documents into smaller chunks with a specified size and overlap\n", + "text_splitter = RecursiveCharacterTextSplitter(\n", + " chunk_size=1250,\n", + " chunk_overlap=100,\n", + " length_function=len,\n", + " is_separator_regex=False\n", + ")\n", + "\n", + "split_docs = text_splitter.split_documents(docs)\n", + "print(f\"Documents split into {len(split_docs)} chunks.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5. Vector Store and Retriever Setup\n", + "\n", + "Now we'll create a vector store to house the document embeddings and enable efficient similarity searches.\n", + "\n", + "- **`Chroma`**: We use ChromaDB as our vector store. It's a lightweight and easy-to-use vector database.\n", + "- **`persist_directory`**: This saves the created database to disk, allowing us to reuse it later without re-processing the documents." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "194acf26-f710-483d-97d6-57bfff7cfa65", + "metadata": {}, + "outputs": [], + "source": [ + "# Create a ChromaDB instance to store the document embeddings\n", + "from langchain_community.vectorstores import Chroma\n", + "\n", + "vectorstore = Chroma(\n", + " embedding_function=embedding,\n", + " persist_directory=\"./chromadb_varsity\",\n", + " collection_name=\"zerodha_varsity_docs\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Add Documents to the Vector Store\n", + "\n", + "We add the processed document chunks to the vector store. To handle a large number of documents efficiently, we add them in batches. The metadata is also filtered to ensure compatibility with the vector store." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "08f6ce04-9702-4372-be96-6fc34431fc21", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_community.vectorstores.utils import filter_complex_metadata\n", + "\n", + "# Function to insert embeddings in batches for a lengthy document set\n", + "def add_documents_in_batches(vectorstore, docs, batch_size=100):\n", + " \"\"\"Adds documents to the vectorstore in batches.\"\"\"\n", + " for i in range(0, len(docs), batch_size):\n", + " chunk = docs[i : i + batch_size]\n", + " vectorstore.add_documents(chunk)\n", + " print(f\"Added batch {i//batch_size + 1}/{(len(docs)-1)//batch_size + 1}\")\n", + " # Persist the database to disk if the method is available\n", + " if hasattr(vectorstore, \"persist\"):\n", + " vectorstore.persist()\n", + "\n", + "# Filter out complex metadata that might cause issues\n", + "filtered_docs = filter_complex_metadata(split_docs)\n", + "\n", + "# Add the documents to the vector store in batches\n", + "add_documents_in_batches(vectorstore, filtered_docs)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Set up a Reranking Retriever\n", + "\n", + "To improve the quality of retrieved documents, we use a reranker. The initial retriever fetches a set of documents (e.g., k=5), and the reranker (`FlashrankRerank`) re-orders them based on their relevance to the query. 
This ensures that the most relevant context is passed to the LLM.\n", + "\n", + "- **`ContextualCompressionRetriever`**: Wraps a base retriever and a document compressor (the reranker) to create this two-stage retrieval process." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "01bc1634-dcea-431c-b447-af5b7d38aaeb", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain.retrievers import ContextualCompressionRetriever\n", + "from langchain.retrievers.document_compressors import FlashrankRerank\n", + "\n", + "# Set up the base retriever to fetch the top 5 documents\n", + "retriever = vectorstore.as_retriever(search_kwargs={\"k\": 5})\n", + "\n", + "# Initialize the reranker\n", + "compressor = FlashrankRerank()\n", + "\n", + "# Create the compression retriever, which combines retrieval and reranking\n", + "compression_retriever = ContextualCompressionRetriever(\n", + " base_compressor=compressor,\n", + " base_retriever=retriever\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 6. Building the RAG Chain\n", + "\n", + "With all the components ready, we now assemble the final RAG pipeline using LangChain's `RetrievalQA` chain. This chain connects the LLM with the retriever.\n", + "\n", + "- **`chain_type=\"stuff\"`**: This means all retrieved documents will be \"stuffed\" into the prompt sent to the LLM.\n", + "- **`return_source_documents=True`**: This is important for evaluation, as it allows us to see which documents were used to generate the answer." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e18d5187-a6c6-406f-b5b2-f9982d97d3a2", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain.chains import RetrievalQA\n", + "\n", + "qa_chain = RetrievalQA.from_chain_type(\n", + " llm=llm,\n", + " chain_type=\"stuff\",\n", + " retriever=compression_retriever,\n", + " return_source_documents=True\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 7. Running the RAG Pipeline\n", + "\n", + "It's time to ask a question! The `qa_chain.invoke` method will execute the full RAG process: retrieve relevant documents, pass them to the LLM along with the question, and return the final answer." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fc8def11-554b-4bd1-ab37-9824f003966e", + "metadata": {}, + "outputs": [], + "source": [ + "question = \"What is a mutual fund?\"\n", + "result = qa_chain.invoke({\"query\": question})\n", + "print(\"--- Question ---\")\n", + "print(question)\n", + "print(\"\\n--- Answer ---\")\n", + "print(result[\"result\"])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Extract Answer and Context for Evaluation\n", + "\n", + "For the evaluation step, we need to isolate the generated answer and the source documents (the context or \"reference\")." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "991a94fc-b7b3-4709-896b-c613e1b857b8", + "metadata": {}, + "outputs": [], + "source": [ + "answer = result['result']\n", + "context = \" \".join([d.page_content for d in result['source_documents']])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 8. Evaluation\n", + "\n", + "To assess the quality of our RAG pipeline, we use a custom `OpenVINORAGEvaluator` class. 
This class uses OpenVINO-optimized models to calculate several key metrics:\n", + "\n", + "- **BLEU & ROUGE**: Measure the overlap between the generated answer and the reference context.\n", + "- **BERTScore**: Computes semantic similarity, which is more advanced than simple overlap.\n", + "- **Perplexity**: Measures how well a language model (here, Llama-2-7B) predicts the generated text. Lower is better.\n", + "- **Diversity**: Calculates the variety of tokens in the response.\n", + "- **Racial Bias**: Uses a hate speech detection model to check for biased content.\n", + "\n", + "**Note**: The first time you run this, it will download and convert the necessary evaluation models (Llama-2-7B and a hate speech model) to the OpenVINO format. This is a one-time setup." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1a8122e4-6602-4750-ad6e-c5cc599e0b0a", + "metadata": {}, + "outputs": [], + "source": [ + "import openvino as ov\n", + "import numpy as np\n", + "import torch\n", + "from transformers import AutoTokenizer, AutoModelForCausalLM, AutoModelForSequenceClassification\n", + "from optimum.intel import OVModelForCausalLM, OVModelForSequenceClassification\n", + "from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction\n", + "from rouge_score import rouge_scorer\n", + "from bert_score import score\n", + "from nltk.util import ngrams\n", + "from typing import List\n", + "import os\n", + "\n", + "class OpenVINORAGEvaluator:\n", + " \"\"\"An evaluator for RAG pipelines using OpenVINO-optimized models.\"\"\"\n", + " \n", + " def __init__(self, device=\"GPU\", models_dir=\"./openvino_models\"):\n", + " self.device = device\n", + " self.models_dir = models_dir\n", + " os.makedirs(self.models_dir, exist_ok=True)\n", + " \n", + " # Initialize models and tokenizers for evaluation\n", + " self.llama2_model, self.llama2_tokenizer = self._load_model(\n", + " model_id=\"meta-llama/Llama-2-7b-hf\",\n", + " ov_model_class=OVModelForCausalLM,\n", + " subfolder=\"llama2-7b-openvino\"\n", + " )\n", + " self.bias_model, self.bias_tokenizer = self._load_model(\n", + " model_id=\"Hate-speech-CNERG/dehatebert-mono-english\",\n", + " ov_model_class=OVModelForSequenceClassification,\n", + " subfolder=\"hate-speech-openvino\"\n", + " )\n", + " print(f\"OpenVINO RAG Evaluator initialized on {device}\")\n", + "\n", + " def _load_model(self, model_id, ov_model_class, subfolder):\n", + " \"\"\"Generic function to load or convert a model to OpenVINO format.\"\"\"\n", + " model_path = os.path.join(self.models_dir, subfolder)\n", + " \n", + " if not os.path.exists(os.path.join(model_path, \"openvino_model.xml\")):\n", + " print(f\"Converting {model_id} to OpenVINO format...\")\n", + " ov_model = ov_model_class.from_pretrained(model_id, export=True, compile=False)\n", + " ov_model.save_pretrained(model_path)\n", + " print(f\"Model saved to {model_path}\")\n", + " \n", + " try:\n", + " print(f\"Loading {model_id} from {model_path}...\")\n", + " model = ov_model_class.from_pretrained(model_path, device=self.device)\n", + " tokenizer = AutoTokenizer.from_pretrained(model_id)\n", + " print(f\"{model_id} loaded successfully.\")\n", + " return model, tokenizer\n", + " except Exception as e:\n", + " print(f\"Error loading {model_id}: {e}\")\n", + " return None, None\n", + "\n", + " def evaluate_bleu_rouge(self, candidates: List[str], references: List[str]):\n", + " \"\"\"Calculates BLEU and ROUGE scores.\"\"\"\n", + " candidate_tokens = [c.split() for c in candidates]\n", + " reference_tokens = 
[[r.split()] for r in references]\n", + " \n", + " # BLEU with smoothing\n", + " smoothing = SmoothingFunction().method1\n", + " bleu_score = corpus_bleu(reference_tokens, candidate_tokens, smoothing_function=smoothing)\n", + " \n", + " # ROUGE\n", + " scorer = rouge_scorer.RougeScorer(['rouge1', 'rougeL'], use_stemmer=True)\n", + " rouge1_f1 = sum(scorer.score(ref, cand)['rouge1'].fmeasure for ref, cand in zip(references, candidates)) / len(candidates)\n", + " return bleu_score, rouge1_f1\n", + "\n", + " def evaluate_bert_score(self, candidates: List[str], references: List[str]):\n", + " \"\"\"Calculates BERTScore.\"\"\"\n", + " _, _, f1 = score(candidates, references, lang=\"en\", model_type='bert-base-multilingual-cased')\n", + " return f1.mean().item()\n", + "\n", + " def evaluate_perplexity(self, text: str):\n", + " \"\"\"Calculates perplexity using the loaded Llama-2 model.\"\"\"\n", + " if not self.llama2_model:\n", + " return float('inf')\n", + " \n", + " encodings = self.llama2_tokenizer(text, return_tensors='pt', max_length=1024, truncation=True)\n", + " input_ids = encodings.input_ids.to(self.llama2_model.device)\n", + " \n", + " with torch.no_grad():\n", + " outputs = self.llama2_model(input_ids, labels=input_ids)\n", + " loss = outputs.loss\n", + " perplexity = torch.exp(loss)\n", + " \n", + " return perplexity.item()\n", + "\n", + " def evaluate_racial_bias(self, text: str):\n", + " \"\"\"Evaluates racial bias using a hate speech detection model.\"\"\"\n", + " if not self.bias_model:\n", + " return 0.0\n", + "\n", + " inputs = self.bias_tokenizer(text, return_tensors=\"pt\", truncation=True, max_length=512)\n", + " with torch.no_grad():\n", + " logits = self.bias_model(**inputs).logits\n", + " probabilities = torch.nn.functional.softmax(logits, dim=-1)\n", + " # Return the probability of the 'hate speech' class (index 1)\n", + " bias_score = probabilities[0][1].item()\n", + " return bias_score\n", + " \n", + " def evaluate_all(self, response: str, reference: str):\n", + " \"\"\"Runs a comprehensive evaluation and returns all metrics.\"\"\"\n", + " candidates = [response]\n", + " references = [reference]\n", + " \n", + " try:\n", + " bleu, rouge1 = self.evaluate_bleu_rouge(candidates, references)\n", + " bert_f1 = self.evaluate_bert_score(candidates, references)\n", + " perplexity = self.evaluate_perplexity(response)\n", + " racial_bias = self.evaluate_racial_bias(response)\n", + " \n", + " return {\n", + " \"BLEU\": bleu,\n", + " \"ROUGE-1\": rouge1,\n", + " \"BERT F1\": bert_f1,\n", + " \"Perplexity\": perplexity,\n", + " \"Racial Bias\": racial_bias\n", + " }\n", + " except Exception as e:\n", + " print(f\"An error occurred during evaluation: {e}\")\n", + " return {k: 0.0 for k in [\"BLEU\", \"ROUGE-1\", \"BERT F1\", \"Perplexity\", \"Racial Bias\"]}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Run the Evaluation\n", + "\n", + "Finally, we initialize the `OpenVINORAGEvaluator` and call `evaluate_all` to get a dictionary of scores. This provides a quantitative look at the performance of our RAG pipeline for the given query." 
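The next cells initialize the evaluator and score the single question asked earlier. Once that works, one data point is rarely conclusive; the sketch below (assuming the `qa_chain` and `evaluator` objects built in this notebook, with a purely illustrative question list) aggregates the same metrics over a small query set, in line with the README's suggestion to batch queries during evaluation.

```python
# Hedged sketch: average the evaluation metrics over several queries instead of one.
# Assumes `qa_chain` and `evaluator` exist; the question list is illustrative only.
from statistics import mean

sample_questions = [
    "What is a mutual fund?",
    "What is a stock exchange?",
    "How does a stop-loss order work?",
]

per_query = []
for q in sample_questions:
    out = qa_chain.invoke({"query": q})
    reference = " ".join(d.page_content for d in out["source_documents"]).strip()
    per_query.append(evaluator.evaluate_all(out["result"], reference))

# Average every metric across queries (the per-query dicts share the same keys).
aggregate = {name: mean(m[name] for m in per_query) for name in per_query[0]}
for name, value in aggregate.items():
    print(f"{name}: {value:.4f}")
```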
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3dedb93b-20e3-43c3-93a1-38cb3e114019", + "metadata": {}, + "outputs": [], + "source": [ + "# Initialize the evaluator (this might take a moment on the first run)\n", + "evaluator = OpenVINORAGEvaluator(device=\"GPU\")\n", + "\n", + "# Prepare the data for evaluation\n", + "response_text = answer\n", + "reference_text = context.strip()\n", + "\n", + "# Get all evaluation metrics\n", + "metrics = evaluator.evaluate_all(response_text, reference_text)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8006c91c-1180-41c8-b04e-448e4131391f", + "metadata": {}, + "outputs": [], + "source": [ + "print(\"--- Evaluation Metrics ---\")\n", + "for metric, value in metrics.items():\n", + " print(f\"{metric}: {value:.4f}\")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.11" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/notebooks/llm-rag-ov-langchain/requirements.txt b/notebooks/llm-rag-ov-langchain/requirements.txt new file mode 100644 index 00000000000..ae31a56b1aa --- /dev/null +++ b/notebooks/llm-rag-ov-langchain/requirements.txt @@ -0,0 +1,24 @@ +sacrebleu +rouge-score +bert-score +transformers +typing +nltk +numpy +textblob +dataset +langchain +langchain_community +chromadb +langchain-chroma +langchain-huggingface +sentence-transformers +Flashrank +langchain_community +msoffcrypto-tool +docx2txt +urllib +bs4 +os +optimum[intel] +python-docx From 20a540c9cc9c701fa2ae0a9bce538ecbda78b21f Mon Sep 17 00:00:00 2001 From: pkhara31 <112378664+pkhara31@users.noreply.github.com> Date: Mon, 8 Dec 2025 14:31:37 +0530 Subject: [PATCH 2/5] Delete notebooks/llm-rag-ov-langchain/ov_rag_evaluator.ipynb --- .../ov_rag_evaluator.ipynb | 662 ------------------ 1 file changed, 662 deletions(-) delete mode 100644 notebooks/llm-rag-ov-langchain/ov_rag_evaluator.ipynb diff --git a/notebooks/llm-rag-ov-langchain/ov_rag_evaluator.ipynb b/notebooks/llm-rag-ov-langchain/ov_rag_evaluator.ipynb deleted file mode 100644 index 12299f9a75f..00000000000 --- a/notebooks/llm-rag-ov-langchain/ov_rag_evaluator.ipynb +++ /dev/null @@ -1,662 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# RAG Performance & Fairness Evaluation Toolkit (OpenVINO + LangChain)\n", - "\n", - "This notebook demonstrates how to build and evaluate a Retrieval-Augmented Generation (RAG) pipeline using OpenVINO™ for accelerated performance on Intel hardware. We will use Hugging Face and LangChain libraries to construct the pipeline.\n", - "\n", - "The process involves:\n", - "1. **Environment Setup**: Installing necessary libraries.\n", - "2. **LLM and Tokenizer Setup**: Loading a language model (Microsoft's Phi-3-mini) and its tokenizer, optimized with OpenVINO.\n", - "3. **Embedding Model Setup**: Preparing an embedding model to convert text into vector representations.\n", - "4. **Data Loading and Processing**: Fetching documents from a web source, splitting them into manageable chunks, and creating vector embeddings.\n", - "5. 
**Vector Store and Retriever Setup**: Storing the embeddings in a ChromaDB vector store and setting up a retriever with reranking for improved accuracy.\n", - "6. **Building the RAG Chain**: Creating a `RetrievalQA` chain that combines the retriever and the LLM.\n", - "7. **Running the RAG Pipeline**: Asking a question to get a response from the RAG system.\n", - "8. **Evaluation**: Using a comprehensive `OpenVINORAGEvaluator` to assess the quality of the generated response based on various metrics like BLEU, ROUGE, BERTScore, perplexity, and bias." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 1. Environment Setup\n", - "\n", - "First, let's ensure all the required Python packages are installed. The following commands handle the installation of essential libraries. These are typically only needed if you encounter version conflicts or issues with existing installations." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "c4a2dc6a-3d3e-4da2-902f-30f3cbd24b39", - "metadata": {}, - "outputs": [], - "source": [ - "# Execute only if needed (in case of any errors due to setuptools or whl installation)\n", - "!pip install setuptools==57.0.0 --force-reinstall\n", - "!pip install wheel==0.36.2 --force-reinstall\n", - "!pip uninstall comtypes -y\n", - "!pip install --no-cache-dir comtypes" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 2. LLM and Tokenizer Setup\n", - "\n", - "Next, we load the Large Language Model (LLM) and its corresponding tokenizer. We use `optimum-intel` to convert and accelerate the model with OpenVINO. In this example, we use `microsoft/Phi-3-mini-4k-instruct`, but you can replace it with another compatible model.\n", - "\n", - "- **`OVModelForCausalLM`**: Loads a causal language model and automatically converts it to the OpenVINO format (`export=True`).\n", - "- **`device=\"GPU\"`**: Specifies that the model should run on the integrated GPU for acceleration. You can change this to `\"CPU\"`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "90e68d95-9a4e-4ba5-9040-4422c1333444", - "metadata": {}, - "outputs": [], - "source": [ - "from optimum.intel import OVModelForCausalLM\n", - "from transformers import AutoTokenizer, pipeline\n", - "from langchain_huggingface import HuggingFacePipeline\n", - "\n", - "# Load model with OpenVINO backend\n", - "model = OVModelForCausalLM.from_pretrained(\n", - " \"microsoft/Phi-3-mini-4k-instruct\", # You can plug in any other supported model\n", - " export=True, # Convert to OpenVINO format on the fly\n", - " device=\"GPU\" # Specify GPU for inference, can also be \"CPU\"\n", - ")\n", - "\n", - "tokenizer = AutoTokenizer.from_pretrained(\"microsoft/Phi-3-mini-4k-instruct\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a LangChain-compatible LLM Pipeline\n", - "\n", - "We now create a `text-generation` pipeline using the OpenVINO-optimized model and tokenizer. This pipeline is then wrapped in `HuggingFacePipeline` to make it compatible with the LangChain ecosystem. A quick test is run to confirm the pipeline is working correctly." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "f985ca28-9e3d-490d-954c-71b24fc47eda", - "metadata": {}, - "outputs": [], - "source": [ - "# Create a text-generation pipeline with the OpenVINO model\n", - "llm_pipeline = pipeline(\n", - " \"text-generation\",\n", - " model=model,\n", - " tokenizer=tokenizer,\n", - " device=model.device,\n", - " max_new_tokens=100,\n", - " top_k=50,\n", - " temperature=0.1,\n", - " do_sample=True\n", - ")\n", - "\n", - "# Create a LangChain instance from the Hugging Face pipeline\n", - "llm = HuggingFacePipeline(pipeline=llm_pipeline)\n", - "\n", - "# Test the pipeline with a sample query\n", - "response = llm.invoke(\"What is an ocean?\")\n", - "print(response)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 3. Embedding Model Setup\n", - "\n", - "For the retrieval part of our RAG pipeline, we need an embedding model to convert text documents into numerical vectors. We use `OpenVINOBgeEmbeddings` from `langchain_community`, which provides OpenVINO-optimized embeddings for efficient performance. Here, we use the `bge-small-en-v1.5` model." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "606ff70a-f797-42a5-a697-8cb5c13c0dae", - "metadata": {}, - "outputs": [], - "source": [ - "from langchain_community.embeddings import OpenVINOBgeEmbeddings\n", - "\n", - "embedding_model_name = \"bge-small-en-v1.5\" # You can plug in any other embedding model\n", - "embedding_model_kwargs = {\"device\": \"CPU\"} # Embeddings often run well on CPU\n", - "encode_kwargs = {\n", - " \"normalize_embeddings\": True,\n", - "}\n", - "\n", - "# Initialize OpenVINO-optimized embeddings\n", - "embedding = OpenVINOBgeEmbeddings(\n", - " model_name=embedding_model_name,\n", - " model_kwargs=embedding_model_kwargs,\n", - " encode_kwargs=encode_kwargs,\n", - ")\n", - "\n", - "# Sample text to verify embedding functionality\n", - "text = \"This is a test document.\"\n", - "embedding_result = embedding.embed_query(text)\n", - "print(\"Sample embedding (first 3 dimensions):\", embedding_result[:3])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 4. Data Loading and Processing\n", - "\n", - "Now we'll load the documents that will form the knowledge base for our RAG pipeline. This notebook includes two methods for loading documents:\n", - "\n", - "1. **Web Crawling (Enabled by default)**: Fetches content from a website's sitemap. We use `WebBaseLoader` to load content from URLs found in the sitemap of Zerodha Varsity.\n", - "2. **Local File Loading (Commented out)**: A robust `LangChainDocumentLoader` class is provided to load various file types (`.txt`, `.pdf`, `.docx`, etc.) from a local directory. You can uncomment and adapt this section if you want to use your own local files." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "808e9c2d-ab4a-4bc6-bb45-f3b2d4be3156", - "metadata": {}, - "outputs": [], - "source": [ - "import bs4\n", - "from urllib.request import Request, urlopen\n", - "from bs4 import BeautifulSoup\n", - "import ssl\n", - "from langchain_community.document_loaders import WebBaseLoader\n", - "\n", - "# --- Method 1: Load documents by crawling a web page (default) ---\n", - "def get_sitemap(url):\n", - " \"\"\"Fetches and parses an XML sitemap from a URL.\"\"\"\n", - " req = Request(url, headers={\"User-Agent\": \"Mozilla/5.0\"})\n", - " response = urlopen(req)\n", - " xml = BeautifulSoup(response, \"lxml-xml\", from_encoding=response.info().get_param(\"charset\"))\n", - " return xml\n", - "\n", - "def get_urls_from_sitemap(xml):\n", - " \"\"\"Extracts all URLs from a parsed sitemap XML.\"\"\"\n", - " urls = [loc.text for loc in xml.find_all(\"loc\")]\n", - " return urls\n", - "\n", - "# Bypass SSL verification issues if they arise\n", - "ssl._create_default_https_context = ssl._create_stdlib_context\n", - "\n", - "sitemap_url = \"https://zerodha.com/varsity/chapter-sitemap2.xml\"\n", - "sitemap_xml = get_sitemap(sitemap_url)\n", - "urls = get_urls_from_sitemap(sitemap_xml)\n", - "\n", - "# Load documents from the collected URLs\n", - "docs = []\n", - "for i, url in enumerate(urls):\n", - " try:\n", - " loader = WebBaseLoader(url)\n", - " docs.extend(loader.load())\n", - " if (i + 1) % 10 == 0:\n", - " print(f\"Loaded {i + 1}/{len(urls)} URLs\")\n", - " except Exception as e:\n", - " print(f\"Failed to load {url}: {e}\")\n", - "\n", - "print(f\"\\nTotal documents loaded: {len(docs)}\")\n", - "\n", - "# --- Method 2: Load documents locally from the system (commented out) ---\n", - "'''\n", - "import os\n", - "from langchain.document_loaders import (\n", - " TextLoader,\n", - " PyPDFLoader,\n", - " DirectoryLoader,\n", - ")\n", - "from langchain.schema import Document as LCDocument\n", - "from typing import List\n", - "\n", - "class LocalDocumentLoader:\n", - " \"\"\"Load documents from a local directory using LangChain loaders.\"\"\"\n", - " def __init__(self, directory_path: str):\n", - " self.directory_path = directory_path\n", - "\n", - " def load(self) -> List[LCDocument]:\n", - " \"\"\"Loads all supported documents from the directory.\"\"\"\n", - " if not self.directory_path:\n", - " raise ValueError(\"Directory path not set.\")\n", - "\n", - " # Define loaders for different file types\n", - " txt_loader = DirectoryLoader(\n", - " self.directory_path, glob=\"**/*.txt\", loader_cls=TextLoader,\n", - " loader_kwargs={\"encoding\": \"utf-8\"}, show_progress=True\n", - " )\n", - " pdf_loader = DirectoryLoader(\n", - " self.directory_path, glob=\"**/*.pdf\", loader_cls=PyPDFLoader, show_progress=True\n", - " )\n", - "\n", - " documents = []\n", - " documents.extend(txt_loader.load())\n", - " documents.extend(pdf_loader.load())\n", - " \n", - " return documents\n", - "\n", - "# Usage Example:\n", - "# loader = LocalDocumentLoader(directory_path=\"/path/to/your/documents\")\n", - "# docs = loader.load()\n", - "# print(f\"Loaded {len(docs)} local documents.\")\n", - "'''" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Split Documents into Chunks\n", - "\n", - "LLMs have a limited context window, so we need to split large documents into smaller chunks. This ensures that the model can process the retrieved information effectively. 
We use `RecursiveCharacterTextSplitter` which is a smart way to split text while trying to keep related content together." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "51d07ec4-b929-4893-baff-af68a4fbf3aa", - "metadata": {}, - "outputs": [], - "source": [ - "from langchain.text_splitter import RecursiveCharacterTextSplitter\n", - "\n", - "# Split the documents into smaller chunks with a specified size and overlap\n", - "text_splitter = RecursiveCharacterTextSplitter(\n", - " chunk_size=1250,\n", - " chunk_overlap=100,\n", - " length_function=len,\n", - " is_separator_regex=False\n", - ")\n", - "\n", - "split_docs = text_splitter.split_documents(docs)\n", - "print(f\"Documents split into {len(split_docs)} chunks.\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 5. Vector Store and Retriever Setup\n", - "\n", - "Now we'll create a vector store to house the document embeddings and enable efficient similarity searches.\n", - "\n", - "- **`Chroma`**: We use ChromaDB as our vector store. It's a lightweight and easy-to-use vector database.\n", - "- **`persist_directory`**: This saves the created database to disk, allowing us to reuse it later without re-processing the documents." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "194acf26-f710-483d-97d6-57bfff7cfa65", - "metadata": {}, - "outputs": [], - "source": [ - "# Create a ChromaDB instance to store the document embeddings\n", - "from langchain_community.vectorstores import Chroma\n", - "\n", - "vectorstore = Chroma(\n", - " embedding_function=embedding,\n", - " persist_directory=\"./chromadb_varsity\",\n", - " collection_name=\"zerodha_varsity_docs\"\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Add Documents to the Vector Store\n", - "\n", - "We add the processed document chunks to the vector store. To handle a large number of documents efficiently, we add them in batches. The metadata is also filtered to ensure compatibility with the vector store." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "08f6ce04-9702-4372-be96-6fc34431fc21", - "metadata": {}, - "outputs": [], - "source": [ - "from langchain_community.vectorstores.utils import filter_complex_metadata\n", - "\n", - "# Function to insert embeddings in batches for a lengthy document set\n", - "def add_documents_in_batches(vectorstore, docs, batch_size=100):\n", - " \"\"\"Adds documents to the vectorstore in batches.\"\"\"\n", - " for i in range(0, len(docs), batch_size):\n", - " chunk = docs[i : i + batch_size]\n", - " vectorstore.add_documents(chunk)\n", - " print(f\"Added batch {i//batch_size + 1}/{(len(docs)-1)//batch_size + 1}\")\n", - " # Persist the database to disk if the method is available\n", - " if hasattr(vectorstore, \"persist\"):\n", - " vectorstore.persist()\n", - "\n", - "# Filter out complex metadata that might cause issues\n", - "filtered_docs = filter_complex_metadata(split_docs)\n", - "\n", - "# Add the documents to the vector store in batches\n", - "add_documents_in_batches(vectorstore, filtered_docs)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Set up a Reranking Retriever\n", - "\n", - "To improve the quality of retrieved documents, we use a reranker. The initial retriever fetches a set of documents (e.g., k=5), and the reranker (`FlashrankRerank`) re-orders them based on their relevance to the query. 
This ensures that the most relevant context is passed to the LLM.\n", - "\n", - "- **`ContextualCompressionRetriever`**: Wraps a base retriever and a document compressor (the reranker) to create this two-stage retrieval process." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "01bc1634-dcea-431c-b447-af5b7d38aaeb", - "metadata": {}, - "outputs": [], - "source": [ - "from langchain.retrievers import ContextualCompressionRetriever\n", - "from langchain.retrievers.document_compressors import FlashrankRerank\n", - "\n", - "# Set up the base retriever to fetch the top 5 documents\n", - "retriever = vectorstore.as_retriever(search_kwargs={\"k\": 5})\n", - "\n", - "# Initialize the reranker\n", - "compressor = FlashrankRerank()\n", - "\n", - "# Create the compression retriever, which combines retrieval and reranking\n", - "compression_retriever = ContextualCompressionRetriever(\n", - " base_compressor=compressor,\n", - " base_retriever=retriever\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 6. Building the RAG Chain\n", - "\n", - "With all the components ready, we now assemble the final RAG pipeline using LangChain's `RetrievalQA` chain. This chain connects the LLM with the retriever.\n", - "\n", - "- **`chain_type=\"stuff\"`**: This means all retrieved documents will be \"stuffed\" into the prompt sent to the LLM.\n", - "- **`return_source_documents=True`**: This is important for evaluation, as it allows us to see which documents were used to generate the answer." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "e18d5187-a6c6-406f-b5b2-f9982d97d3a2", - "metadata": {}, - "outputs": [], - "source": [ - "from langchain.chains import RetrievalQA\n", - "\n", - "qa_chain = RetrievalQA.from_chain_type(\n", - " llm=llm,\n", - " chain_type=\"stuff\",\n", - " retriever=compression_retriever,\n", - " return_source_documents=True\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 7. Running the RAG Pipeline\n", - "\n", - "It's time to ask a question! The `qa_chain.invoke` method will execute the full RAG process: retrieve relevant documents, pass them to the LLM along with the question, and return the final answer." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "fc8def11-554b-4bd1-ab37-9824f003966e", - "metadata": {}, - "outputs": [], - "source": [ - "question = \"What is a mutual fund?\"\n", - "result = qa_chain.invoke({\"query\": question})\n", - "print(\"--- Question ---\")\n", - "print(question)\n", - "print(\"\\n--- Answer ---\")\n", - "print(result[\"result\"])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Extract Answer and Context for Evaluation\n", - "\n", - "For the evaluation step, we need to isolate the generated answer and the source documents (the context or \"reference\")." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "991a94fc-b7b3-4709-896b-c613e1b857b8", - "metadata": {}, - "outputs": [], - "source": [ - "answer = result['result']\n", - "context = \" \".join([d.page_content for d in result['source_documents']])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 8. Evaluation\n", - "\n", - "To assess the quality of our RAG pipeline, we use a custom `OpenVINORAGEvaluator` class. 
This class uses OpenVINO-optimized models to calculate several key metrics:\n", - "\n", - "- **BLEU & ROUGE**: Measure the overlap between the generated answer and the reference context.\n", - "- **BERTScore**: Computes semantic similarity, which is more advanced than simple overlap.\n", - "- **Perplexity**: Measures how well a language model (here, Llama-2-7B) predicts the generated text. Lower is better.\n", - "- **Diversity**: Calculates the variety of tokens in the response.\n", - "- **Racial Bias**: Uses a hate speech detection model to check for biased content.\n", - "\n", - "**Note**: The first time you run this, it will download and convert the necessary evaluation models (Llama-2-7B and a hate speech model) to the OpenVINO format. This is a one-time setup." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "1a8122e4-6602-4750-ad6e-c5cc599e0b0a", - "metadata": {}, - "outputs": [], - "source": [ - "import openvino as ov\n", - "import numpy as np\n", - "import torch\n", - "from transformers import AutoTokenizer, AutoModelForCausalLM, AutoModelForSequenceClassification\n", - "from optimum.intel import OVModelForCausalLM, OVModelForSequenceClassification\n", - "from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction\n", - "from rouge_score import rouge_scorer\n", - "from bert_score import score\n", - "from nltk.util import ngrams\n", - "from typing import List\n", - "import os\n", - "\n", - "class OpenVINORAGEvaluator:\n", - " \"\"\"An evaluator for RAG pipelines using OpenVINO-optimized models.\"\"\"\n", - " \n", - " def __init__(self, device=\"GPU\", models_dir=\"./openvino_models\"):\n", - " self.device = device\n", - " self.models_dir = models_dir\n", - " os.makedirs(self.models_dir, exist_ok=True)\n", - " \n", - " # Initialize models and tokenizers for evaluation\n", - " self.llama2_model, self.llama2_tokenizer = self._load_model(\n", - " model_id=\"meta-llama/Llama-2-7b-hf\",\n", - " ov_model_class=OVModelForCausalLM,\n", - " subfolder=\"llama2-7b-openvino\"\n", - " )\n", - " self.bias_model, self.bias_tokenizer = self._load_model(\n", - " model_id=\"Hate-speech-CNERG/dehatebert-mono-english\",\n", - " ov_model_class=OVModelForSequenceClassification,\n", - " subfolder=\"hate-speech-openvino\"\n", - " )\n", - " print(f\"OpenVINO RAG Evaluator initialized on {device}\")\n", - "\n", - " def _load_model(self, model_id, ov_model_class, subfolder):\n", - " \"\"\"Generic function to load or convert a model to OpenVINO format.\"\"\"\n", - " model_path = os.path.join(self.models_dir, subfolder)\n", - " \n", - " if not os.path.exists(os.path.join(model_path, \"openvino_model.xml\")):\n", - " print(f\"Converting {model_id} to OpenVINO format...\")\n", - " ov_model = ov_model_class.from_pretrained(model_id, export=True, compile=False)\n", - " ov_model.save_pretrained(model_path)\n", - " print(f\"Model saved to {model_path}\")\n", - " \n", - " try:\n", - " print(f\"Loading {model_id} from {model_path}...\")\n", - " model = ov_model_class.from_pretrained(model_path, device=self.device)\n", - " tokenizer = AutoTokenizer.from_pretrained(model_id)\n", - " print(f\"{model_id} loaded successfully.\")\n", - " return model, tokenizer\n", - " except Exception as e:\n", - " print(f\"Error loading {model_id}: {e}\")\n", - " return None, None\n", - "\n", - " def evaluate_bleu_rouge(self, candidates: List[str], references: List[str]):\n", - " \"\"\"Calculates BLEU and ROUGE scores.\"\"\"\n", - " candidate_tokens = [c.split() for c in candidates]\n", - " reference_tokens = 
[[r.split()] for r in references]\n", - " \n", - " # BLEU with smoothing\n", - " smoothing = SmoothingFunction().method1\n", - " bleu_score = corpus_bleu(reference_tokens, candidate_tokens, smoothing_function=smoothing)\n", - " \n", - " # ROUGE\n", - " scorer = rouge_scorer.RougeScorer(['rouge1', 'rougeL'], use_stemmer=True)\n", - " rouge1_f1 = sum(scorer.score(ref, cand)['rouge1'].fmeasure for ref, cand in zip(references, candidates)) / len(candidates)\n", - " return bleu_score, rouge1_f1\n", - "\n", - " def evaluate_bert_score(self, candidates: List[str], references: List[str]):\n", - " \"\"\"Calculates BERTScore.\"\"\"\n", - " _, _, f1 = score(candidates, references, lang=\"en\", model_type='bert-base-multilingual-cased')\n", - " return f1.mean().item()\n", - "\n", - " def evaluate_perplexity(self, text: str):\n", - " \"\"\"Calculates perplexity using the loaded Llama-2 model.\"\"\"\n", - " if not self.llama2_model:\n", - " return float('inf')\n", - " \n", - " encodings = self.llama2_tokenizer(text, return_tensors='pt', max_length=1024, truncation=True)\n", - " input_ids = encodings.input_ids.to(self.llama2_model.device)\n", - " \n", - " with torch.no_grad():\n", - " outputs = self.llama2_model(input_ids, labels=input_ids)\n", - " loss = outputs.loss\n", - " perplexity = torch.exp(loss)\n", - " \n", - " return perplexity.item()\n", - "\n", - " def evaluate_racial_bias(self, text: str):\n", - " \"\"\"Evaluates racial bias using a hate speech detection model.\"\"\"\n", - " if not self.bias_model:\n", - " return 0.0\n", - "\n", - " inputs = self.bias_tokenizer(text, return_tensors=\"pt\", truncation=True, max_length=512)\n", - " with torch.no_grad():\n", - " logits = self.bias_model(**inputs).logits\n", - " probabilities = torch.nn.functional.softmax(logits, dim=-1)\n", - " # Return the probability of the 'hate speech' class (index 1)\n", - " bias_score = probabilities[0][1].item()\n", - " return bias_score\n", - " \n", - " def evaluate_all(self, response: str, reference: str):\n", - " \"\"\"Runs a comprehensive evaluation and returns all metrics.\"\"\"\n", - " candidates = [response]\n", - " references = [reference]\n", - " \n", - " try:\n", - " bleu, rouge1 = self.evaluate_bleu_rouge(candidates, references)\n", - " bert_f1 = self.evaluate_bert_score(candidates, references)\n", - " perplexity = self.evaluate_perplexity(response)\n", - " racial_bias = self.evaluate_racial_bias(response)\n", - " \n", - " return {\n", - " \"BLEU\": bleu,\n", - " \"ROUGE-1\": rouge1,\n", - " \"BERT F1\": bert_f1,\n", - " \"Perplexity\": perplexity,\n", - " \"Racial Bias\": racial_bias\n", - " }\n", - " except Exception as e:\n", - " print(f\"An error occurred during evaluation: {e}\")\n", - " return {k: 0.0 for k in [\"BLEU\", \"ROUGE-1\", \"BERT F1\", \"Perplexity\", \"Racial Bias\"]}" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Run the Evaluation\n", - "\n", - "Finally, we initialize the `OpenVINORAGEvaluator` and call `evaluate_all` to get a dictionary of scores. This provides a quantitative look at the performance of our RAG pipeline for the given query." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "3dedb93b-20e3-43c3-93a1-38cb3e114019", - "metadata": {}, - "outputs": [], - "source": [ - "# Initialize the evaluator (this might take a moment on the first run)\n", - "evaluator = OpenVINORAGEvaluator(device=\"GPU\")\n", - "\n", - "# Prepare the data for evaluation\n", - "response_text = answer\n", - "reference_text = context.strip()\n", - "\n", - "# Get all evaluation metrics\n", - "metrics = evaluator.evaluate_all(response_text, reference_text)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "8006c91c-1180-41c8-b04e-448e4131391f", - "metadata": {}, - "outputs": [], - "source": [ - "print(\"--- Evaluation Metrics ---\")\n", - "for metric, value in metrics.items():\n", - " print(f\"{metric}: {value:.4f}\")" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.11" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} From 8a1454680714bb3162b2c037a8b37fbb85ff6c88 Mon Sep 17 00:00:00 2001 From: pkhara31 <112378664+pkhara31@users.noreply.github.com> Date: Mon, 8 Dec 2025 14:33:17 +0530 Subject: [PATCH 3/5] Add files via upload --- .../ov_rag_evaluator.ipynb | 1095 +++++++++++++++++ 1 file changed, 1095 insertions(+) create mode 100644 notebooks/llm-rag-ov-langchain/ov_rag_evaluator.ipynb diff --git a/notebooks/llm-rag-ov-langchain/ov_rag_evaluator.ipynb b/notebooks/llm-rag-ov-langchain/ov_rag_evaluator.ipynb new file mode 100644 index 00000000000..0be39a5013f --- /dev/null +++ b/notebooks/llm-rag-ov-langchain/ov_rag_evaluator.ipynb @@ -0,0 +1,1095 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "7722a495", + "metadata": {}, + "source": [ + "# RAG Performance & Fairness Evaluation Toolkit (OpenVINO + LangChain)\n", + "\n", + "This notebook demonstrates how to build and evaluate a Retrieval-Augmented Generation (RAG) pipeline using OpenVINO™ for accelerated performance on Intel hardware. We will use Hugging Face and LangChain libraries to construct the pipeline.\n", + "\n", + "The process involves:\n", + "1. **Environment Setup**: Installing necessary libraries.\n", + "2. **LLM and Tokenizer Setup**: Loading a language model (Microsoft's Phi-3-mini) and its tokenizer, optimized with OpenVINO.\n", + "3. **Embedding Model Setup**: Preparing an embedding model to convert text into vector representations.\n", + "4. **Data Loading and Processing**: Fetching documents from a web source, splitting them into manageable chunks, and creating vector embeddings.\n", + "5. **Vector Store and Retriever Setup**: Storing the embeddings in a ChromaDB vector store and setting up a retriever with reranking for improved accuracy.\n", + "6. **Building the RAG Chain**: Creating a `RetrievalQA` chain that combines the retriever and the LLM.\n", + "7. **Running the RAG Pipeline**: Asking a question to get a response from the RAG system.\n", + "8. **Evaluation**: Using a comprehensive `OpenVINORAGEvaluator` to assess the quality of the generated response based on various metrics like BLEU, ROUGE, BERTScore, perplexity, and bias." + ] + }, + { + "cell_type": "markdown", + "id": "81a21a14", + "metadata": {}, + "source": [ + "## 1. 
Environment Setup\n", + "\n", + "First, let's ensure all the required Python packages are installed. The following commands handle the installation of essential libraries. These are typically only needed if you encounter version conflicts or issues with existing installations." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "c4a2dc6a-3d3e-4da2-902f-30f3cbd24b39", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import requests\n", + "from pathlib import Path\n", + "\n", + "if not Path(\"notebook_utils.py\").exists():\n", + " r = requests.get(\n", + " url=\"https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/notebook_utils.py\",\n", + " )\n", + " with open(\"notebook_utils.py\", \"w\") as f:\n", + " f.write(r.text)\n", + "\n", + "if not Path(\"pip_helper.py\").exists():\n", + " r = requests.get(\n", + " url=\"https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/pip_helper.py\",\n", + " )\n", + " open(\"pip_helper.py\", \"w\").write(r.text)\n", + "\n", + "from pip_helper import pip_install\n", + "\n", + "os.environ[\"GIT_CLONE_PROTECTION_ACTIVE\"] = \"false\"\n", + "\n", + "pip_install(\"--pre\", \"-U\", \"openvino>=2025.3.0\", \"--extra-index-url\", \"https://storage.openvinotoolkit.org/simple/wheels/nightly\")\n", + "pip_install(\"--pre\", \"-U\", \"openvino-tokenizers\", \"--extra-index-url\", \"https://storage.openvinotoolkit.org/simple/wheels/nightly\")\n", + "pip_install(\n", + " \"--extra-index-url\",\n", + " \"https://download.pytorch.org/whl/cpu\",\n", + " \"--upgrade-strategy\",\n", + " \"eager\",\n", + " \"optimum[openvino,nncf,onnxruntime]\",\n", + " \"sacrebleu\",\n", + " \"rouge-score\",\n", + " \"nncf>=2.18.0\",\n", + " \"bert-score\",\n", + " \"transformers\",\n", + " \"onnx\",\n", + " \"nltk\",\n", + " \"numpy\",\n", + " \"textblob\",\n", + " \"dataset\",\n", + " \"langchain\",\n", + " \"langchain_community\",\n", + " \"chromadb\",\n", + " \"langchain-chroma\",\n", + " \"langchain-huggingface\",\n", + " \"sentence-transformers\",\n", + " \"Flashrank\",\n", + " \"msoffcrypto-tool\",\n", + " \"docx2txt\",\n", + " \"bs4\",\n", + " \"python-docx\",\n", + " \"huggingface-hub>=0.26.5\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "8a005fb2", + "metadata": {}, + "source": [ + "## 2. LLM and Tokenizer Setup\n", + "\n", + "Next, we load the Large Language Model (LLM) and its corresponding tokenizer. We use `optimum-intel` to convert and accelerate the model with OpenVINO. In this example, we use `microsoft/Phi-3-mini-4k-instruct`, but you can replace it with another compatible model.\n", + "\n", + "- **`OVModelForCausalLM`**: Loads a causal language model and automatically converts it to the OpenVINO format (`export=True`).\n", + "- **`device=\"GPU\"`**: Specifies that the model should run on the integrated GPU for acceleration. You can change this to `\"CPU\"`." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "90e68d95-9a4e-4ba5-9040-4422c1333444", + "metadata": {}, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "5c056e6b6ce94a01b542d9035c2d9523", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Loading checkpoint shards: 0%| | 0/2 [00:00 0:\n", + "C:\\Users\\Local_Admin\\ovraglangchain\\lib\\site-packages\\optimum\\exporters\\openvino\\model_patcher.py:203: TracerWarning: torch.tensor results are registered as constants in the trace. 
You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.\n", + " torch.tensor(0.0, device=mask.device, dtype=dtype),\n", + "C:\\Users\\Local_Admin\\ovraglangchain\\lib\\site-packages\\optimum\\exporters\\openvino\\model_patcher.py:204: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.\n", + " torch.tensor(torch.finfo(torch.float16).min, device=mask.device, dtype=dtype),\n", + "C:\\Users\\Local_Admin\\ovraglangchain\\lib\\site-packages\\transformers\\cache_utils.py:551: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!\n", + " elif (\n", + "C:\\Users\\Local_Admin\\ovraglangchain\\lib\\site-packages\\transformers\\integrations\\sdpa_attention.py:59: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!\n", + " is_causal = query.shape[2] > 1 and attention_mask is None and getattr(module, \"is_causal\", True)\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "INFO:nncf:Statistics of the bitwidth distribution:\n", + "+---------------------------+-----------------------------+----------------------------------------+\n", + "| Weight compression mode | % all parameters (layers) | % ratio-defining parameters (layers) |\n", + "+===========================+=============================+========================================+\n", + "| int8_asym, per-channel | 100% (130 / 130) | 100% (130 / 130) |\n", + "+---------------------------+-----------------------------+----------------------------------------+\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "198db25aa4394b4f9bb74f3eacf2ca7d", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Output()" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n"
+      ],
+      "text/plain": []
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "from optimum.intel import OVModelForCausalLM\n",
+    "from transformers import AutoTokenizer, pipeline\n",
+    "from langchain_huggingface import HuggingFacePipeline\n",
+    "\n",
+    "# Load model with OpenVINO backend\n",
+    "model = OVModelForCausalLM.from_pretrained(\n",
+    "    \"microsoft/Phi-3-mini-4k-instruct\", # You can plug in any other supported model\n",
+    "    export=True,  # Convert to OpenVINO format on the fly\n",
+    "    device=\"GPU\"  # Specify GPU for inference, can also be \"CPU\"\n",
+    ")\n",
+    "\n",
+    "tokenizer = AutoTokenizer.from_pretrained(\"microsoft/Phi-3-mini-4k-instruct\")\n",
+    "model.save_pretrained(\"ov_model\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "27419145",
+   "metadata": {},
+   "source": [
+    "### Create a LangChain-compatible LLM Pipeline\n",
+    "\n",
+    "We now create a `text-generation` pipeline using the OpenVINO-optimized model and tokenizer. This pipeline is then wrapped in `HuggingFacePipeline` to make it compatible with the LangChain ecosystem. A quick test is run to confirm the pipeline is working correctly."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "f985ca28-9e3d-490d-954c-71b24fc47eda",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Device set to use cpu\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "What is an ocean? An ocean is a vast body of saltwater that covers approximately 71% of the Earth's surface. It is the largest component of the hydrosphere and plays a crucial role in the global climate system. Oceans are divided into five major basins: the Pacific, Atlantic, Indian, Southern (Antarctic), and Arctic Oceans. These bodies of water are interconnected and contain a diverse range of marine life, ecosystems, and geological\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Create a text-generation pipeline with the OpenVINO model\n",
+    "llm_pipeline = pipeline(\n",
+    "    \"text-generation\",\n",
+    "    model=model,\n",
+    "    tokenizer=tokenizer,\n",
+    "    device=model.device,\n",
+    "    max_new_tokens=100,\n",
+    "    top_k=50,\n",
+    "    temperature=0.1,\n",
+    "    do_sample=True\n",
+    ")\n",
+    "\n",
+    "# Create a LangChain instance from the Hugging Face pipeline\n",
+    "llm = HuggingFacePipeline(pipeline=llm_pipeline)\n",
+    "\n",
+    "# Test the pipeline with a sample query\n",
+    "response = llm.invoke(\"What is an ocean?\")\n",
+    "print(response)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "05fee9cc",
+   "metadata": {},
+   "source": [
+    "## 3. Embedding Model Setup\n",
+    "\n",
+    "For the retrieval part of our RAG pipeline, we need an embedding model to convert text documents into numerical vectors. We use `OpenVINOBgeEmbeddings` from `langchain_community`, which provides OpenVINO-optimized embeddings for efficient performance. Here, we use the `bge-small-en-v1.5` model."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "id": "606ff70a-f797-42a5-a697-8cb5c13c0dae",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Model saved to ./saved_bge_model\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "C:\\Users\\Local_Admin\\ovraglangchain\\lib\\site-packages\\transformers\\modeling_attn_mask_utils.py:196: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.\n",
+      "  inverted_mask = torch.tensor(1.0, dtype=dtype) - expanded_mask\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Sample embedding (first 3 dimensions): [-0.042086612433195114, 0.06681863963603973, 0.007916754111647606]\n"
+     ]
+    }
+   ],
+   "source": [
+    "from langchain_community.embeddings import OpenVINOBgeEmbeddings\n",
+    "from sentence_transformers import SentenceTransformer\n",
+    "import os\n",
+    "\n",
+    "# First time: Download and save the model\n",
+    "embedding_model_name = \"BAAI/bge-small-en-v1.5\"  # Full HF repo path\n",
+    "save_directory = \"./saved_bge_model\"\n",
+    "\n",
+    "# Download the model using SentenceTransformer directly\n",
+    "st_model = SentenceTransformer(embedding_model_name)\n",
+    "st_model.save(save_directory)\n",
+    "print(f\"Model saved to {save_directory}\")\n",
+    "\n",
+    "# Now create the OpenVINO embedding with the saved model\n",
+    "embedding = OpenVINOBgeEmbeddings(\n",
+    "    model_name_or_path=save_directory,  # Use saved path\n",
+    "    model_kwargs={\"device\": \"CPU\"},\n",
+    "    encode_kwargs={\"normalize_embeddings\": True},\n",
+    ")\n",
+    "\n",
+    "# Load the saved model from local directory\n",
+    "local_model_path = \"./saved_bge_model\"\n",
+    "\n",
+    "embedding = OpenVINOBgeEmbeddings(\n",
+    "    model_name_or_path=local_model_path,\n",
+    "    model_kwargs={\"device\": \"CPU\"},\n",
+    "    encode_kwargs={\"normalize_embeddings\": True},\n",
+    ")\n",
+    "\n",
+    "# Test the loaded model\n",
+    "text = \"This is a test document.\"\n",
+    "embedding_result = embedding.embed_query(text)\n",
+    "print(\"Sample embedding (first 3 dimensions):\", embedding_result[:3])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f8defdcb",
+   "metadata": {},
+   "source": [
+    "## 4. Data Loading and Processing\n",
+    "\n",
+    "Now we'll load the documents that will form the knowledge base for our RAG pipeline. This notebook includes two methods for loading documents:\n",
+    "\n",
+    "1.  **Web Crawling (Enabled by default)**: Fetches content from a website's sitemap. We use `WebBaseLoader` to load content from URLs found in the sitemap of Zerodha Varsity.\n",
+    "2.  **Local File Loading (Commented out)**: A robust `LangChainDocumentLoader` class is provided to load various file types (`.txt`, `.pdf`, `.docx`, etc.) from a local directory. You can uncomment and adapt this section if you want to use your own local files."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "808e9c2d-ab4a-4bc6-bb45-f3b2d4be3156",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "0it [00:00, ?it/s]\n",
+      "100%|████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.18s/it]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Loaded 8 local documents.\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "import bs4\n",
+    "from urllib.request import Request, urlopen\n",
+    "from bs4 import BeautifulSoup\n",
+    "import ssl\n",
+    "from langchain_community.document_loaders import WebBaseLoader\n",
+    "'''\n",
+    "# --- Method 1: Load documents by crawling a web page (default) ---\n",
+    "def get_sitemap(url):\n",
+    "    \"\"\"Fetches and parses an XML sitemap from a URL.\"\"\"\n",
+    "    req = Request(url, headers={\"User-Agent\": \"Mozilla/5.0\"})\n",
+    "    response = urlopen(req)\n",
+    "    xml = BeautifulSoup(response, \"lxml-xml\", from_encoding=response.info().get_param(\"charset\"))\n",
+    "    return xml\n",
+    "\n",
+    "def get_urls_from_sitemap(xml):\n",
+    "    \"\"\"Extracts all URLs from a parsed sitemap XML.\"\"\"\n",
+    "    urls = [loc.text for loc in xml.find_all(\"loc\")]\n",
+    "    return urls\n",
+    "\n",
+    "# Bypass SSL verification issues if they arise\n",
+    "ssl._create_default_https_context = ssl._create_stdlib_context\n",
+    "\n",
+    "sitemap_url = \"https://zerodha.com/varsity/chapter-sitemap2.xml\"\n",
+    "sitemap_xml = get_sitemap(sitemap_url)\n",
+    "urls = get_urls_from_sitemap(sitemap_xml)\n",
+    "\n",
+    "# Load documents from the collected URLs\n",
+    "docs = []\n",
+    "for i, url in enumerate(urls):\n",
+    "    try:\n",
+    "        loader = WebBaseLoader(url)\n",
+    "        docs.extend(loader.load())\n",
+    "        if (i + 1) % 10 == 0:\n",
+    "            print(f\"Loaded {i + 1}/{len(urls)} URLs\")\n",
+    "    except Exception as e:\n",
+    "        print(f\"Failed to load {url}: {e}\")\n",
+    "\n",
+    "print(f\"\\nTotal documents loaded: {len(docs)}\")\n",
+    "'''\n",
+    "# --- Method 2: Load documents locally from the system (commented out) ---\n",
+    "\n",
+    "import os\n",
+    "from langchain.document_loaders import (\n",
+    "    TextLoader,\n",
+    "    PyPDFLoader,\n",
+    "    DirectoryLoader,\n",
+    ")\n",
+    "from langchain.schema import Document as LCDocument\n",
+    "from typing import List\n",
+    "\n",
+    "class LocalDocumentLoader:\n",
+    "    \"\"\"Load documents from a local directory using LangChain loaders.\"\"\"\n",
+    "    def __init__(self, directory_path: str):\n",
+    "        self.directory_path = directory_path\n",
+    "\n",
+    "    def load(self) -> List[LCDocument]:\n",
+    "        \"\"\"Loads all supported documents from the directory.\"\"\"\n",
+    "        if not self.directory_path:\n",
+    "            raise ValueError(\"Directory path not set.\")\n",
+    "\n",
+    "        # Define loaders for different file types\n",
+    "        txt_loader = DirectoryLoader(\n",
+    "            self.directory_path, glob=\"**/*.txt\", loader_cls=TextLoader,\n",
+    "            loader_kwargs={\"encoding\": \"utf-8\"}, show_progress=True\n",
+    "        )\n",
+    "        pdf_loader = DirectoryLoader(\n",
+    "            self.directory_path, glob=\"**/*.pdf\", loader_cls=PyPDFLoader, show_progress=True\n",
+    "        )\n",
+    "\n",
+    "        documents = []\n",
+    "        documents.extend(txt_loader.load())\n",
+    "        documents.extend(pdf_loader.load())\n",
+    "        \n",
+    "        return documents\n",
+    "\n",
+    "#Usage Example:\n",
+    "loader = LocalDocumentLoader(directory_path=\"content\")\n",
+    "docs = loader.load()\n",
+    "print(f\"Loaded {len(docs)} local documents.\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a6b107e7",
+   "metadata": {},
+   "source": [
+    "### Split Documents into Chunks\n",
+    "\n",
+    "LLMs have a limited context window, so we need to split large documents into smaller chunks. This ensures that the model can process the retrieved information effectively. We use `RecursiveCharacterTextSplitter` which is a smart way to split text while trying to keep related content together."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "51d07ec4-b929-4893-baff-af68a4fbf3aa",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Documents split into 28 chunks.\n"
+     ]
+    }
+   ],
+   "source": [
+    "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
+    "\n",
+    "# Split the documents into smaller chunks with a specified size and overlap\n",
+    "text_splitter = RecursiveCharacterTextSplitter(\n",
+    "    chunk_size=1250,\n",
+    "    chunk_overlap=100,\n",
+    "    length_function=len,\n",
+    "    is_separator_regex=False\n",
+    ")\n",
+    "\n",
+    "split_docs = text_splitter.split_documents(docs)\n",
+    "print(f\"Documents split into {len(split_docs)} chunks.\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1a734d8c",
+   "metadata": {},
+   "source": [
+    "## 5. Vector Store and Retriever Setup\n",
+    "\n",
+    "Now we'll create a vector store to house the document embeddings and enable efficient similarity searches.\n",
+    "\n",
+    "- **`Chroma`**: We use ChromaDB as our vector store. It's a lightweight and easy-to-use vector database.\n",
+    "- **`persist_directory`**: This saves the created database to disk, allowing us to reuse it later without re-processing the documents."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "id": "194acf26-f710-483d-97d6-57bfff7cfa65",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "C:\\Users\\Local_Admin\\AppData\\Local\\Temp\\ipykernel_4188\\156854944.py:4: LangChainDeprecationWarning: The class `Chroma` was deprecated in LangChain 0.2.9 and will be removed in 1.0. An updated version of the class exists in the :class:`~langchain-chroma package and should be used instead. To use it run `pip install -U :class:`~langchain-chroma` and import as `from :class:`~langchain_chroma import Chroma``.\n",
+      "  vectorstore = Chroma(\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Create a ChromaDB instance to store the document embeddings\n",
+    "from langchain_community.vectorstores import Chroma\n",
+    "\n",
+    "vectorstore = Chroma(\n",
+    "    embedding_function=embedding,\n",
+    "    persist_directory=\"./chromadb_varsity\",\n",
+    "    collection_name=\"zerodha_varsity_docs\"\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0690e134",
+   "metadata": {},
+   "source": [
+    "### Add Documents to the Vector Store\n",
+    "\n",
+    "We add the processed document chunks to the vector store. To handle a large number of documents efficiently, we add them in batches. The metadata is also filtered to ensure compatibility with the vector store."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "id": "08f6ce04-9702-4372-be96-6fc34431fc21",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Added batch 1/1\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "C:\\Users\\Local_Admin\\AppData\\Local\\Temp\\ipykernel_4188\\1265604352.py:12: LangChainDeprecationWarning: Since Chroma 0.4.x the manual persistence method is no longer supported as docs are automatically persisted.\n",
+      "  vectorstore.persist()\n"
+     ]
+    }
+   ],
+   "source": [
+    "from langchain_community.vectorstores.utils import filter_complex_metadata\n",
+    "\n",
+    "# Function to insert embeddings in batches for a lengthy document set\n",
+    "def add_documents_in_batches(vectorstore, docs, batch_size=100):\n",
+    "    \"\"\"Adds documents to the vectorstore in batches.\"\"\"\n",
+    "    for i in range(0, len(docs), batch_size):\n",
+    "        chunk = docs[i : i + batch_size]\n",
+    "        vectorstore.add_documents(chunk)\n",
+    "        print(f\"Added batch {i//batch_size + 1}/{(len(docs)-1)//batch_size + 1}\")\n",
+    "    # Persist the database to disk if the method is available\n",
+    "    if hasattr(vectorstore, \"persist\"):\n",
+    "        vectorstore.persist()\n",
+    "\n",
+    "# Filter out complex metadata that might cause issues\n",
+    "filtered_docs = filter_complex_metadata(split_docs)\n",
+    "\n",
+    "# Add the documents to the vector store in batches\n",
+    "add_documents_in_batches(vectorstore, filtered_docs)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4af0346d",
+   "metadata": {},
+   "source": [
+    "### Set up a Reranking Retriever\n",
+    "\n",
+    "To improve the quality of retrieved documents, we use a reranker. The initial retriever fetches a set of documents (e.g., k=5), and the reranker (`FlashrankRerank`) re-orders them based on their relevance to the query. This ensures that the most relevant context is passed to the LLM.\n",
+    "\n",
+    "- **`ContextualCompressionRetriever`**: Wraps a base retriever and a document compressor (the reranker) to create this two-stage retrieval process."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "id": "01bc1634-dcea-431c-b447-af5b7d38aaeb",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain.retrievers import ContextualCompressionRetriever\n",
+    "from langchain.retrievers.document_compressors import FlashrankRerank\n",
+    "\n",
+    "# Set up the base retriever to fetch the top 5 documents\n",
+    "retriever = vectorstore.as_retriever(search_kwargs={\"k\": 5})\n",
+    "\n",
+    "# Initialize the reranker\n",
+    "compressor = FlashrankRerank()\n",
+    "\n",
+    "# Create the compression retriever, which combines retrieval and reranking\n",
+    "compression_retriever = ContextualCompressionRetriever(\n",
+    "    base_compressor=compressor,\n",
+    "    base_retriever=retriever\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2c9bd8c5",
+   "metadata": {},
+   "source": [
+    "## 6. Building the RAG Chain\n",
+    "\n",
+    "With all the components ready, we now assemble the final RAG pipeline using LangChain's `RetrievalQA` chain. This chain connects the LLM with the retriever.\n",
+    "\n",
+    "- **`chain_type=\"stuff\"`**: This means all retrieved documents will be \"stuffed\" into the prompt sent to the LLM.\n",
+    "- **`return_source_documents=True`**: This is important for evaluation, as it allows us to see which documents were used to generate the answer."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "id": "e18d5187-a6c6-406f-b5b2-f9982d97d3a2",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain.chains import RetrievalQA\n",
+    "\n",
+    "qa_chain = RetrievalQA.from_chain_type(\n",
+    "    llm=llm,\n",
+    "    chain_type=\"stuff\",\n",
+    "    retriever=compression_retriever,\n",
+    "    return_source_documents=True\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "db9453a2",
+   "metadata": {},
+   "source": [
+    "## 7. Running the RAG Pipeline\n",
+    "\n",
+    "It's time to ask a question! The `qa_chain.invoke` method will execute the full RAG process: retrieve relevant documents, pass them to the LLM along with the question, and return the final answer."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "id": "fc8def11-554b-4bd1-ab37-9824f003966e",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "--- Question ---\n",
+      "What is deep link?\n",
+      "\n",
+      "--- Answer ---\n",
+      "Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n",
+      "\n",
+      "Introduction\n",
+      "With the release of the Intel® 11th Generation mobile processor and the \n",
+      "Intel® Iris® Xe and Intel® Iris® Xe MAX graphics architecture, Deep Link was \n",
+      "introduced to the world and a new era of innovation was born.\n",
+      "Developers now have the ability to strategically apply computing power \n",
+      "that was previously unavailable, and to assign tasks to parts of the machine \n",
+      "which would otherwise just lie dormant. Imagine having the ability to \n",
+      "significantly boost the performance of your application using not much \n",
+      "more than a strategic approach and some lines of code.  \n",
+      "That is the power of Deep Link.\n",
+      "Table of Contents\n",
+      "Introduction  .................... 1\n",
+      "Deep Link Technology  ........... 2\n",
+      "OpenVINO  ...................... 2\n",
+      "Use Case: Topaz Gigapixel AI  ..... 3\n",
+      "Use Case: AI-powered Video  ..... 4\n",
+      "Getting Started  .................. 5\n",
+      "Platform Considerations  ......... 7\n",
+      "Online Resources  ................ 8\n",
+      "Conclusion ...................... 8\n",
+      "This paper is part one in a series of white papers designed to provide details \n",
+      "regarding openly available development tools that can be used to take full \n",
+      "advantage of Intel® Deep Link Technology.\n",
+      "Authors\n",
+      "Roman Borisov \n",
+      "Senior Software Application Engineer \n",
+      "Max Domeika\n",
+      "Principal Engineer\n",
+      "\n",
+      "developers can take full advantage of Deep Link Technology and build a solution that efficiently utilizes all of the computing \n",
+      "power available to the system without favoring or ignoring any components - whether integrated or discrete.\n",
+      "One of those tools is the OpenVINO™ Toolkit.\n",
+      "The OpenVINO™ Toolkit was developed by Intel® and released to the open source community in May of 2018. Two versions of \n",
+      "the toolkit are available today; one which is fully open source and a second which is distributed and supported by Intel®.\n",
+      "The toolkit was designed to help developers save time, energy and resources by providing the tools to build deep learning and \n",
+      "Artificial Intelligence (AI)-driven applications that not only perform much better, but are also easier to create. It does this in a \n",
+      "number of different ways, including:\n",
+      "Heterogeneous Execution (write once - deploy anywhere) \n",
+      "OpenVINO™ allows developers to write a segment of code one time, and then deploy a device-specific iteration of the same \n",
+      "code across multiple computational components (CPU, iGPU, dGPU, Vision Processing Unit, FPGA, etc.).\n",
+      "Intuitive Workflow\n",
+      "The workflow that it utilizes was built from the ground up to allow OpenVINO™ to be an efficient and comprehensive\n",
+      "\n",
+      "process, along with pre- and post-processing and CODEC stages. This pipeline shows a common process involving style \n",
+      "transfer of the original video clip, followed by an upscaling operation used to improve the image quality of the newly stylized \n",
+      "clip. This process is typical of one which might be found in a video editing application.\n",
+      "White Paper | Unlocking the Power of Intel ® Deep Link   Part One: Client Artificial Intelligence Using Intel ® GPUs\n",
+      "Table 1. Device utilization statistics for multi-device test implementation.\n",
+      "* - The theoretical maximum Frames-per-Second (FPS) performance for Deep Link in this configuration would be the sum of the \n",
+      "performance of each individual GPU (in this case 8.7 + 6.9 = 15.6 FPS), but it is clear from the table that when two GPUs are engaged \n",
+      "simultaneously both GPUs are under-utilized.\n",
+      "Changing the workload partitioning and using the GPU(s) to perform all of the pre- and post-processing (using CV::UMat abstraction) would \n",
+      "increase efficiency and increase the FPS rate substantially. In addition, decoding directly to video memory and encoding directly from video \n",
+      "memory to avoid unnecessary data transfer would increase efficiency, but this is not currently supported by OpenCV. Intel™ Media SDK\n",
+      "\n",
+      "Question: What is deep link?\n",
+      "Helpful Answer:\n",
+      "\n",
+      "Deep Link is a technology introduced by Intel that allows developers to strategically apply computing power to parts of a machine that would otherwise be underutilized. It enables the boosting of application performance through a strategic approach and some lines of code. Deep Link technology, along with the OpenVINO Toolkit, helps developers build AI-driven applications that perform better and are easier to create. It achieves this by allowing for heterogeneous execution (write once, deploy anywhere\n"
+     ]
+    }
+   ],
+   "source": [
+    "question = \"What is deep link?\"\n",
+    "result = qa_chain.invoke({\"query\": question})\n",
+    "print(\"--- Question ---\")\n",
+    "print(question)\n",
+    "print(\"\\n--- Answer ---\")\n",
+    "print(result[\"result\"])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b30815a2",
+   "metadata": {},
+   "source": [
+    "### Extract Answer and Context for Evaluation\n",
+    "\n",
+    "For the evaluation step, we need to isolate the generated answer and the source documents (the context or \"reference\")."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "id": "991a94fc-b7b3-4709-896b-c613e1b857b8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "answer = result['result']\n",
+    "context = \" \".join([d.page_content for d in result['source_documents']])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "bf65db80",
+   "metadata": {},
+   "source": [
+    "## 8. Evaluation\n",
+    "\n",
+    "To assess the quality of our RAG pipeline, we use a custom `OpenVINORAGEvaluator` class. This class uses OpenVINO-optimized models to calculate several key metrics:\n",
+    "\n",
+    "- **BLEU & ROUGE**: Measure the overlap between the generated answer and the reference context.\n",
+    "- **BERTScore**: Computes semantic similarity, which is more advanced than simple overlap.\n",
+    "- **Perplexity**: Measures how well a language model (here, Llama-2-7B) predicts the generated text. Lower is better.\n",
+    "- **Diversity**: Calculates the variety of tokens in the response.\n",
+    "- **Racial Bias**: Uses a hate speech detection model to check for biased content.\n",
+    "\n",
+    "**Note**: The first time you run this, it will download and convert the necessary evaluation models (Llama-2-7B and a hate speech model) to the OpenVINO format. This is a one-time setup."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 14,
+   "id": "1a8122e4-6602-4750-ad6e-c5cc599e0b0a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import openvino as ov\n",
+    "import numpy as np\n",
+    "import torch\n",
+    "from transformers import AutoTokenizer, AutoModelForCausalLM, AutoModelForSequenceClassification\n",
+    "from optimum.intel import OVModelForCausalLM, OVModelForSequenceClassification\n",
+    "from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction\n",
+    "from rouge_score import rouge_scorer\n",
+    "from bert_score import score\n",
+    "from nltk.util import ngrams\n",
+    "from typing import List\n",
+    "import os\n",
+    "\n",
+    "class OpenVINORAGEvaluator:\n",
+    "    \"\"\"An evaluator for RAG pipelines using OpenVINO-optimized models.\"\"\"\n",
+    "    \n",
+    "    def __init__(self, device=\"GPU\", models_dir=\"./openvino_models\"):\n",
+    "        self.device = device\n",
+    "        self.models_dir = models_dir\n",
+    "        os.makedirs(self.models_dir, exist_ok=True)\n",
+    "        \n",
+    "        # Initialize models and tokenizers for evaluation\n",
+    "        self.llama2_model, self.llama2_tokenizer = self._load_model(\n",
+    "            model_id=\"meta-llama/Llama-2-7b-hf\",\n",
+    "            ov_model_class=OVModelForCausalLM,\n",
+    "            subfolder=\"llama2-7b-openvino\"\n",
+    "        )\n",
+    "        self.bias_model, self.bias_tokenizer = self._load_model(\n",
+    "            model_id=\"Hate-speech-CNERG/dehatebert-mono-english\",\n",
+    "            ov_model_class=OVModelForSequenceClassification,\n",
+    "            subfolder=\"hate-speech-openvino\"\n",
+    "        )\n",
+    "        print(f\"OpenVINO RAG Evaluator initialized on {device}\")\n",
+    "\n",
+    "    def _load_model(self, model_id, ov_model_class, subfolder):\n",
+    "        \"\"\"Generic function to load or convert a model to OpenVINO format.\"\"\"\n",
+    "        model_path = os.path.join(self.models_dir, subfolder)\n",
+    "        \n",
+    "        if not os.path.exists(os.path.join(model_path, \"openvino_model.xml\")):\n",
+    "            print(f\"Converting {model_id} to OpenVINO format...\")\n",
+    "            ov_model = ov_model_class.from_pretrained(model_id, export=True, compile=False)\n",
+    "            ov_model.save_pretrained(model_path)\n",
+    "            print(f\"Model saved to {model_path}\")\n",
+    "        \n",
+    "        try:\n",
+    "            print(f\"Loading {model_id} from {model_path}...\")\n",
+    "            model = ov_model_class.from_pretrained(model_path, device=self.device)\n",
+    "            tokenizer = AutoTokenizer.from_pretrained(model_id)\n",
+    "            print(f\"{model_id} loaded successfully.\")\n",
+    "            return model, tokenizer\n",
+    "        except Exception as e:\n",
+    "            print(f\"Error loading {model_id}: {e}\")\n",
+    "            return None, None\n",
+    "\n",
+    "    def evaluate_bleu_rouge(self, candidates: List[str], references: List[str]):\n",
+    "        \"\"\"Calculates BLEU and ROUGE scores.\"\"\"\n",
+    "        candidate_tokens = [c.split() for c in candidates]\n",
+    "        reference_tokens = [[r.split()] for r in references]\n",
+    "        \n",
+    "        # BLEU with smoothing\n",
+    "        smoothing = SmoothingFunction().method1\n",
+    "        bleu_score = corpus_bleu(reference_tokens, candidate_tokens, smoothing_function=smoothing)\n",
+    "        \n",
+    "        # ROUGE\n",
+    "        scorer = rouge_scorer.RougeScorer(['rouge1', 'rougeL'], use_stemmer=True)\n",
+    "        rouge1_f1 = sum(scorer.score(ref, cand)['rouge1'].fmeasure for ref, cand in zip(references, candidates)) / len(candidates)\n",
+    "        return bleu_score, rouge1_f1\n",
+    "\n",
+    "    def evaluate_bert_score(self, candidates: List[str], references: List[str]):\n",
+    "        \"\"\"Calculates BERTScore.\"\"\"\n",
+    "        _, _, f1 = score(candidates, references, lang=\"en\", model_type='bert-base-multilingual-cased')\n",
+    "        return f1.mean().item()\n",
+    "\n",
+    "    def evaluate_perplexity(self, text: str):\n",
+    "        \"\"\"Calculates perplexity using the loaded Llama-2 model.\"\"\"\n",
+    "        if not self.llama2_model:\n",
+    "            return float('inf')\n",
+    "        \n",
+    "        try:\n",
+    "            encodings = self.llama2_tokenizer(text, return_tensors='pt', max_length=1024, truncation=True)\n",
+    "            input_ids = encodings.input_ids\n",
+    "            \n",
+    "            with torch.no_grad():\n",
+    "                outputs = self.llama2_model(input_ids)\n",
+    "                logits = outputs.logits\n",
+    "                \n",
+    "                # Manually calculate cross-entropy loss\n",
+    "                # Shift logits and labels for next-token prediction\n",
+    "                shift_logits = logits[..., :-1, :].contiguous()\n",
+    "                shift_labels = input_ids[..., 1:].contiguous()\n",
+    "                \n",
+    "                # Calculate loss\n",
+    "                loss_fct = torch.nn.CrossEntropyLoss()\n",
+    "                loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))\n",
+    "                perplexity = torch.exp(loss)\n",
+    "            \n",
+    "            return perplexity.item()\n",
+    "        except Exception as e:\n",
+    "            print(f\"Error calculating perplexity: {e}\")\n",
+    "            return float('inf')\n",
+    "\n",
+    "    def evaluate_racial_bias(self, text: str):\n",
+    "        \"\"\"Evaluates racial bias using a hate speech detection model.\"\"\"\n",
+    "        if not self.bias_model:\n",
+    "            return 0.0\n",
+    "\n",
+    "        try:\n",
+    "            inputs = self.bias_tokenizer(text, return_tensors=\"pt\", truncation=True, max_length=512)\n",
+    "            with torch.no_grad():\n",
+    "                logits = self.bias_model(**inputs).logits\n",
+    "                probabilities = torch.nn.functional.softmax(logits, dim=-1)\n",
+    "                # Return the probability of the 'hate speech' class (index 1)\n",
+    "                bias_score = probabilities[0][1].item()\n",
+    "            return bias_score\n",
+    "        except Exception as e:\n",
+    "            print(f\"Error calculating bias: {e}\")\n",
+    "            return 0.0\n",
+    "    \n",
+    "    def evaluate_all(self, response: str, reference: str):\n",
+    "        \"\"\"Runs a comprehensive evaluation and returns all metrics.\"\"\"\n",
+    "        candidates = [response]\n",
+    "        references = [reference]\n",
+    "        \n",
+    "        try:\n",
+    "            bleu, rouge1 = self.evaluate_bleu_rouge(candidates, references)\n",
+    "            bert_f1 = self.evaluate_bert_score(candidates, references)\n",
+    "            perplexity = self.evaluate_perplexity(response)\n",
+    "            racial_bias = self.evaluate_racial_bias(response)\n",
+    "            \n",
+    "            return {\n",
+    "                \"BLEU\": bleu,\n",
+    "                \"ROUGE-1\": rouge1,\n",
+    "                \"BERT F1\": bert_f1,\n",
+    "                \"Perplexity\": perplexity,\n",
+    "                \"Racial Bias\": racial_bias\n",
+    "            }\n",
+    "        except Exception as e:\n",
+    "            print(f\"An error occurred during evaluation: {e}\")\n",
+    "            return {k: 0.0 for k in [\"BLEU\", \"ROUGE-1\", \"BERT F1\", \"Perplexity\", \"Racial Bias\"]}"
+   ]
+  },
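+  {
+   "cell_type": "markdown",
+   "id": "d1a5e9c0",
+   "metadata": {},
+   "source": [
+    "### (Optional) Diversity Metric Sketch\n",
+    "\n",
+    "The metric list above also mentions diversity, which `OpenVINORAGEvaluator.evaluate_all` does not compute. The cell below is a minimal distinct-n sketch (the fraction of unique n-grams among all n-grams in the response); the helper name `distinct_n` is illustrative and not part of the evaluator class."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f2b7c4e8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from nltk.util import ngrams\n",
+    "\n",
+    "\n",
+    "def distinct_n(text: str, n: int = 2) -> float:\n",
+    "    \"\"\"Fraction of unique n-grams among all n-grams (higher = more varied output).\"\"\"\n",
+    "    tokens = text.split()\n",
+    "    if len(tokens) < n:\n",
+    "        return 0.0\n",
+    "    grams = list(ngrams(tokens, n))\n",
+    "    return len(set(grams)) / len(grams)\n",
+    "\n",
+    "\n",
+    "# Example: distinct-2 score of the generated answer from the RAG chain above\n",
+    "print(f\"Distinct-2: {distinct_n(answer):.4f}\")"
+   ]
+  },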
+  {
+   "cell_type": "markdown",
+   "id": "e9c19402",
+   "metadata": {},
+   "source": [
+    "### Run the Evaluation\n",
+    "\n",
+    "Finally, we initialize the `OpenVINORAGEvaluator` and call `evaluate_all` to get a dictionary of scores. This provides a quantitative look at the performance of our RAG pipeline for the given query."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 16,
+   "id": "3dedb93b-20e3-43c3-93a1-38cb3e114019",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Converting meta-llama/Llama-2-7b-hf to OpenVINO format...\n"
+     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "363201c665cc483da02707818ef1fcf3",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "Loading checkpoint shards:   0%|          | 0/2 [00:00\n"
+      ],
+      "text/plain": []
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "ename": "RuntimeError",
+     "evalue": "Exception from src\\inference\\src\\cpp\\core.cpp:97:\nCheck 'false' failed at src\\frontends\\common\\src\\frontend.cpp:54:\nConverting input model\nstoll argument out of range\n\n",
+     "output_type": "error",
+     "traceback": [
+      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
+      "\u001b[1;31mRuntimeError\u001b[0m                              Traceback (most recent call last)",
+      "Cell \u001b[1;32mIn[16], line 2\u001b[0m\n\u001b[0;32m      1\u001b[0m \u001b[38;5;66;03m# Initialize the evaluator (this might take a moment on the first run)\u001b[39;00m\n\u001b[1;32m----> 2\u001b[0m evaluator \u001b[38;5;241m=\u001b[39m \u001b[43mOpenVINORAGEvaluator\u001b[49m\u001b[43m(\u001b[49m\u001b[43mdevice\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mGPU\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m)\u001b[49m\n\u001b[0;32m      4\u001b[0m \u001b[38;5;66;03m# Prepare the data for evaluation\u001b[39;00m\n\u001b[0;32m      5\u001b[0m response_text \u001b[38;5;241m=\u001b[39m answer\n",
+      "Cell \u001b[1;32mIn[14], line 22\u001b[0m, in \u001b[0;36mOpenVINORAGEvaluator.__init__\u001b[1;34m(self, device, models_dir)\u001b[0m\n\u001b[0;32m     19\u001b[0m os\u001b[38;5;241m.\u001b[39mmakedirs(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mmodels_dir, exist_ok\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mTrue\u001b[39;00m)\n\u001b[0;32m     21\u001b[0m \u001b[38;5;66;03m# Initialize models and tokenizers for evaluation\u001b[39;00m\n\u001b[1;32m---> 22\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mllama2_model, \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mllama2_tokenizer \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_load_model\u001b[49m\u001b[43m(\u001b[49m\n\u001b[0;32m     23\u001b[0m \u001b[43m    \u001b[49m\u001b[43mmodel_id\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mmeta-llama/Llama-2-7b-hf\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[0;32m     24\u001b[0m \u001b[43m    \u001b[49m\u001b[43mov_model_class\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mOVModelForCausalLM\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m     25\u001b[0m \u001b[43m    \u001b[49m\u001b[43msubfolder\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mllama2-7b-openvino\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\n\u001b[0;32m     26\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m     27\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mbias_model, \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mbias_tokenizer \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_load_model(\n\u001b[0;32m     28\u001b[0m     model_id\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mHate-speech-CNERG/dehatebert-mono-english\u001b[39m\u001b[38;5;124m\"\u001b[39m,\n\u001b[0;32m     29\u001b[0m     ov_model_class\u001b[38;5;241m=\u001b[39mOVModelForSequenceClassification,\n\u001b[0;32m     30\u001b[0m     subfolder\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mhate-speech-openvino\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[0;32m     31\u001b[0m )\n\u001b[0;32m     32\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mOpenVINO RAG Evaluator initialized on \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mdevice\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m)\n",
+      "Cell \u001b[1;32mIn[14], line 40\u001b[0m, in \u001b[0;36mOpenVINORAGEvaluator._load_model\u001b[1;34m(self, model_id, ov_model_class, subfolder)\u001b[0m\n\u001b[0;32m     38\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m os\u001b[38;5;241m.\u001b[39mpath\u001b[38;5;241m.\u001b[39mexists(os\u001b[38;5;241m.\u001b[39mpath\u001b[38;5;241m.\u001b[39mjoin(model_path, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mopenvino_model.xml\u001b[39m\u001b[38;5;124m\"\u001b[39m)):\n\u001b[0;32m     39\u001b[0m     \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mConverting \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mmodel_id\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m to OpenVINO format...\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[1;32m---> 40\u001b[0m     ov_model \u001b[38;5;241m=\u001b[39m \u001b[43mov_model_class\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mfrom_pretrained\u001b[49m\u001b[43m(\u001b[49m\u001b[43mmodel_id\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mexport\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43;01mTrue\u001b[39;49;00m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43mcompile\u001b[39;49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43;01mFalse\u001b[39;49;00m\u001b[43m)\u001b[49m\n\u001b[0;32m     41\u001b[0m     ov_model\u001b[38;5;241m.\u001b[39msave_pretrained(model_path)\n\u001b[0;32m     42\u001b[0m     \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mModel saved to \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mmodel_path\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m)\n",
+      "File \u001b[1;32m~\\ovraglangchain\\lib\\site-packages\\optimum\\intel\\openvino\\modeling_base.py:583\u001b[0m, in \u001b[0;36mOVBaseModel.from_pretrained\u001b[1;34m(cls, model_id, export, force_download, use_auth_token, token, cache_dir, subfolder, config, local_files_only, trust_remote_code, revision, **kwargs)\u001b[0m\n\u001b[0;32m    578\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mException\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m exception:\n\u001b[0;32m    579\u001b[0m     logger\u001b[38;5;241m.\u001b[39mwarning(\n\u001b[0;32m    580\u001b[0m         \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mCould not infer whether the model was already converted or not to the OpenVINO IR, keeping `export=\u001b[39m\u001b[38;5;132;01m{\u001b[39;00mexport\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m`.\u001b[39m\u001b[38;5;130;01m\\n\u001b[39;00m\u001b[38;5;132;01m{\u001b[39;00mexception\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m\n\u001b[0;32m    581\u001b[0m     )\n\u001b[1;32m--> 583\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28msuper\u001b[39m()\u001b[38;5;241m.\u001b[39mfrom_pretrained(\n\u001b[0;32m    584\u001b[0m     model_id,\n\u001b[0;32m    585\u001b[0m     export\u001b[38;5;241m=\u001b[39m_export,\n\u001b[0;32m    586\u001b[0m     force_download\u001b[38;5;241m=\u001b[39mforce_download,\n\u001b[0;32m    587\u001b[0m     token\u001b[38;5;241m=\u001b[39mtoken,\n\u001b[0;32m    588\u001b[0m     cache_dir\u001b[38;5;241m=\u001b[39mcache_dir,\n\u001b[0;32m    589\u001b[0m     subfolder\u001b[38;5;241m=\u001b[39msubfolder,\n\u001b[0;32m    590\u001b[0m     config\u001b[38;5;241m=\u001b[39mconfig,\n\u001b[0;32m    591\u001b[0m     local_files_only\u001b[38;5;241m=\u001b[39mlocal_files_only,\n\u001b[0;32m    592\u001b[0m     trust_remote_code\u001b[38;5;241m=\u001b[39mtrust_remote_code,\n\u001b[0;32m    593\u001b[0m     revision\u001b[38;5;241m=\u001b[39mrevision,\n\u001b[0;32m    594\u001b[0m     \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs,\n\u001b[0;32m    595\u001b[0m )\n",
+      "File \u001b[1;32m~\\ovraglangchain\\lib\\site-packages\\optimum\\modeling_base.py:407\u001b[0m, in \u001b[0;36mOptimizedModel.from_pretrained\u001b[1;34m(cls, model_id, config, export, subfolder, revision, force_download, local_files_only, trust_remote_code, cache_dir, token, **kwargs)\u001b[0m\n\u001b[0;32m    404\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[0;32m    405\u001b[0m     from_pretrained_method \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mcls\u001b[39m\u001b[38;5;241m.\u001b[39m_from_pretrained\n\u001b[1;32m--> 407\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m from_pretrained_method(\n\u001b[0;32m    408\u001b[0m     model_id\u001b[38;5;241m=\u001b[39mmodel_id,\n\u001b[0;32m    409\u001b[0m     config\u001b[38;5;241m=\u001b[39mconfig,\n\u001b[0;32m    410\u001b[0m     \u001b[38;5;66;03m# hub options\u001b[39;00m\n\u001b[0;32m    411\u001b[0m     revision\u001b[38;5;241m=\u001b[39mrevision,\n\u001b[0;32m    412\u001b[0m     cache_dir\u001b[38;5;241m=\u001b[39mcache_dir,\n\u001b[0;32m    413\u001b[0m     force_download\u001b[38;5;241m=\u001b[39mforce_download,\n\u001b[0;32m    414\u001b[0m     token\u001b[38;5;241m=\u001b[39mtoken,\n\u001b[0;32m    415\u001b[0m     subfolder\u001b[38;5;241m=\u001b[39msubfolder,\n\u001b[0;32m    416\u001b[0m     local_files_only\u001b[38;5;241m=\u001b[39mlocal_files_only,\n\u001b[0;32m    417\u001b[0m     trust_remote_code\u001b[38;5;241m=\u001b[39mtrust_remote_code,\n\u001b[0;32m    418\u001b[0m     \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs,\n\u001b[0;32m    419\u001b[0m )\n",
+      "File \u001b[1;32m~\\ovraglangchain\\lib\\site-packages\\optimum\\intel\\openvino\\modeling_decoder.py:363\u001b[0m, in \u001b[0;36mOVBaseDecoderModel._export\u001b[1;34m(cls, model_id, config, token, revision, force_download, cache_dir, subfolder, local_files_only, task, use_cache, trust_remote_code, load_in_8bit, quantization_config, **kwargs)\u001b[0m\n\u001b[0;32m    358\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m config\u001b[38;5;241m.\u001b[39mmodel_type \u001b[38;5;241m==\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mphi3\u001b[39m\u001b[38;5;124m\"\u001b[39m \u001b[38;5;129;01mand\u001b[39;00m config\u001b[38;5;241m.\u001b[39mmax_position_embeddings \u001b[38;5;241m!=\u001b[39m \u001b[38;5;28mgetattr\u001b[39m(\n\u001b[0;32m    359\u001b[0m     config, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124moriginal_max_position_embeddings\u001b[39m\u001b[38;5;124m\"\u001b[39m, config\u001b[38;5;241m.\u001b[39mmax_position_embeddings\n\u001b[0;32m    360\u001b[0m ):\n\u001b[0;32m    361\u001b[0m     config\u001b[38;5;241m.\u001b[39mmax_position_embeddings \u001b[38;5;241m=\u001b[39m config\u001b[38;5;241m.\u001b[39moriginal_max_position_embeddings\n\u001b[1;32m--> 363\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mcls\u001b[39m\u001b[38;5;241m.\u001b[39m_from_pretrained(\n\u001b[0;32m    364\u001b[0m     model_id\u001b[38;5;241m=\u001b[39msave_dir_path,\n\u001b[0;32m    365\u001b[0m     config\u001b[38;5;241m=\u001b[39mconfig,\n\u001b[0;32m    366\u001b[0m     use_cache\u001b[38;5;241m=\u001b[39muse_cache,\n\u001b[0;32m    367\u001b[0m     stateful\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mNone\u001b[39;00m,\n\u001b[0;32m    368\u001b[0m     load_in_8bit\u001b[38;5;241m=\u001b[39mload_in_8bit,\n\u001b[0;32m    369\u001b[0m     quantization_config\u001b[38;5;241m=\u001b[39mquantization_config,\n\u001b[0;32m    370\u001b[0m     trust_remote_code\u001b[38;5;241m=\u001b[39mtrust_remote_code,\n\u001b[0;32m    371\u001b[0m     compile_only\u001b[38;5;241m=\u001b[39mcompile_only,\n\u001b[0;32m    372\u001b[0m     \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs,\n\u001b[0;32m    373\u001b[0m )\n",
+      "File \u001b[1;32m~\\ovraglangchain\\lib\\site-packages\\optimum\\intel\\openvino\\modeling_decoder.py:859\u001b[0m, in \u001b[0;36mOVModelForCausalLM._from_pretrained\u001b[1;34m(cls, model_id, config, token, revision, force_download, cache_dir, file_name, subfolder, from_onnx, local_files_only, load_in_8bit, compile_only, quantization_config, trust_remote_code, **kwargs)\u001b[0m\n\u001b[0;32m    847\u001b[0m model_cache_path \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mcls\u001b[39m\u001b[38;5;241m.\u001b[39m_cached_file(\n\u001b[0;32m    848\u001b[0m     model_path\u001b[38;5;241m=\u001b[39mmodel_path,\n\u001b[0;32m    849\u001b[0m     token\u001b[38;5;241m=\u001b[39mtoken,\n\u001b[1;32m   (...)\u001b[0m\n\u001b[0;32m    855\u001b[0m     local_files_only\u001b[38;5;241m=\u001b[39mlocal_files_only,\n\u001b[0;32m    856\u001b[0m )\n\u001b[0;32m    858\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m compile_only:\n\u001b[1;32m--> 859\u001b[0m     model \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mcls\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mload_model\u001b[49m\u001b[43m(\u001b[49m\u001b[43mmodel_cache_path\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m    860\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[0;32m    861\u001b[0m     model \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mcls\u001b[39m\u001b[38;5;241m.\u001b[39m_compile_model(\n\u001b[0;32m    862\u001b[0m         model_cache_path, kwargs\u001b[38;5;241m.\u001b[39mget(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mdevice\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mCPU\u001b[39m\u001b[38;5;124m\"\u001b[39m), kwargs\u001b[38;5;241m.\u001b[39mget(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mov_config\u001b[39m\u001b[38;5;124m\"\u001b[39m), model_cache_path\u001b[38;5;241m.\u001b[39mparent\n\u001b[0;32m    863\u001b[0m     )\n",
+      "File \u001b[1;32m~\\ovraglangchain\\lib\\site-packages\\optimum\\intel\\openvino\\modeling_base.py:336\u001b[0m, in \u001b[0;36mOVBaseModel.load_model\u001b[1;34m(file_name, quantization_config)\u001b[0m\n\u001b[0;32m    333\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(file_name, \u001b[38;5;28mstr\u001b[39m):\n\u001b[0;32m    334\u001b[0m     file_name \u001b[38;5;241m=\u001b[39m Path(file_name)\n\u001b[0;32m    335\u001b[0m model \u001b[38;5;241m=\u001b[39m (\n\u001b[1;32m--> 336\u001b[0m     \u001b[43mcore\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mread_model\u001b[49m\u001b[43m(\u001b[49m\u001b[43mfile_name\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mresolve\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mfile_name\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mwith_suffix\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43m.bin\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m)\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mresolve\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m    337\u001b[0m     \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m file_name\u001b[38;5;241m.\u001b[39msuffix \u001b[38;5;241m==\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m.onnx\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[0;32m    338\u001b[0m     \u001b[38;5;28;01melse\u001b[39;00m convert_model(file_name)\n\u001b[0;32m    339\u001b[0m )\n\u001b[0;32m    340\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m file_name\u001b[38;5;241m.\u001b[39msuffix \u001b[38;5;241m==\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m.onnx\u001b[39m\u001b[38;5;124m\"\u001b[39m:\n\u001b[0;32m    341\u001b[0m     model \u001b[38;5;241m=\u001b[39m fix_op_names_duplicates(model)  \u001b[38;5;66;03m# should be called during model conversion to IR\u001b[39;00m\n",
+      "File \u001b[1;32m~\\ovraglangchain\\lib\\site-packages\\openvino\\_ov_api.py:603\u001b[0m, in \u001b[0;36mCore.read_model\u001b[1;34m(self, model, weights, config)\u001b[0m\n\u001b[0;32m    601\u001b[0m     \u001b[38;5;28;01mreturn\u001b[39;00m Model(\u001b[38;5;28msuper\u001b[39m()\u001b[38;5;241m.\u001b[39mread_model(model, config\u001b[38;5;241m=\u001b[39mconfig))\n\u001b[0;32m    602\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m--> 603\u001b[0m     \u001b[38;5;28;01mreturn\u001b[39;00m Model(\u001b[38;5;28;43msuper\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mread_model\u001b[49m\u001b[43m(\u001b[49m\u001b[43mmodel\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mweights\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mconfig\u001b[49m\u001b[43m)\u001b[49m)\n",
+      "\u001b[1;31mRuntimeError\u001b[0m: Exception from src\\inference\\src\\cpp\\core.cpp:97:\nCheck 'false' failed at src\\frontends\\common\\src\\frontend.cpp:54:\nConverting input model\nstoll argument out of range\n\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Initialize the evaluator (this might take a moment on the first run)\n",
+    "evaluator = OpenVINORAGEvaluator(device=\"GPU\")\n",
+    "\n",
+    "# Prepare the data for evaluation\n",
+    "response_text = answer\n",
+    "reference_text = context\n",
+    "\n",
+    "# Get all evaluation metrics\n",
+    "metrics = evaluator.evaluate_all(response_text, reference_text)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 17,
+   "id": "8006c91c-1180-41c8-b04e-448e4131391f",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "--- Evaluation Metrics ---\n"
+     ]
+    },
+    {
+     "ename": "NameError",
+     "evalue": "name 'metrics' is not defined",
+     "output_type": "error",
+     "traceback": [
+      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
+      "\u001b[1;31mNameError\u001b[0m                                 Traceback (most recent call last)",
+      "Cell \u001b[1;32mIn[17], line 2\u001b[0m\n\u001b[0;32m      1\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m--- Evaluation Metrics ---\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[1;32m----> 2\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m metric, value \u001b[38;5;129;01min\u001b[39;00m \u001b[43mmetrics\u001b[49m\u001b[38;5;241m.\u001b[39mitems():\n\u001b[0;32m      3\u001b[0m     \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;132;01m{\u001b[39;00mmetric\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m: \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mvalue\u001b[38;5;132;01m:\u001b[39;00m\u001b[38;5;124m.4f\u001b[39m\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m)\n",
+      "\u001b[1;31mNameError\u001b[0m: name 'metrics' is not defined"
+     ]
+    }
+   ],
+   "source": [
+    "print(\"--- Evaluation Metrics ---\")\n",
+    "for metric, value in metrics.items():\n",
+    "    print(f\"{metric}: {value:.4f}\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f53cbb39-9967-4e4e-8e1d-588bf2aee390",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.11"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}

From 327a9e7d381442db7d22d26419964f0875954f0c Mon Sep 17 00:00:00 2001
From: pkhara31 <112378664+pkhara31@users.noreply.github.com>
Date: Mon, 8 Dec 2025 14:40:49 +0530
Subject: [PATCH 4/5] Delete
 notebooks/llm-rag-ov-langchain/ov_rag_evaluator.ipynb

---
 .../ov_rag_evaluator.ipynb                    | 1095 -----------------
 1 file changed, 1095 deletions(-)
 delete mode 100644 notebooks/llm-rag-ov-langchain/ov_rag_evaluator.ipynb

diff --git a/notebooks/llm-rag-ov-langchain/ov_rag_evaluator.ipynb b/notebooks/llm-rag-ov-langchain/ov_rag_evaluator.ipynb
deleted file mode 100644
index 0be39a5013f..00000000000
--- a/notebooks/llm-rag-ov-langchain/ov_rag_evaluator.ipynb
+++ /dev/null
@@ -1,1095 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "id": "7722a495",
-   "metadata": {},
-   "source": [
-    "# RAG Performance & Fairness Evaluation Toolkit (OpenVINO + LangChain)\n",
-    "\n",
-    "This notebook demonstrates how to build and evaluate a Retrieval-Augmented Generation (RAG) pipeline using OpenVINO™ for accelerated performance on Intel hardware. We will use Hugging Face and LangChain libraries to construct the pipeline.\n",
-    "\n",
-    "The process involves:\n",
-    "1.  **Environment Setup**: Installing necessary libraries.\n",
-    "2.  **LLM and Tokenizer Setup**: Loading a language model (Microsoft's Phi-3-mini) and its tokenizer, optimized with OpenVINO.\n",
-    "3.  **Embedding Model Setup**: Preparing an embedding model to convert text into vector representations.\n",
-    "4.  **Data Loading and Processing**: Fetching documents from a web source, splitting them into manageable chunks, and creating vector embeddings.\n",
-    "5.  **Vector Store and Retriever Setup**: Storing the embeddings in a ChromaDB vector store and setting up a retriever with reranking for improved accuracy.\n",
-    "6.  **Building the RAG Chain**: Creating a `RetrievalQA` chain that combines the retriever and the LLM.\n",
-    "7.  **Running the RAG Pipeline**: Asking a question to get a response from the RAG system.\n",
-    "8.  **Evaluation**: Using a comprehensive `OpenVINORAGEvaluator` to assess the quality of the generated response based on various metrics like BLEU, ROUGE, BERTScore, perplexity, and bias."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "81a21a14",
-   "metadata": {},
-   "source": [
-    "## 1. Environment Setup\n",
-    "\n",
-    "First, let's ensure all the required Python packages are installed. The following commands handle the installation of essential libraries. These are typically only needed if you encounter version conflicts or issues with existing installations."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 1,
-   "id": "c4a2dc6a-3d3e-4da2-902f-30f3cbd24b39",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import os\n",
-    "import requests\n",
-    "from pathlib import Path\n",
-    "\n",
-    "if not Path(\"notebook_utils.py\").exists():\n",
-    "    r = requests.get(\n",
-    "        url=\"https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/notebook_utils.py\",\n",
-    "    )\n",
-    "    with open(\"notebook_utils.py\", \"w\") as f:\n",
-    "        f.write(r.text)\n",
-    "\n",
-    "if not Path(\"pip_helper.py\").exists():\n",
-    "    r = requests.get(\n",
-    "        url=\"https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/pip_helper.py\",\n",
-    "    )\n",
-    "    open(\"pip_helper.py\", \"w\").write(r.text)\n",
-    "\n",
-    "from pip_helper import pip_install\n",
-    "\n",
-    "os.environ[\"GIT_CLONE_PROTECTION_ACTIVE\"] = \"false\"\n",
-    "\n",
-    "pip_install(\"--pre\", \"-U\", \"openvino>=2025.3.0\", \"--extra-index-url\", \"https://storage.openvinotoolkit.org/simple/wheels/nightly\")\n",
-    "pip_install(\"--pre\", \"-U\", \"openvino-tokenizers\", \"--extra-index-url\", \"https://storage.openvinotoolkit.org/simple/wheels/nightly\")\n",
-    "pip_install(\n",
-    "    \"--extra-index-url\",\n",
-    "    \"https://download.pytorch.org/whl/cpu\",\n",
-    "    \"--upgrade-strategy\",\n",
-    "    \"eager\",\n",
-    "    \"optimum[openvino,nncf,onnxruntime]\",\n",
-    "    \"sacrebleu\",\n",
-    "    \"rouge-score\",\n",
-    "    \"nncf>=2.18.0\",\n",
-    "    \"bert-score\",\n",
-    "    \"transformers\",\n",
-    "    \"onnx\",\n",
-    "    \"nltk\",\n",
-    "    \"numpy\",\n",
-    "    \"textblob\",\n",
-    "    \"dataset\",\n",
-    "    \"langchain\",\n",
-    "    \"langchain_community\",\n",
-    "    \"chromadb\",\n",
-    "    \"langchain-chroma\",\n",
-    "    \"langchain-huggingface\",\n",
-    "    \"sentence-transformers\",\n",
-    "    \"Flashrank\",\n",
-    "    \"msoffcrypto-tool\",\n",
-    "    \"docx2txt\",\n",
-    "    \"bs4\",\n",
-    "    \"python-docx\",\n",
-    "    \"huggingface-hub>=0.26.5\",\n",
-    ")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "8a005fb2",
-   "metadata": {},
-   "source": [
-    "## 2. LLM and Tokenizer Setup\n",
-    "\n",
-    "Next, we load the Large Language Model (LLM) and its corresponding tokenizer. We use `optimum-intel` to convert and accelerate the model with OpenVINO. In this example, we use `microsoft/Phi-3-mini-4k-instruct`, but you can replace it with another compatible model.\n",
-    "\n",
-    "- **`OVModelForCausalLM`**: Loads a causal language model and automatically converts it to the OpenVINO format (`export=True`).\n",
-    "- **`device=\"GPU\"`**: Specifies that the model should run on the integrated GPU for acceleration. You can change this to `\"CPU\"`."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 1,
-   "id": "90e68d95-9a4e-4ba5-9040-4422c1333444",
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "application/vnd.jupyter.widget-view+json": {
-       "model_id": "5c056e6b6ce94a01b542d9035c2d9523",
-       "version_major": 2,
-       "version_minor": 0
-      },
-      "text/plain": [
-       "Loading checkpoint shards:   0%|          | 0/2 [00:00 0:\n",
-      "C:\\Users\\Local_Admin\\ovraglangchain\\lib\\site-packages\\optimum\\exporters\\openvino\\model_patcher.py:203: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.\n",
-      "  torch.tensor(0.0, device=mask.device, dtype=dtype),\n",
-      "C:\\Users\\Local_Admin\\ovraglangchain\\lib\\site-packages\\optimum\\exporters\\openvino\\model_patcher.py:204: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.\n",
-      "  torch.tensor(torch.finfo(torch.float16).min, device=mask.device, dtype=dtype),\n",
-      "C:\\Users\\Local_Admin\\ovraglangchain\\lib\\site-packages\\transformers\\cache_utils.py:551: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!\n",
-      "  elif (\n",
-      "C:\\Users\\Local_Admin\\ovraglangchain\\lib\\site-packages\\transformers\\integrations\\sdpa_attention.py:59: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!\n",
-      "  is_causal = query.shape[2] > 1 and attention_mask is None and getattr(module, \"is_causal\", True)\n"
-     ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "INFO:nncf:Statistics of the bitwidth distribution:\n",
-      "+---------------------------+-----------------------------+----------------------------------------+\n",
-      "| Weight compression mode   | % all parameters (layers)   | % ratio-defining parameters (layers)   |\n",
-      "+===========================+=============================+========================================+\n",
-      "| int8_asym, per-channel    | 100% (130 / 130)            | 100% (130 / 130)                       |\n",
-      "+---------------------------+-----------------------------+----------------------------------------+\n"
-     ]
-    },
-    {
-     "data": {
-      "application/vnd.jupyter.widget-view+json": {
-       "model_id": "198db25aa4394b4f9bb74f3eacf2ca7d",
-       "version_major": 2,
-       "version_minor": 0
-      },
-      "text/plain": [
-       "Output()"
-      ]
-     },
-     "metadata": {},
-     "output_type": "display_data"
-    },
-    {
-     "data": {
-      "text/html": [
-       "
\n"
-      ],
-      "text/plain": []
-     },
-     "metadata": {},
-     "output_type": "display_data"
-    }
-   ],
-   "source": [
-    "from optimum.intel import OVModelForCausalLM\n",
-    "from transformers import AutoTokenizer, pipeline\n",
-    "from langchain_huggingface import HuggingFacePipeline\n",
-    "\n",
-    "# Load model with OpenVINO backend\n",
-    "model = OVModelForCausalLM.from_pretrained(\n",
-    "    \"microsoft/Phi-3-mini-4k-instruct\", # You can plug in any other supported model\n",
-    "    export=True,  # Convert to OpenVINO format on the fly\n",
-    "    device=\"GPU\"  # Specify GPU for inference, can also be \"CPU\"\n",
-    ")\n",
-    "\n",
-    "tokenizer = AutoTokenizer.from_pretrained(\"microsoft/Phi-3-mini-4k-instruct\")\n",
-    "model.save_pretrained(\"ov_model\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "27419145",
-   "metadata": {},
-   "source": [
-    "### Create a LangChain-compatible LLM Pipeline\n",
-    "\n",
-    "We now create a `text-generation` pipeline using the OpenVINO-optimized model and tokenizer. This pipeline is then wrapped in `HuggingFacePipeline` to make it compatible with the LangChain ecosystem. A quick test is run to confirm the pipeline is working correctly."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 2,
-   "id": "f985ca28-9e3d-490d-954c-71b24fc47eda",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stderr",
-     "output_type": "stream",
-     "text": [
-      "Device set to use cpu\n"
-     ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "What is an ocean? An ocean is a vast body of saltwater that covers approximately 71% of the Earth's surface. It is the largest component of the hydrosphere and plays a crucial role in the global climate system. Oceans are divided into five major basins: the Pacific, Atlantic, Indian, Southern (Antarctic), and Arctic Oceans. These bodies of water are interconnected and contain a diverse range of marine life, ecosystems, and geological\n"
-     ]
-    }
-   ],
-   "source": [
-    "# Create a text-generation pipeline with the OpenVINO model\n",
-    "llm_pipeline = pipeline(\n",
-    "    \"text-generation\",\n",
-    "    model=model,\n",
-    "    tokenizer=tokenizer,\n",
-    "    device=model.device,\n",
-    "    max_new_tokens=100,\n",
-    "    top_k=50,\n",
-    "    temperature=0.1,\n",
-    "    do_sample=True\n",
-    ")\n",
-    "\n",
-    "# Create a LangChain instance from the Hugging Face pipeline\n",
-    "llm = HuggingFacePipeline(pipeline=llm_pipeline)\n",
-    "\n",
-    "# Test the pipeline with a sample query\n",
-    "response = llm.invoke(\"What is an ocean?\")\n",
-    "print(response)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "05fee9cc",
-   "metadata": {},
-   "source": [
-    "## 3. Embedding Model Setup\n",
-    "\n",
-    "For the retrieval part of our RAG pipeline, we need an embedding model to convert text documents into numerical vectors. We use `OpenVINOBgeEmbeddings` from `langchain_community`, which provides OpenVINO-optimized embeddings for efficient performance. Here, we use the `bge-small-en-v1.5` model."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 7,
-   "id": "606ff70a-f797-42a5-a697-8cb5c13c0dae",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Model saved to ./saved_bge_model\n"
-     ]
-    },
-    {
-     "name": "stderr",
-     "output_type": "stream",
-     "text": [
-      "C:\\Users\\Local_Admin\\ovraglangchain\\lib\\site-packages\\transformers\\modeling_attn_mask_utils.py:196: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.\n",
-      "  inverted_mask = torch.tensor(1.0, dtype=dtype) - expanded_mask\n"
-     ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Sample embedding (first 3 dimensions): [-0.042086612433195114, 0.06681863963603973, 0.007916754111647606]\n"
-     ]
-    }
-   ],
-   "source": [
-    "from langchain_community.embeddings import OpenVINOBgeEmbeddings\n",
-    "from sentence_transformers import SentenceTransformer\n",
-    "import os\n",
-    "\n",
-    "# First time: Download and save the model\n",
-    "embedding_model_name = \"BAAI/bge-small-en-v1.5\"  # Full HF repo path\n",
-    "save_directory = \"./saved_bge_model\"\n",
-    "\n",
-    "# Download the model using SentenceTransformer directly\n",
-    "st_model = SentenceTransformer(embedding_model_name)\n",
-    "st_model.save(save_directory)\n",
-    "print(f\"Model saved to {save_directory}\")\n",
-    "\n",
-    "# Now create the OpenVINO embedding with the saved model\n",
-    "embedding = OpenVINOBgeEmbeddings(\n",
-    "    model_name_or_path=save_directory,  # Use saved path\n",
-    "    model_kwargs={\"device\": \"CPU\"},\n",
-    "    encode_kwargs={\"normalize_embeddings\": True},\n",
-    ")\n",
-    "\n",
-    "# Load the saved model from local directory\n",
-    "local_model_path = \"./saved_bge_model\"\n",
-    "\n",
-    "embedding = OpenVINOBgeEmbeddings(\n",
-    "    model_name_or_path=local_model_path,\n",
-    "    model_kwargs={\"device\": \"CPU\"},\n",
-    "    encode_kwargs={\"normalize_embeddings\": True},\n",
-    ")\n",
-    "\n",
-    "# Test the loaded model\n",
-    "text = \"This is a test document.\"\n",
-    "embedding_result = embedding.embed_query(text)\n",
-    "print(\"Sample embedding (first 3 dimensions):\", embedding_result[:3])"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "f8defdcb",
-   "metadata": {},
-   "source": [
-    "## 4. Data Loading and Processing\n",
-    "\n",
-    "Now we'll load the documents that will form the knowledge base for our RAG pipeline. This notebook includes two methods for loading documents:\n",
-    "\n",
-    "1.  **Web Crawling (Enabled by default)**: Fetches content from a website's sitemap. We use `WebBaseLoader` to load content from URLs found in the sitemap of Zerodha Varsity.\n",
-    "2.  **Local File Loading (Commented out)**: A robust `LangChainDocumentLoader` class is provided to load various file types (`.txt`, `.pdf`, `.docx`, etc.) from a local directory. You can uncomment and adapt this section if you want to use your own local files."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 4,
-   "id": "808e9c2d-ab4a-4bc6-bb45-f3b2d4be3156",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stderr",
-     "output_type": "stream",
-     "text": [
-      "0it [00:00, ?it/s]\n",
-      "100%|████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.18s/it]"
-     ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Loaded 8 local documents.\n"
-     ]
-    },
-    {
-     "name": "stderr",
-     "output_type": "stream",
-     "text": [
-      "\n"
-     ]
-    }
-   ],
-   "source": [
-    "import bs4\n",
-    "from urllib.request import Request, urlopen\n",
-    "from bs4 import BeautifulSoup\n",
-    "import ssl\n",
-    "from langchain_community.document_loaders import WebBaseLoader\n",
-    "'''\n",
-    "# --- Method 1: Load documents by crawling a web page (default) ---\n",
-    "def get_sitemap(url):\n",
-    "    \"\"\"Fetches and parses an XML sitemap from a URL.\"\"\"\n",
-    "    req = Request(url, headers={\"User-Agent\": \"Mozilla/5.0\"})\n",
-    "    response = urlopen(req)\n",
-    "    xml = BeautifulSoup(response, \"lxml-xml\", from_encoding=response.info().get_param(\"charset\"))\n",
-    "    return xml\n",
-    "\n",
-    "def get_urls_from_sitemap(xml):\n",
-    "    \"\"\"Extracts all URLs from a parsed sitemap XML.\"\"\"\n",
-    "    urls = [loc.text for loc in xml.find_all(\"loc\")]\n",
-    "    return urls\n",
-    "\n",
-    "# Bypass SSL verification issues if they arise\n",
-    "ssl._create_default_https_context = ssl._create_stdlib_context\n",
-    "\n",
-    "sitemap_url = \"https://zerodha.com/varsity/chapter-sitemap2.xml\"\n",
-    "sitemap_xml = get_sitemap(sitemap_url)\n",
-    "urls = get_urls_from_sitemap(sitemap_xml)\n",
-    "\n",
-    "# Load documents from the collected URLs\n",
-    "docs = []\n",
-    "for i, url in enumerate(urls):\n",
-    "    try:\n",
-    "        loader = WebBaseLoader(url)\n",
-    "        docs.extend(loader.load())\n",
-    "        if (i + 1) % 10 == 0:\n",
-    "            print(f\"Loaded {i + 1}/{len(urls)} URLs\")\n",
-    "    except Exception as e:\n",
-    "        print(f\"Failed to load {url}: {e}\")\n",
-    "\n",
-    "print(f\"\\nTotal documents loaded: {len(docs)}\")\n",
-    "'''\n",
-    "# --- Method 2: Load documents locally from the system (commented out) ---\n",
-    "\n",
-    "import os\n",
-    "from langchain.document_loaders import (\n",
-    "    TextLoader,\n",
-    "    PyPDFLoader,\n",
-    "    DirectoryLoader,\n",
-    ")\n",
-    "from langchain.schema import Document as LCDocument\n",
-    "from typing import List\n",
-    "\n",
-    "class LocalDocumentLoader:\n",
-    "    \"\"\"Load documents from a local directory using LangChain loaders.\"\"\"\n",
-    "    def __init__(self, directory_path: str):\n",
-    "        self.directory_path = directory_path\n",
-    "\n",
-    "    def load(self) -> List[LCDocument]:\n",
-    "        \"\"\"Loads all supported documents from the directory.\"\"\"\n",
-    "        if not self.directory_path:\n",
-    "            raise ValueError(\"Directory path not set.\")\n",
-    "\n",
-    "        # Define loaders for different file types\n",
-    "        txt_loader = DirectoryLoader(\n",
-    "            self.directory_path, glob=\"**/*.txt\", loader_cls=TextLoader,\n",
-    "            loader_kwargs={\"encoding\": \"utf-8\"}, show_progress=True\n",
-    "        )\n",
-    "        pdf_loader = DirectoryLoader(\n",
-    "            self.directory_path, glob=\"**/*.pdf\", loader_cls=PyPDFLoader, show_progress=True\n",
-    "        )\n",
-    "\n",
-    "        documents = []\n",
-    "        documents.extend(txt_loader.load())\n",
-    "        documents.extend(pdf_loader.load())\n",
-    "        \n",
-    "        return documents\n",
-    "\n",
-    "#Usage Example:\n",
-    "loader = LocalDocumentLoader(directory_path=\"content\")\n",
-    "docs = loader.load()\n",
-    "print(f\"Loaded {len(docs)} local documents.\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "a6b107e7",
-   "metadata": {},
-   "source": [
-    "### Split Documents into Chunks\n",
-    "\n",
-    "LLMs have a limited context window, so we need to split large documents into smaller chunks. This ensures that the model can process the retrieved information effectively. We use `RecursiveCharacterTextSplitter` which is a smart way to split text while trying to keep related content together."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 5,
-   "id": "51d07ec4-b929-4893-baff-af68a4fbf3aa",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Documents split into 28 chunks.\n"
-     ]
-    }
-   ],
-   "source": [
-    "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
-    "\n",
-    "# Split the documents into smaller chunks with a specified size and overlap\n",
-    "text_splitter = RecursiveCharacterTextSplitter(\n",
-    "    chunk_size=1250,\n",
-    "    chunk_overlap=100,\n",
-    "    length_function=len,\n",
-    "    is_separator_regex=False\n",
-    ")\n",
-    "\n",
-    "split_docs = text_splitter.split_documents(docs)\n",
-    "print(f\"Documents split into {len(split_docs)} chunks.\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "1a734d8c",
-   "metadata": {},
-   "source": [
-    "## 5. Vector Store and Retriever Setup\n",
-    "\n",
-    "Now we'll create a vector store to house the document embeddings and enable efficient similarity searches.\n",
-    "\n",
-    "- **`Chroma`**: We use ChromaDB as our vector store. It's a lightweight and easy-to-use vector database.\n",
-    "- **`persist_directory`**: This saves the created database to disk, allowing us to reuse it later without re-processing the documents."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 8,
-   "id": "194acf26-f710-483d-97d6-57bfff7cfa65",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stderr",
-     "output_type": "stream",
-     "text": [
-      "C:\\Users\\Local_Admin\\AppData\\Local\\Temp\\ipykernel_4188\\156854944.py:4: LangChainDeprecationWarning: The class `Chroma` was deprecated in LangChain 0.2.9 and will be removed in 1.0. An updated version of the class exists in the :class:`~langchain-chroma package and should be used instead. To use it run `pip install -U :class:`~langchain-chroma` and import as `from :class:`~langchain_chroma import Chroma``.\n",
-      "  vectorstore = Chroma(\n"
-     ]
-    }
-   ],
-   "source": [
-    "# Create a ChromaDB instance to store the document embeddings\n",
-    "from langchain_community.vectorstores import Chroma\n",
-    "\n",
-    "vectorstore = Chroma(\n",
-    "    embedding_function=embedding,\n",
-    "    persist_directory=\"./chromadb_varsity\",\n",
-    "    collection_name=\"zerodha_varsity_docs\"\n",
-    ")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "0690e134",
-   "metadata": {},
-   "source": [
-    "### Add Documents to the Vector Store\n",
-    "\n",
-    "We add the processed document chunks to the vector store. To handle a large number of documents efficiently, we add them in batches. The metadata is also filtered to ensure compatibility with the vector store."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 9,
-   "id": "08f6ce04-9702-4372-be96-6fc34431fc21",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Added batch 1/1\n"
-     ]
-    },
-    {
-     "name": "stderr",
-     "output_type": "stream",
-     "text": [
-      "C:\\Users\\Local_Admin\\AppData\\Local\\Temp\\ipykernel_4188\\1265604352.py:12: LangChainDeprecationWarning: Since Chroma 0.4.x the manual persistence method is no longer supported as docs are automatically persisted.\n",
-      "  vectorstore.persist()\n"
-     ]
-    }
-   ],
-   "source": [
-    "from langchain_community.vectorstores.utils import filter_complex_metadata\n",
-    "\n",
-    "# Function to insert embeddings in batches for a lengthy document set\n",
-    "def add_documents_in_batches(vectorstore, docs, batch_size=100):\n",
-    "    \"\"\"Adds documents to the vectorstore in batches.\"\"\"\n",
-    "    for i in range(0, len(docs), batch_size):\n",
-    "        chunk = docs[i : i + batch_size]\n",
-    "        vectorstore.add_documents(chunk)\n",
-    "        print(f\"Added batch {i//batch_size + 1}/{(len(docs)-1)//batch_size + 1}\")\n",
-    "    # Persist the database to disk if the method is available\n",
-    "    if hasattr(vectorstore, \"persist\"):\n",
-    "        vectorstore.persist()\n",
-    "\n",
-    "# Filter out complex metadata that might cause issues\n",
-    "filtered_docs = filter_complex_metadata(split_docs)\n",
-    "\n",
-    "# Add the documents to the vector store in batches\n",
-    "add_documents_in_batches(vectorstore, filtered_docs)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "4af0346d",
-   "metadata": {},
-   "source": [
-    "### Set up a Reranking Retriever\n",
-    "\n",
-    "To improve the quality of retrieved documents, we use a reranker. The initial retriever fetches a set of documents (e.g., k=5), and the reranker (`FlashrankRerank`) re-orders them based on their relevance to the query. This ensures that the most relevant context is passed to the LLM.\n",
-    "\n",
-    "- **`ContextualCompressionRetriever`**: Wraps a base retriever and a document compressor (the reranker) to create this two-stage retrieval process."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 10,
-   "id": "01bc1634-dcea-431c-b447-af5b7d38aaeb",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "from langchain.retrievers import ContextualCompressionRetriever\n",
-    "from langchain.retrievers.document_compressors import FlashrankRerank\n",
-    "\n",
-    "# Set up the base retriever to fetch the top 5 documents\n",
-    "retriever = vectorstore.as_retriever(search_kwargs={\"k\": 5})\n",
-    "\n",
-    "# Initialize the reranker\n",
-    "compressor = FlashrankRerank()\n",
-    "\n",
-    "# Create the compression retriever, which combines retrieval and reranking\n",
-    "compression_retriever = ContextualCompressionRetriever(\n",
-    "    base_compressor=compressor,\n",
-    "    base_retriever=retriever\n",
-    ")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "2c9bd8c5",
-   "metadata": {},
-   "source": [
-    "## 6. Building the RAG Chain\n",
-    "\n",
-    "With all the components ready, we now assemble the final RAG pipeline using LangChain's `RetrievalQA` chain. This chain connects the LLM with the retriever.\n",
-    "\n",
-    "- **`chain_type=\"stuff\"`**: This means all retrieved documents will be \"stuffed\" into the prompt sent to the LLM.\n",
-    "- **`return_source_documents=True`**: This is important for evaluation, as it allows us to see which documents were used to generate the answer."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 11,
-   "id": "e18d5187-a6c6-406f-b5b2-f9982d97d3a2",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "from langchain.chains import RetrievalQA\n",
-    "\n",
-    "qa_chain = RetrievalQA.from_chain_type(\n",
-    "    llm=llm,\n",
-    "    chain_type=\"stuff\",\n",
-    "    retriever=compression_retriever,\n",
-    "    return_source_documents=True\n",
-    ")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "db9453a2",
-   "metadata": {},
-   "source": [
-    "## 7. Running the RAG Pipeline\n",
-    "\n",
-    "It's time to ask a question! The `qa_chain.invoke` method will execute the full RAG process: retrieve relevant documents, pass them to the LLM along with the question, and return the final answer."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 12,
-   "id": "fc8def11-554b-4bd1-ab37-9824f003966e",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "--- Question ---\n",
-      "What is deep link?\n",
-      "\n",
-      "--- Answer ---\n",
-      "Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n",
-      "\n",
-      "Introduction\n",
-      "With the release of the Intel® 11th Generation mobile processor and the \n",
-      "Intel® Iris® Xe and Intel® Iris® Xe MAX graphics architecture, Deep Link was \n",
-      "introduced to the world and a new era of innovation was born.\n",
-      "Developers now have the ability to strategically apply computing power \n",
-      "that was previously unavailable, and to assign tasks to parts of the machine \n",
-      "which would otherwise just lie dormant. Imagine having the ability to \n",
-      "significantly boost the performance of your application using not much \n",
-      "more than a strategic approach and some lines of code.  \n",
-      "That is the power of Deep Link.\n",
-      "Table of Contents\n",
-      "Introduction  .................... 1\n",
-      "Deep Link Technology  ........... 2\n",
-      "OpenVINO  ...................... 2\n",
-      "Use Case: Topaz Gigapixel AI  ..... 3\n",
-      "Use Case: AI-powered Video  ..... 4\n",
-      "Getting Started  .................. 5\n",
-      "Platform Considerations  ......... 7\n",
-      "Online Resources  ................ 8\n",
-      "Conclusion ...................... 8\n",
-      "This paper is part one in a series of white papers designed to provide details \n",
-      "regarding openly available development tools that can be used to take full \n",
-      "advantage of Intel® Deep Link Technology.\n",
-      "Authors\n",
-      "Roman Borisov \n",
-      "Senior Software Application Engineer \n",
-      "Max Domeika\n",
-      "Principal Engineer\n",
-      "\n",
-      "developers can take full advantage of Deep Link Technology and build a solution that efficiently utilizes all of the computing \n",
-      "power available to the system without favoring or ignoring any components - whether integrated or discrete.\n",
-      "One of those tools is the OpenVINO™ Toolkit.\n",
-      "The OpenVINO™ Toolkit was developed by Intel® and released to the open source community in May of 2018. Two versions of \n",
-      "the toolkit are available today; one which is fully open source and a second which is distributed and supported by Intel®.\n",
-      "The toolkit was designed to help developers save time, energy and resources by providing the tools to build deep learning and \n",
-      "Artificial Intelligence (AI)-driven applications that not only perform much better, but are also easier to create. It does this in a \n",
-      "number of different ways, including:\n",
-      "Heterogeneous Execution (write once - deploy anywhere) \n",
-      "OpenVINO™ allows developers to write a segment of code one time, and then deploy a device-specific iteration of the same \n",
-      "code across multiple computational components (CPU, iGPU, dGPU, Vision Processing Unit, FPGA, etc.).\n",
-      "Intuitive Workflow\n",
-      "The workflow that it utilizes was built from the ground up to allow OpenVINO™ to be an efficient and comprehensive\n",
-      "\n",
-      "process, along with pre- and post-processing and CODEC stages. This pipeline shows a common process involving style \n",
-      "transfer of the original video clip, followed by an upscaling operation used to improve the image quality of the newly stylized \n",
-      "clip. This process is typical of one which might be found in a video editing application.\n",
-      "White Paper | Unlocking the Power of Intel ® Deep Link   Part One: Client Artificial Intelligence Using Intel ® GPUs\n",
-      "Table 1. Device utilization statistics for multi-device test implementation.\n",
-      "* - The theoretical maximum Frames-per-Second (FPS) performance for Deep Link in this configuration would be the sum of the \n",
-      "performance of each individual GPU (in this case 8.7 + 6.9 = 15.6 FPS), but it is clear from the table that when two GPUs are engaged \n",
-      "simultaneously both GPUs are under-utilized.\n",
-      "Changing the workload partitioning and using the GPU(s) to perform all of the pre- and post-processing (using CV::UMat abstraction) would \n",
-      "increase efficiency and increase the FPS rate substantially. In addition, decoding directly to video memory and encoding directly from video \n",
-      "memory to avoid unnecessary data transfer would increase efficiency, but this is not currently supported by OpenCV. Intel™ Media SDK\n",
-      "\n",
-      "Question: What is deep link?\n",
-      "Helpful Answer:\n",
-      "\n",
-      "Deep Link is a technology introduced by Intel that allows developers to strategically apply computing power to parts of a machine that would otherwise be underutilized. It enables the boosting of application performance through a strategic approach and some lines of code. Deep Link technology, along with the OpenVINO Toolkit, helps developers build AI-driven applications that perform better and are easier to create. It achieves this by allowing for heterogeneous execution (write once, deploy anywhere\n"
-     ]
-    }
-   ],
-   "source": [
-    "question = \"What is deep link?\"\n",
-    "result = qa_chain.invoke({\"query\": question})\n",
-    "print(\"--- Question ---\")\n",
-    "print(question)\n",
-    "print(\"\\n--- Answer ---\")\n",
-    "print(result[\"result\"])"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "b30815a2",
-   "metadata": {},
-   "source": [
-    "### Extract Answer and Context for Evaluation\n",
-    "\n",
-    "For the evaluation step, we need to isolate the generated answer and the source documents (the context or \"reference\")."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 13,
-   "id": "991a94fc-b7b3-4709-896b-c613e1b857b8",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "answer = result['result']\n",
-    "context = \" \".join([d.page_content for d in result['source_documents']])"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "bf65db80",
-   "metadata": {},
-   "source": [
-    "## 8. Evaluation\n",
-    "\n",
-    "To assess the quality of our RAG pipeline, we use a custom `OpenVINORAGEvaluator` class. This class uses OpenVINO-optimized models to calculate several key metrics:\n",
-    "\n",
-    "- **BLEU & ROUGE**: Measure the overlap between the generated answer and the reference context.\n",
-    "- **BERTScore**: Computes semantic similarity, which is more advanced than simple overlap.\n",
-    "- **Perplexity**: Measures how well a language model (here, Llama-2-7B) predicts the generated text. Lower is better.\n",
-    "- **Diversity**: Calculates the variety of tokens in the response.\n",
-    "- **Racial Bias**: Uses a hate speech detection model to check for biased content.\n",
-    "\n",
-    "**Note**: The first time you run this, it will download and convert the necessary evaluation models (Llama-2-7B and a hate speech model) to the OpenVINO format. This is a one-time setup."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 14,
-   "id": "1a8122e4-6602-4750-ad6e-c5cc599e0b0a",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import openvino as ov\n",
-    "import numpy as np\n",
-    "import torch\n",
-    "from transformers import AutoTokenizer, AutoModelForCausalLM, AutoModelForSequenceClassification\n",
-    "from optimum.intel import OVModelForCausalLM, OVModelForSequenceClassification\n",
-    "from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction\n",
-    "from rouge_score import rouge_scorer\n",
-    "from bert_score import score\n",
-    "from nltk.util import ngrams\n",
-    "from typing import List\n",
-    "import os\n",
-    "\n",
-    "class OpenVINORAGEvaluator:\n",
-    "    \"\"\"An evaluator for RAG pipelines using OpenVINO-optimized models.\"\"\"\n",
-    "    \n",
-    "    def __init__(self, device=\"GPU\", models_dir=\"./openvino_models\"):\n",
-    "        self.device = device\n",
-    "        self.models_dir = models_dir\n",
-    "        os.makedirs(self.models_dir, exist_ok=True)\n",
-    "        \n",
-    "        # Initialize models and tokenizers for evaluation\n",
-    "        self.llama2_model, self.llama2_tokenizer = self._load_model(\n",
-    "            model_id=\"meta-llama/Llama-2-7b-hf\",\n",
-    "            ov_model_class=OVModelForCausalLM,\n",
-    "            subfolder=\"llama2-7b-openvino\"\n",
-    "        )\n",
-    "        self.bias_model, self.bias_tokenizer = self._load_model(\n",
-    "            model_id=\"Hate-speech-CNERG/dehatebert-mono-english\",\n",
-    "            ov_model_class=OVModelForSequenceClassification,\n",
-    "            subfolder=\"hate-speech-openvino\"\n",
-    "        )\n",
-    "        print(f\"OpenVINO RAG Evaluator initialized on {device}\")\n",
-    "\n",
-    "    def _load_model(self, model_id, ov_model_class, subfolder):\n",
-    "        \"\"\"Generic function to load or convert a model to OpenVINO format.\"\"\"\n",
-    "        model_path = os.path.join(self.models_dir, subfolder)\n",
-    "        \n",
-    "        if not os.path.exists(os.path.join(model_path, \"openvino_model.xml\")):\n",
-    "            print(f\"Converting {model_id} to OpenVINO format...\")\n",
-    "            ov_model = ov_model_class.from_pretrained(model_id, export=True, compile=False)\n",
-    "            ov_model.save_pretrained(model_path)\n",
-    "            print(f\"Model saved to {model_path}\")\n",
-    "        \n",
-    "        try:\n",
-    "            print(f\"Loading {model_id} from {model_path}...\")\n",
-    "            model = ov_model_class.from_pretrained(model_path, device=self.device)\n",
-    "            tokenizer = AutoTokenizer.from_pretrained(model_id)\n",
-    "            print(f\"{model_id} loaded successfully.\")\n",
-    "            return model, tokenizer\n",
-    "        except Exception as e:\n",
-    "            print(f\"Error loading {model_id}: {e}\")\n",
-    "            return None, None\n",
-    "\n",
-    "    def evaluate_bleu_rouge(self, candidates: List[str], references: List[str]):\n",
-    "        \"\"\"Calculates BLEU and ROUGE scores.\"\"\"\n",
-    "        candidate_tokens = [c.split() for c in candidates]\n",
-    "        reference_tokens = [[r.split()] for r in references]\n",
-    "        \n",
-    "        # BLEU with smoothing\n",
-    "        smoothing = SmoothingFunction().method1\n",
-    "        bleu_score = corpus_bleu(reference_tokens, candidate_tokens, smoothing_function=smoothing)\n",
-    "        \n",
-    "        # ROUGE\n",
-    "        scorer = rouge_scorer.RougeScorer(['rouge1', 'rougeL'], use_stemmer=True)\n",
-    "        rouge1_f1 = sum(scorer.score(ref, cand)['rouge1'].fmeasure for ref, cand in zip(references, candidates)) / len(candidates)\n",
-    "        return bleu_score, rouge1_f1\n",
-    "\n",
-    "    def evaluate_bert_score(self, candidates: List[str], references: List[str]):\n",
-    "        \"\"\"Calculates BERTScore.\"\"\"\n",
-    "        _, _, f1 = score(candidates, references, lang=\"en\", model_type='bert-base-multilingual-cased')\n",
-    "        return f1.mean().item()\n",
-    "\n",
-    "    def evaluate_perplexity(self, text: str):\n",
-    "        \"\"\"Calculates perplexity using the loaded Llama-2 model.\"\"\"\n",
-    "        if not self.llama2_model:\n",
-    "            return float('inf')\n",
-    "        \n",
-    "        try:\n",
-    "            encodings = self.llama2_tokenizer(text, return_tensors='pt', max_length=1024, truncation=True)\n",
-    "            input_ids = encodings.input_ids\n",
-    "            \n",
-    "            with torch.no_grad():\n",
-    "                outputs = self.llama2_model(input_ids)\n",
-    "                logits = outputs.logits\n",
-    "                \n",
-    "                # Manually calculate cross-entropy loss\n",
-    "                # Shift logits and labels for next-token prediction\n",
-    "                shift_logits = logits[..., :-1, :].contiguous()\n",
-    "                shift_labels = input_ids[..., 1:].contiguous()\n",
-    "                \n",
-    "                # Calculate loss\n",
-    "                loss_fct = torch.nn.CrossEntropyLoss()\n",
-    "                loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))\n",
-    "                perplexity = torch.exp(loss)\n",
-    "            \n",
-    "            return perplexity.item()\n",
-    "        except Exception as e:\n",
-    "            print(f\"Error calculating perplexity: {e}\")\n",
-    "            return float('inf')\n",
-    "\n",
-    "    def evaluate_racial_bias(self, text: str):\n",
-    "        \"\"\"Evaluates racial bias using a hate speech detection model.\"\"\"\n",
-    "        if not self.bias_model:\n",
-    "            return 0.0\n",
-    "\n",
-    "        try:\n",
-    "            inputs = self.bias_tokenizer(text, return_tensors=\"pt\", truncation=True, max_length=512)\n",
-    "            with torch.no_grad():\n",
-    "                logits = self.bias_model(**inputs).logits\n",
-    "                probabilities = torch.nn.functional.softmax(logits, dim=-1)\n",
-    "                # Return the probability of the 'hate speech' class (index 1)\n",
-    "                bias_score = probabilities[0][1].item()\n",
-    "            return bias_score\n",
-    "        except Exception as e:\n",
-    "            print(f\"Error calculating bias: {e}\")\n",
-    "            return 0.0\n",
-    "    \n",
-    "    def evaluate_all(self, response: str, reference: str):\n",
-    "        \"\"\"Runs a comprehensive evaluation and returns all metrics.\"\"\"\n",
-    "        candidates = [response]\n",
-    "        references = [reference]\n",
-    "        \n",
-    "        try:\n",
-    "            bleu, rouge1 = self.evaluate_bleu_rouge(candidates, references)\n",
-    "            bert_f1 = self.evaluate_bert_score(candidates, references)\n",
-    "            perplexity = self.evaluate_perplexity(response)\n",
-    "            racial_bias = self.evaluate_racial_bias(response)\n",
-    "            \n",
-    "            return {\n",
-    "                \"BLEU\": bleu,\n",
-    "                \"ROUGE-1\": rouge1,\n",
-    "                \"BERT F1\": bert_f1,\n",
-    "                \"Perplexity\": perplexity,\n",
-    "                \"Racial Bias\": racial_bias\n",
-    "            }\n",
-    "        except Exception as e:\n",
-    "            print(f\"An error occurred during evaluation: {e}\")\n",
-    "            return {k: 0.0 for k in [\"BLEU\", \"ROUGE-1\", \"BERT F1\", \"Perplexity\", \"Racial Bias\"]}"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "e9c19402",
-   "metadata": {},
-   "source": [
-    "### Run the Evaluation\n",
-    "\n",
-    "Finally, we initialize the `OpenVINORAGEvaluator` and call `evaluate_all` to get a dictionary of scores. This provides a quantitative look at the performance of our RAG pipeline for the given query."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 16,
-   "id": "3dedb93b-20e3-43c3-93a1-38cb3e114019",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Converting meta-llama/Llama-2-7b-hf to OpenVINO format...\n"
-     ]
-    },
-    {
-     "data": {
-      "application/vnd.jupyter.widget-view+json": {
-       "model_id": "363201c665cc483da02707818ef1fcf3",
-       "version_major": 2,
-       "version_minor": 0
-      },
-      "text/plain": [
-       "Loading checkpoint shards:   0%|          | 0/2 [00:00\n"
-      ],
-      "text/plain": []
-     },
-     "metadata": {},
-     "output_type": "display_data"
-    },
-    {
-     "ename": "RuntimeError",
-     "evalue": "Exception from src\\inference\\src\\cpp\\core.cpp:97:\nCheck 'false' failed at src\\frontends\\common\\src\\frontend.cpp:54:\nConverting input model\nstoll argument out of range\n\n",
-     "output_type": "error",
-     "traceback": [
-      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
-      "\u001b[1;31mRuntimeError\u001b[0m                              Traceback (most recent call last)",
-      "Cell \u001b[1;32mIn[16], line 2\u001b[0m\n\u001b[0;32m      1\u001b[0m \u001b[38;5;66;03m# Initialize the evaluator (this might take a moment on the first run)\u001b[39;00m\n\u001b[1;32m----> 2\u001b[0m evaluator \u001b[38;5;241m=\u001b[39m \u001b[43mOpenVINORAGEvaluator\u001b[49m\u001b[43m(\u001b[49m\u001b[43mdevice\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mGPU\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m)\u001b[49m\n\u001b[0;32m      4\u001b[0m \u001b[38;5;66;03m# Prepare the data for evaluation\u001b[39;00m\n\u001b[0;32m      5\u001b[0m response_text \u001b[38;5;241m=\u001b[39m answer\n",
-      "Cell \u001b[1;32mIn[14], line 22\u001b[0m, in \u001b[0;36mOpenVINORAGEvaluator.__init__\u001b[1;34m(self, device, models_dir)\u001b[0m\n\u001b[0;32m     19\u001b[0m os\u001b[38;5;241m.\u001b[39mmakedirs(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mmodels_dir, exist_ok\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mTrue\u001b[39;00m)\n\u001b[0;32m     21\u001b[0m \u001b[38;5;66;03m# Initialize models and tokenizers for evaluation\u001b[39;00m\n\u001b[1;32m---> 22\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mllama2_model, \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mllama2_tokenizer \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_load_model\u001b[49m\u001b[43m(\u001b[49m\n\u001b[0;32m     23\u001b[0m \u001b[43m    \u001b[49m\u001b[43mmodel_id\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mmeta-llama/Llama-2-7b-hf\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[0;32m     24\u001b[0m \u001b[43m    \u001b[49m\u001b[43mov_model_class\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mOVModelForCausalLM\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m     25\u001b[0m \u001b[43m    \u001b[49m\u001b[43msubfolder\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mllama2-7b-openvino\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\n\u001b[0;32m     26\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m     27\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mbias_model, \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mbias_tokenizer \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_load_model(\n\u001b[0;32m     28\u001b[0m     model_id\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mHate-speech-CNERG/dehatebert-mono-english\u001b[39m\u001b[38;5;124m\"\u001b[39m,\n\u001b[0;32m     29\u001b[0m     ov_model_class\u001b[38;5;241m=\u001b[39mOVModelForSequenceClassification,\n\u001b[0;32m     30\u001b[0m     subfolder\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mhate-speech-openvino\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[0;32m     31\u001b[0m )\n\u001b[0;32m     32\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mOpenVINO RAG Evaluator initialized on \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mdevice\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m)\n",
-      "Cell \u001b[1;32mIn[14], line 40\u001b[0m, in \u001b[0;36mOpenVINORAGEvaluator._load_model\u001b[1;34m(self, model_id, ov_model_class, subfolder)\u001b[0m\n\u001b[0;32m     38\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m os\u001b[38;5;241m.\u001b[39mpath\u001b[38;5;241m.\u001b[39mexists(os\u001b[38;5;241m.\u001b[39mpath\u001b[38;5;241m.\u001b[39mjoin(model_path, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mopenvino_model.xml\u001b[39m\u001b[38;5;124m\"\u001b[39m)):\n\u001b[0;32m     39\u001b[0m     \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mConverting \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mmodel_id\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m to OpenVINO format...\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[1;32m---> 40\u001b[0m     ov_model \u001b[38;5;241m=\u001b[39m \u001b[43mov_model_class\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mfrom_pretrained\u001b[49m\u001b[43m(\u001b[49m\u001b[43mmodel_id\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mexport\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43;01mTrue\u001b[39;49;00m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43mcompile\u001b[39;49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43;01mFalse\u001b[39;49;00m\u001b[43m)\u001b[49m\n\u001b[0;32m     41\u001b[0m     ov_model\u001b[38;5;241m.\u001b[39msave_pretrained(model_path)\n\u001b[0;32m     42\u001b[0m     \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mModel saved to \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mmodel_path\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m)\n",
-      "File \u001b[1;32m~\\ovraglangchain\\lib\\site-packages\\optimum\\intel\\openvino\\modeling_base.py:583\u001b[0m, in \u001b[0;36mOVBaseModel.from_pretrained\u001b[1;34m(cls, model_id, export, force_download, use_auth_token, token, cache_dir, subfolder, config, local_files_only, trust_remote_code, revision, **kwargs)\u001b[0m\n\u001b[0;32m    578\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mException\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m exception:\n\u001b[0;32m    579\u001b[0m     logger\u001b[38;5;241m.\u001b[39mwarning(\n\u001b[0;32m    580\u001b[0m         \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mCould not infer whether the model was already converted or not to the OpenVINO IR, keeping `export=\u001b[39m\u001b[38;5;132;01m{\u001b[39;00mexport\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m`.\u001b[39m\u001b[38;5;130;01m\\n\u001b[39;00m\u001b[38;5;132;01m{\u001b[39;00mexception\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m\n\u001b[0;32m    581\u001b[0m     )\n\u001b[1;32m--> 583\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28msuper\u001b[39m()\u001b[38;5;241m.\u001b[39mfrom_pretrained(\n\u001b[0;32m    584\u001b[0m     model_id,\n\u001b[0;32m    585\u001b[0m     export\u001b[38;5;241m=\u001b[39m_export,\n\u001b[0;32m    586\u001b[0m     force_download\u001b[38;5;241m=\u001b[39mforce_download,\n\u001b[0;32m    587\u001b[0m     token\u001b[38;5;241m=\u001b[39mtoken,\n\u001b[0;32m    588\u001b[0m     cache_dir\u001b[38;5;241m=\u001b[39mcache_dir,\n\u001b[0;32m    589\u001b[0m     subfolder\u001b[38;5;241m=\u001b[39msubfolder,\n\u001b[0;32m    590\u001b[0m     config\u001b[38;5;241m=\u001b[39mconfig,\n\u001b[0;32m    591\u001b[0m     local_files_only\u001b[38;5;241m=\u001b[39mlocal_files_only,\n\u001b[0;32m    592\u001b[0m     trust_remote_code\u001b[38;5;241m=\u001b[39mtrust_remote_code,\n\u001b[0;32m    593\u001b[0m     revision\u001b[38;5;241m=\u001b[39mrevision,\n\u001b[0;32m    594\u001b[0m     \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs,\n\u001b[0;32m    595\u001b[0m )\n",
-      "File \u001b[1;32m~\\ovraglangchain\\lib\\site-packages\\optimum\\modeling_base.py:407\u001b[0m, in \u001b[0;36mOptimizedModel.from_pretrained\u001b[1;34m(cls, model_id, config, export, subfolder, revision, force_download, local_files_only, trust_remote_code, cache_dir, token, **kwargs)\u001b[0m\n\u001b[0;32m    404\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[0;32m    405\u001b[0m     from_pretrained_method \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mcls\u001b[39m\u001b[38;5;241m.\u001b[39m_from_pretrained\n\u001b[1;32m--> 407\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m from_pretrained_method(\n\u001b[0;32m    408\u001b[0m     model_id\u001b[38;5;241m=\u001b[39mmodel_id,\n\u001b[0;32m    409\u001b[0m     config\u001b[38;5;241m=\u001b[39mconfig,\n\u001b[0;32m    410\u001b[0m     \u001b[38;5;66;03m# hub options\u001b[39;00m\n\u001b[0;32m    411\u001b[0m     revision\u001b[38;5;241m=\u001b[39mrevision,\n\u001b[0;32m    412\u001b[0m     cache_dir\u001b[38;5;241m=\u001b[39mcache_dir,\n\u001b[0;32m    413\u001b[0m     force_download\u001b[38;5;241m=\u001b[39mforce_download,\n\u001b[0;32m    414\u001b[0m     token\u001b[38;5;241m=\u001b[39mtoken,\n\u001b[0;32m    415\u001b[0m     subfolder\u001b[38;5;241m=\u001b[39msubfolder,\n\u001b[0;32m    416\u001b[0m     local_files_only\u001b[38;5;241m=\u001b[39mlocal_files_only,\n\u001b[0;32m    417\u001b[0m     trust_remote_code\u001b[38;5;241m=\u001b[39mtrust_remote_code,\n\u001b[0;32m    418\u001b[0m     \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs,\n\u001b[0;32m    419\u001b[0m )\n",
-      "File \u001b[1;32m~\\ovraglangchain\\lib\\site-packages\\optimum\\intel\\openvino\\modeling_decoder.py:363\u001b[0m, in \u001b[0;36mOVBaseDecoderModel._export\u001b[1;34m(cls, model_id, config, token, revision, force_download, cache_dir, subfolder, local_files_only, task, use_cache, trust_remote_code, load_in_8bit, quantization_config, **kwargs)\u001b[0m\n\u001b[0;32m    358\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m config\u001b[38;5;241m.\u001b[39mmodel_type \u001b[38;5;241m==\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mphi3\u001b[39m\u001b[38;5;124m\"\u001b[39m \u001b[38;5;129;01mand\u001b[39;00m config\u001b[38;5;241m.\u001b[39mmax_position_embeddings \u001b[38;5;241m!=\u001b[39m \u001b[38;5;28mgetattr\u001b[39m(\n\u001b[0;32m    359\u001b[0m     config, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124moriginal_max_position_embeddings\u001b[39m\u001b[38;5;124m\"\u001b[39m, config\u001b[38;5;241m.\u001b[39mmax_position_embeddings\n\u001b[0;32m    360\u001b[0m ):\n\u001b[0;32m    361\u001b[0m     config\u001b[38;5;241m.\u001b[39mmax_position_embeddings \u001b[38;5;241m=\u001b[39m config\u001b[38;5;241m.\u001b[39moriginal_max_position_embeddings\n\u001b[1;32m--> 363\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mcls\u001b[39m\u001b[38;5;241m.\u001b[39m_from_pretrained(\n\u001b[0;32m    364\u001b[0m     model_id\u001b[38;5;241m=\u001b[39msave_dir_path,\n\u001b[0;32m    365\u001b[0m     config\u001b[38;5;241m=\u001b[39mconfig,\n\u001b[0;32m    366\u001b[0m     use_cache\u001b[38;5;241m=\u001b[39muse_cache,\n\u001b[0;32m    367\u001b[0m     stateful\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mNone\u001b[39;00m,\n\u001b[0;32m    368\u001b[0m     load_in_8bit\u001b[38;5;241m=\u001b[39mload_in_8bit,\n\u001b[0;32m    369\u001b[0m     quantization_config\u001b[38;5;241m=\u001b[39mquantization_config,\n\u001b[0;32m    370\u001b[0m     trust_remote_code\u001b[38;5;241m=\u001b[39mtrust_remote_code,\n\u001b[0;32m    371\u001b[0m     compile_only\u001b[38;5;241m=\u001b[39mcompile_only,\n\u001b[0;32m    372\u001b[0m     \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs,\n\u001b[0;32m    373\u001b[0m )\n",
-      "File \u001b[1;32m~\\ovraglangchain\\lib\\site-packages\\optimum\\intel\\openvino\\modeling_decoder.py:859\u001b[0m, in \u001b[0;36mOVModelForCausalLM._from_pretrained\u001b[1;34m(cls, model_id, config, token, revision, force_download, cache_dir, file_name, subfolder, from_onnx, local_files_only, load_in_8bit, compile_only, quantization_config, trust_remote_code, **kwargs)\u001b[0m\n\u001b[0;32m    847\u001b[0m model_cache_path \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mcls\u001b[39m\u001b[38;5;241m.\u001b[39m_cached_file(\n\u001b[0;32m    848\u001b[0m     model_path\u001b[38;5;241m=\u001b[39mmodel_path,\n\u001b[0;32m    849\u001b[0m     token\u001b[38;5;241m=\u001b[39mtoken,\n\u001b[1;32m   (...)\u001b[0m\n\u001b[0;32m    855\u001b[0m     local_files_only\u001b[38;5;241m=\u001b[39mlocal_files_only,\n\u001b[0;32m    856\u001b[0m )\n\u001b[0;32m    858\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m compile_only:\n\u001b[1;32m--> 859\u001b[0m     model \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mcls\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mload_model\u001b[49m\u001b[43m(\u001b[49m\u001b[43mmodel_cache_path\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m    860\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[0;32m    861\u001b[0m     model \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mcls\u001b[39m\u001b[38;5;241m.\u001b[39m_compile_model(\n\u001b[0;32m    862\u001b[0m         model_cache_path, kwargs\u001b[38;5;241m.\u001b[39mget(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mdevice\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mCPU\u001b[39m\u001b[38;5;124m\"\u001b[39m), kwargs\u001b[38;5;241m.\u001b[39mget(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mov_config\u001b[39m\u001b[38;5;124m\"\u001b[39m), model_cache_path\u001b[38;5;241m.\u001b[39mparent\n\u001b[0;32m    863\u001b[0m     )\n",
-      "File \u001b[1;32m~\\ovraglangchain\\lib\\site-packages\\optimum\\intel\\openvino\\modeling_base.py:336\u001b[0m, in \u001b[0;36mOVBaseModel.load_model\u001b[1;34m(file_name, quantization_config)\u001b[0m\n\u001b[0;32m    333\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(file_name, \u001b[38;5;28mstr\u001b[39m):\n\u001b[0;32m    334\u001b[0m     file_name \u001b[38;5;241m=\u001b[39m Path(file_name)\n\u001b[0;32m    335\u001b[0m model \u001b[38;5;241m=\u001b[39m (\n\u001b[1;32m--> 336\u001b[0m     \u001b[43mcore\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mread_model\u001b[49m\u001b[43m(\u001b[49m\u001b[43mfile_name\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mresolve\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mfile_name\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mwith_suffix\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43m.bin\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m)\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mresolve\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m    337\u001b[0m     \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m file_name\u001b[38;5;241m.\u001b[39msuffix \u001b[38;5;241m==\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m.onnx\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[0;32m    338\u001b[0m     \u001b[38;5;28;01melse\u001b[39;00m convert_model(file_name)\n\u001b[0;32m    339\u001b[0m )\n\u001b[0;32m    340\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m file_name\u001b[38;5;241m.\u001b[39msuffix \u001b[38;5;241m==\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m.onnx\u001b[39m\u001b[38;5;124m\"\u001b[39m:\n\u001b[0;32m    341\u001b[0m     model \u001b[38;5;241m=\u001b[39m fix_op_names_duplicates(model)  \u001b[38;5;66;03m# should be called during model conversion to IR\u001b[39;00m\n",
-      "File \u001b[1;32m~\\ovraglangchain\\lib\\site-packages\\openvino\\_ov_api.py:603\u001b[0m, in \u001b[0;36mCore.read_model\u001b[1;34m(self, model, weights, config)\u001b[0m\n\u001b[0;32m    601\u001b[0m     \u001b[38;5;28;01mreturn\u001b[39;00m Model(\u001b[38;5;28msuper\u001b[39m()\u001b[38;5;241m.\u001b[39mread_model(model, config\u001b[38;5;241m=\u001b[39mconfig))\n\u001b[0;32m    602\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m--> 603\u001b[0m     \u001b[38;5;28;01mreturn\u001b[39;00m Model(\u001b[38;5;28;43msuper\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mread_model\u001b[49m\u001b[43m(\u001b[49m\u001b[43mmodel\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mweights\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mconfig\u001b[49m\u001b[43m)\u001b[49m)\n",
-      "\u001b[1;31mRuntimeError\u001b[0m: Exception from src\\inference\\src\\cpp\\core.cpp:97:\nCheck 'false' failed at src\\frontends\\common\\src\\frontend.cpp:54:\nConverting input model\nstoll argument out of range\n\n"
-     ]
-    }
-   ],
-   "source": [
-    "# Initialize the evaluator (this might take a moment on the first run)\n",
-    "evaluator = OpenVINORAGEvaluator(device=\"GPU\")\n",
-    "\n",
-    "# Prepare the data for evaluation\n",
-    "response_text = answer\n",
-    "reference_text = context\n",
-    "\n",
-    "# Get all evaluation metrics\n",
-    "metrics = evaluator.evaluate_all(response_text, reference_text)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 17,
-   "id": "8006c91c-1180-41c8-b04e-448e4131391f",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "--- Evaluation Metrics ---\n"
-     ]
-    },
-    {
-     "ename": "NameError",
-     "evalue": "name 'metrics' is not defined",
-     "output_type": "error",
-     "traceback": [
-      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
-      "\u001b[1;31mNameError\u001b[0m                                 Traceback (most recent call last)",
-      "Cell \u001b[1;32mIn[17], line 2\u001b[0m\n\u001b[0;32m      1\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m--- Evaluation Metrics ---\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[1;32m----> 2\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m metric, value \u001b[38;5;129;01min\u001b[39;00m \u001b[43mmetrics\u001b[49m\u001b[38;5;241m.\u001b[39mitems():\n\u001b[0;32m      3\u001b[0m     \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;132;01m{\u001b[39;00mmetric\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m: \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mvalue\u001b[38;5;132;01m:\u001b[39;00m\u001b[38;5;124m.4f\u001b[39m\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m)\n",
-      "\u001b[1;31mNameError\u001b[0m: name 'metrics' is not defined"
-     ]
-    }
-   ],
-   "source": [
-    "print(\"--- Evaluation Metrics ---\")\n",
-    "for metric, value in metrics.items():\n",
-    "    print(f\"{metric}: {value:.4f}\")"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "f53cbb39-9967-4e4e-8e1d-588bf2aee390",
-   "metadata": {},
-   "outputs": [],
-   "source": []
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3 (ipykernel)",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.10.11"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}

From 8d80af5885aed27881936944cbc20b9005a3d167 Mon Sep 17 00:00:00 2001
From: pkhara31 <112378664+pkhara31@users.noreply.github.com>
Date: Mon, 8 Dec 2025 14:41:42 +0530
Subject: [PATCH 5/5] Add files via upload

---
 .../ov_rag_evaluator.ipynb                    | 763 ++++++++++++++++++
 1 file changed, 763 insertions(+)
 create mode 100644 notebooks/llm-rag-ov-langchain/ov_rag_evaluator.ipynb

diff --git a/notebooks/llm-rag-ov-langchain/ov_rag_evaluator.ipynb b/notebooks/llm-rag-ov-langchain/ov_rag_evaluator.ipynb
new file mode 100644
index 00000000000..a71e6ac6a89
--- /dev/null
+++ b/notebooks/llm-rag-ov-langchain/ov_rag_evaluator.ipynb
@@ -0,0 +1,763 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "7722a495",
+   "metadata": {},
+   "source": [
+    "# RAG Performance & Fairness Evaluation Toolkit (OpenVINO + LangChain)\n",
+    "\n",
+    "This notebook demonstrates how to build and evaluate a Retrieval-Augmented Generation (RAG) pipeline using OpenVINO™ for accelerated performance on Intel hardware. We will use Hugging Face and LangChain libraries to construct the pipeline.\n",
+    "\n",
+    "The process involves:\n",
+    "1.  **Environment Setup**: Installing necessary libraries.\n",
+    "2.  **LLM and Tokenizer Setup**: Loading a language model (Microsoft's Phi-3-mini) and its tokenizer, optimized with OpenVINO.\n",
+    "3.  **Embedding Model Setup**: Preparing an embedding model to convert text into vector representations.\n",
+    "4.  **Data Loading and Processing**: Fetching documents from a web source, splitting them into manageable chunks, and creating vector embeddings.\n",
+    "5.  **Vector Store and Retriever Setup**: Storing the embeddings in a ChromaDB vector store and setting up a retriever with reranking for improved accuracy.\n",
+    "6.  **Building the RAG Chain**: Creating a `RetrievalQA` chain that combines the retriever and the LLM.\n",
+    "7.  **Running the RAG Pipeline**: Asking a question to get a response from the RAG system.\n",
+    "8.  **Evaluation**: Using a comprehensive `OpenVINORAGEvaluator` to assess the quality of the generated response based on various metrics like BLEU, ROUGE, BERTScore, perplexity, and bias."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "81a21a14",
+   "metadata": {},
+   "source": [
+    "## 1. Environment Setup\n",
+    "\n",
+    "First, let's ensure all the required Python packages are installed. The following commands handle the installation of essential libraries. These are typically only needed if you encounter version conflicts or issues with existing installations."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c4a2dc6a-3d3e-4da2-902f-30f3cbd24b39",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "import requests\n",
+    "from pathlib import Path\n",
+    "\n",
+    "if not Path(\"notebook_utils.py\").exists():\n",
+    "    r = requests.get(\n",
+    "        url=\"https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/notebook_utils.py\",\n",
+    "    )\n",
+    "    with open(\"notebook_utils.py\", \"w\") as f:\n",
+    "        f.write(r.text)\n",
+    "\n",
+    "if not Path(\"pip_helper.py\").exists():\n",
+    "    r = requests.get(\n",
+    "        url=\"https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/pip_helper.py\",\n",
+    "    )\n",
+    "    open(\"pip_helper.py\", \"w\").write(r.text)\n",
+    "\n",
+    "from pip_helper import pip_install\n",
+    "\n",
+    "os.environ[\"GIT_CLONE_PROTECTION_ACTIVE\"] = \"false\"\n",
+    "\n",
+    "pip_install(\"--pre\", \"-U\", \"openvino>=2025.3.0\", \"--extra-index-url\", \"https://storage.openvinotoolkit.org/simple/wheels/nightly\")\n",
+    "pip_install(\"--pre\", \"-U\", \"openvino-tokenizers\", \"--extra-index-url\", \"https://storage.openvinotoolkit.org/simple/wheels/nightly\")\n",
+    "pip_install(\n",
+    "    \"--extra-index-url\",\n",
+    "    \"https://download.pytorch.org/whl/cpu\",\n",
+    "    \"--upgrade-strategy\",\n",
+    "    \"eager\",\n",
+    "    \"optimum[openvino,nncf,onnxruntime]\",\n",
+    "    \"sacrebleu\",\n",
+    "    \"rouge-score\",\n",
+    "    \"nncf>=2.18.0\",\n",
+    "    \"bert-score\",\n",
+    "    \"transformers\",\n",
+    "    \"onnx\",\n",
+    "    \"nltk\",\n",
+    "    \"numpy\",\n",
+    "    \"textblob\",\n",
+    "    \"dataset\",\n",
+    "    \"langchain\",\n",
+    "    \"langchain_community\",\n",
+    "    \"chromadb\",\n",
+    "    \"langchain-chroma\",\n",
+    "    \"langchain-huggingface\",\n",
+    "    \"sentence-transformers\",\n",
+    "    \"Flashrank\",\n",
+    "    \"msoffcrypto-tool\",\n",
+    "    \"docx2txt\",\n",
+    "    \"bs4\",\n",
+    "    \"python-docx\",\n",
+    "    \"huggingface-hub>=0.26.5\",\n",
+    ")"
+   ]
+  },
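+  {
+   "cell_type": "markdown",
+   "id": "device-check-note",
+   "metadata": {},
+   "source": [
+    "### (Optional) Check Available Inference Devices\n",
+    "\n",
+    "Before loading models, you can list the devices the OpenVINO runtime can see on this machine and decide whether to use `\"CPU\"` or `\"GPU\"` in the cells below. This is a minimal sketch based on `openvino.Core().available_devices`; adjust the device strings used later to match what it prints."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "device-check-code",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import openvino as ov\n",
+    "\n",
+    "# List the devices the OpenVINO runtime can use on this machine, e.g. ['CPU'] or ['CPU', 'GPU']\n",
+    "core = ov.Core()\n",
+    "print(\"Available OpenVINO devices:\", core.available_devices)"
+   ]
+  },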
+  {
+   "cell_type": "markdown",
+   "id": "8a005fb2",
+   "metadata": {},
+   "source": [
+    "## 2. LLM and Tokenizer Setup\n",
+    "\n",
+    "Next, we load the Large Language Model (LLM) and its corresponding tokenizer. We use `optimum-intel` to convert and accelerate the model with OpenVINO. In this example, we use `microsoft/Phi-3-mini-4k-instruct`, but you can replace it with another compatible model.\n",
+    "\n",
+    "- **`OVModelForCausalLM`**: Loads a causal language model and automatically converts it to the OpenVINO format (`export=True`).\n",
+    "- **`device=\"GPU\"`**: Specifies that the model should run on the integrated GPU for acceleration. You can change this to `\"CPU\"`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "90e68d95-9a4e-4ba5-9040-4422c1333444",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from optimum.intel import OVModelForCausalLM\n",
+    "from transformers import AutoTokenizer, pipeline\n",
+    "from langchain_huggingface import HuggingFacePipeline\n",
+    "\n",
+    "# Load model with OpenVINO backend\n",
+    "model = OVModelForCausalLM.from_pretrained(\n",
+    "    \"microsoft/Phi-3-mini-4k-instruct\", # You can plug in any other supported model\n",
+    "    export=True,  # Convert to OpenVINO format on the fly\n",
+    "    device=\"GPU\"  # Specify GPU for inference, can also be \"CPU\"\n",
+    ")\n",
+    "\n",
+    "tokenizer = AutoTokenizer.from_pretrained(\"microsoft/Phi-3-mini-4k-instruct\")\n",
+    "model.save_pretrained(\"ov_model\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "27419145",
+   "metadata": {},
+   "source": [
+    "### Create a LangChain-compatible LLM Pipeline\n",
+    "\n",
+    "We now create a `text-generation` pipeline using the OpenVINO-optimized model and tokenizer. This pipeline is then wrapped in `HuggingFacePipeline` to make it compatible with the LangChain ecosystem. A quick test is run to confirm the pipeline is working correctly."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f985ca28-9e3d-490d-954c-71b24fc47eda",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Create a text-generation pipeline with the OpenVINO model\n",
+    "llm_pipeline = pipeline(\n",
+    "    \"text-generation\",\n",
+    "    model=model,\n",
+    "    tokenizer=tokenizer,\n",
+    "    device=model.device,\n",
+    "    max_new_tokens=100,\n",
+    "    top_k=50,\n",
+    "    temperature=0.1,\n",
+    "    do_sample=True\n",
+    ")\n",
+    "\n",
+    "# Create a LangChain instance from the Hugging Face pipeline\n",
+    "llm = HuggingFacePipeline(pipeline=llm_pipeline)\n",
+    "\n",
+    "# Test the pipeline with a sample query\n",
+    "response = llm.invoke(\"What is an ocean?\")\n",
+    "print(response)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "05fee9cc",
+   "metadata": {},
+   "source": [
+    "## 3. Embedding Model Setup\n",
+    "\n",
+    "For the retrieval part of our RAG pipeline, we need an embedding model to convert text documents into numerical vectors. We use `OpenVINOBgeEmbeddings` from `langchain_community`, which provides OpenVINO-optimized embeddings for efficient performance. Here, we use the `bge-small-en-v1.5` model."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "606ff70a-f797-42a5-a697-8cb5c13c0dae",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain_community.embeddings import OpenVINOBgeEmbeddings\n",
+    "from sentence_transformers import SentenceTransformer\n",
+    "import os\n",
+    "\n",
+    "# First time: Download and save the model\n",
+    "embedding_model_name = \"BAAI/bge-small-en-v1.5\"  # Full HF repo path\n",
+    "save_directory = \"./saved_bge_model\"\n",
+    "\n",
+    "# Download the model using SentenceTransformer directly\n",
+    "st_model = SentenceTransformer(embedding_model_name)\n",
+    "st_model.save(save_directory)\n",
+    "print(f\"Model saved to {save_directory}\")\n",
+    "\n",
+    "# Now create the OpenVINO embedding with the saved model\n",
+    "embedding = OpenVINOBgeEmbeddings(\n",
+    "    model_name_or_path=save_directory,  # Use saved path\n",
+    "    model_kwargs={\"device\": \"CPU\"},\n",
+    "    encode_kwargs={\"normalize_embeddings\": True},\n",
+    ")\n",
+    "\n",
+    "# Load the saved model from local directory\n",
+    "local_model_path = \"./saved_bge_model\"\n",
+    "\n",
+    "embedding = OpenVINOBgeEmbeddings(\n",
+    "    model_name_or_path=local_model_path,\n",
+    "    model_kwargs={\"device\": \"CPU\"},\n",
+    "    encode_kwargs={\"normalize_embeddings\": True},\n",
+    ")\n",
+    "\n",
+    "# Test the loaded model\n",
+    "text = \"This is a test document.\"\n",
+    "embedding_result = embedding.embed_query(text)\n",
+    "print(\"Sample embedding (first 3 dimensions):\", embedding_result[:3])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f8defdcb",
+   "metadata": {},
+   "source": [
+    "## 4. Data Loading and Processing\n",
+    "\n",
+    "Now we'll load the documents that will form the knowledge base for our RAG pipeline. This notebook includes two methods for loading documents:\n",
+    "\n",
+    "1.  **Web Crawling (Enabled by default)**: Fetches content from a website's sitemap. We use `WebBaseLoader` to load content from URLs found in the sitemap of Zerodha Varsity.\n",
+    "2.  **Local File Loading (Commented out)**: A robust `LangChainDocumentLoader` class is provided to load various file types (`.txt`, `.pdf`, `.docx`, etc.) from a local directory. You can uncomment and adapt this section if you want to use your own local files."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "808e9c2d-ab4a-4bc6-bb45-f3b2d4be3156",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import bs4\n",
+    "from urllib.request import Request, urlopen\n",
+    "from bs4 import BeautifulSoup\n",
+    "import ssl\n",
+    "from langchain_community.document_loaders import WebBaseLoader\n",
+    "'''\n",
+    "# --- Method 1: Load documents by crawling a web page (default) ---\n",
+    "def get_sitemap(url):\n",
+    "    \"\"\"Fetches and parses an XML sitemap from a URL.\"\"\"\n",
+    "    req = Request(url, headers={\"User-Agent\": \"Mozilla/5.0\"})\n",
+    "    response = urlopen(req)\n",
+    "    xml = BeautifulSoup(response, \"lxml-xml\", from_encoding=response.info().get_param(\"charset\"))\n",
+    "    return xml\n",
+    "\n",
+    "def get_urls_from_sitemap(xml):\n",
+    "    \"\"\"Extracts all URLs from a parsed sitemap XML.\"\"\"\n",
+    "    urls = [loc.text for loc in xml.find_all(\"loc\")]\n",
+    "    return urls\n",
+    "\n",
+    "# Bypass SSL verification issues if they arise\n",
+    "ssl._create_default_https_context = ssl._create_stdlib_context\n",
+    "\n",
+    "sitemap_url = \"https://zerodha.com/varsity/chapter-sitemap2.xml\"\n",
+    "sitemap_xml = get_sitemap(sitemap_url)\n",
+    "urls = get_urls_from_sitemap(sitemap_xml)\n",
+    "\n",
+    "# Load documents from the collected URLs\n",
+    "docs = []\n",
+    "for i, url in enumerate(urls):\n",
+    "    try:\n",
+    "        loader = WebBaseLoader(url)\n",
+    "        docs.extend(loader.load())\n",
+    "        if (i + 1) % 10 == 0:\n",
+    "            print(f\"Loaded {i + 1}/{len(urls)} URLs\")\n",
+    "    except Exception as e:\n",
+    "        print(f\"Failed to load {url}: {e}\")\n",
+    "\n",
+    "print(f\"\\nTotal documents loaded: {len(docs)}\")\n",
+    "'''\n",
+    "# --- Method 2: Load documents locally from the system (commented out) ---\n",
+    "\n",
+    "import os\n",
+    "from langchain.document_loaders import (\n",
+    "    TextLoader,\n",
+    "    PyPDFLoader,\n",
+    "    DirectoryLoader,\n",
+    ")\n",
+    "from langchain.schema import Document as LCDocument\n",
+    "from typing import List\n",
+    "\n",
+    "class LocalDocumentLoader:\n",
+    "    \"\"\"Load documents from a local directory using LangChain loaders.\"\"\"\n",
+    "    def __init__(self, directory_path: str):\n",
+    "        self.directory_path = directory_path\n",
+    "\n",
+    "    def load(self) -> List[LCDocument]:\n",
+    "        \"\"\"Loads all supported documents from the directory.\"\"\"\n",
+    "        if not self.directory_path:\n",
+    "            raise ValueError(\"Directory path not set.\")\n",
+    "\n",
+    "        # Define loaders for different file types\n",
+    "        txt_loader = DirectoryLoader(\n",
+    "            self.directory_path, glob=\"**/*.txt\", loader_cls=TextLoader,\n",
+    "            loader_kwargs={\"encoding\": \"utf-8\"}, show_progress=True\n",
+    "        )\n",
+    "        pdf_loader = DirectoryLoader(\n",
+    "            self.directory_path, glob=\"**/*.pdf\", loader_cls=PyPDFLoader, show_progress=True\n",
+    "        )\n",
+    "\n",
+    "        documents = []\n",
+    "        documents.extend(txt_loader.load())\n",
+    "        documents.extend(pdf_loader.load())\n",
+    "        \n",
+    "        return documents\n",
+    "\n",
+    "#Usage Example:\n",
+    "loader = LocalDocumentLoader(directory_path=\"content\")\n",
+    "docs = loader.load()\n",
+    "print(f\"Loaded {len(docs)} local documents.\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a6b107e7",
+   "metadata": {},
+   "source": [
+    "### Split Documents into Chunks\n",
+    "\n",
+    "LLMs have a limited context window, so we need to split large documents into smaller chunks. This ensures that the model can process the retrieved information effectively. We use `RecursiveCharacterTextSplitter` which is a smart way to split text while trying to keep related content together."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "51d07ec4-b929-4893-baff-af68a4fbf3aa",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
+    "\n",
+    "# Split the documents into smaller chunks with a specified size and overlap\n",
+    "text_splitter = RecursiveCharacterTextSplitter(\n",
+    "    chunk_size=1250,\n",
+    "    chunk_overlap=100,\n",
+    "    length_function=len,\n",
+    "    is_separator_regex=False\n",
+    ")\n",
+    "\n",
+    "split_docs = text_splitter.split_documents(docs)\n",
+    "print(f\"Documents split into {len(split_docs)} chunks.\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1a734d8c",
+   "metadata": {},
+   "source": [
+    "## 5. Vector Store and Retriever Setup\n",
+    "\n",
+    "Now we'll create a vector store to house the document embeddings and enable efficient similarity searches.\n",
+    "\n",
+    "- **`Chroma`**: We use ChromaDB as our vector store. It's a lightweight and easy-to-use vector database.\n",
+    "- **`persist_directory`**: This saves the created database to disk, allowing us to reuse it later without re-processing the documents."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "194acf26-f710-483d-97d6-57bfff7cfa65",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Create a ChromaDB instance to store the document embeddings\n",
+    "from langchain_community.vectorstores import Chroma\n",
+    "\n",
+    "vectorstore = Chroma(\n",
+    "    embedding_function=embedding,\n",
+    "    persist_directory=\"./chromadb_varsity\",\n",
+    "    collection_name=\"zerodha_varsity_docs\"\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0690e134",
+   "metadata": {},
+   "source": [
+    "### Add Documents to the Vector Store\n",
+    "\n",
+    "We add the processed document chunks to the vector store. To handle a large number of documents efficiently, we add them in batches. The metadata is also filtered to ensure compatibility with the vector store."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "08f6ce04-9702-4372-be96-6fc34431fc21",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain_community.vectorstores.utils import filter_complex_metadata\n",
+    "\n",
+    "# Function to insert embeddings in batches for a lengthy document set\n",
+    "def add_documents_in_batches(vectorstore, docs, batch_size=100):\n",
+    "    \"\"\"Adds documents to the vectorstore in batches.\"\"\"\n",
+    "    for i in range(0, len(docs), batch_size):\n",
+    "        chunk = docs[i : i + batch_size]\n",
+    "        vectorstore.add_documents(chunk)\n",
+    "        print(f\"Added batch {i//batch_size + 1}/{(len(docs)-1)//batch_size + 1}\")\n",
+    "    # Persist the database to disk if the method is available\n",
+    "    if hasattr(vectorstore, \"persist\"):\n",
+    "        vectorstore.persist()\n",
+    "\n",
+    "# Filter out complex metadata that might cause issues\n",
+    "filtered_docs = filter_complex_metadata(split_docs)\n",
+    "\n",
+    "# Add the documents to the vector store in batches\n",
+    "add_documents_in_batches(vectorstore, filtered_docs)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4af0346d",
+   "metadata": {},
+   "source": [
+    "### Set up a Reranking Retriever\n",
+    "\n",
+    "To improve the quality of retrieved documents, we use a reranker. The initial retriever fetches a set of documents (e.g., k=5), and the reranker (`FlashrankRerank`) re-orders them based on their relevance to the query. This ensures that the most relevant context is passed to the LLM.\n",
+    "\n",
+    "- **`ContextualCompressionRetriever`**: Wraps a base retriever and a document compressor (the reranker) to create this two-stage retrieval process."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "01bc1634-dcea-431c-b447-af5b7d38aaeb",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain.retrievers import ContextualCompressionRetriever\n",
+    "from langchain.retrievers.document_compressors import FlashrankRerank\n",
+    "\n",
+    "# Set up the base retriever to fetch the top 5 documents\n",
+    "retriever = vectorstore.as_retriever(search_kwargs={\"k\": 5})\n",
+    "\n",
+    "# Initialize the reranker\n",
+    "compressor = FlashrankRerank()\n",
+    "\n",
+    "# Create the compression retriever, which combines retrieval and reranking\n",
+    "compression_retriever = ContextualCompressionRetriever(\n",
+    "    base_compressor=compressor,\n",
+    "    base_retriever=retriever\n",
+    ")"
+   ]
+  },
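+  {
+   "cell_type": "markdown",
+   "id": "retriever-check-note",
+   "metadata": {},
+   "source": [
+    "### (Optional) Sanity-check the Reranking Retriever\n",
+    "\n",
+    "Before wiring up the full chain, you can retrieve and rerank documents for a sample query. This is a small sketch: the sample query string is arbitrary, `invoke` is the standard LangChain retriever entry point, and the `relevance_score` metadata key is only present when the reranker adds it."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "retriever-check-code",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Quick check of the two-stage retrieval (vector search followed by FlashRank reranking)\n",
+    "sample_query = \"What is a deep link?\"\n",
+    "reranked_docs = compression_retriever.invoke(sample_query)\n",
+    "\n",
+    "for i, doc in enumerate(reranked_docs, start=1):\n",
+    "    print(f\"[{i}] relevance_score={doc.metadata.get('relevance_score')}\")\n",
+    "    print(doc.page_content[:200], \"...\\n\")"
+   ]
+  },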
+  {
+   "cell_type": "markdown",
+   "id": "2c9bd8c5",
+   "metadata": {},
+   "source": [
+    "## 6. Building the RAG Chain\n",
+    "\n",
+    "With all the components ready, we now assemble the final RAG pipeline using LangChain's `RetrievalQA` chain. This chain connects the LLM with the retriever.\n",
+    "\n",
+    "- **`chain_type=\"stuff\"`**: This means all retrieved documents will be \"stuffed\" into the prompt sent to the LLM.\n",
+    "- **`return_source_documents=True`**: This is important for evaluation, as it allows us to see which documents were used to generate the answer."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e18d5187-a6c6-406f-b5b2-f9982d97d3a2",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain.chains import RetrievalQA\n",
+    "\n",
+    "qa_chain = RetrievalQA.from_chain_type(\n",
+    "    llm=llm,\n",
+    "    chain_type=\"stuff\",\n",
+    "    retriever=compression_retriever,\n",
+    "    return_source_documents=True\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "db9453a2",
+   "metadata": {},
+   "source": [
+    "## 7. Running the RAG Pipeline\n",
+    "\n",
+    "It's time to ask a question! The `qa_chain.invoke` method will execute the full RAG process: retrieve relevant documents, pass them to the LLM along with the question, and return the final answer."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fc8def11-554b-4bd1-ab37-9824f003966e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "question = \"What is deep link?\"\n",
+    "result = qa_chain.invoke({\"query\": question})\n",
+    "print(\"--- Question ---\")\n",
+    "print(question)\n",
+    "print(\"\\n--- Answer ---\")\n",
+    "print(result[\"result\"])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b30815a2",
+   "metadata": {},
+   "source": [
+    "### Extract Answer and Context for Evaluation\n",
+    "\n",
+    "For the evaluation step, we need to isolate the generated answer and the source documents (the context or \"reference\")."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "991a94fc-b7b3-4709-896b-c613e1b857b8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "answer = result['result']\n",
+    "context = \" \".join([d.page_content for d in result['source_documents']])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "bf65db80",
+   "metadata": {},
+   "source": [
+    "## 8. Evaluation\n",
+    "\n",
+    "To assess the quality of our RAG pipeline, we use a custom `OpenVINORAGEvaluator` class. This class uses OpenVINO-optimized models to calculate several key metrics:\n",
+    "\n",
+    "- **BLEU & ROUGE**: Measure the overlap between the generated answer and the reference context.\n",
+    "- **BERTScore**: Computes semantic similarity, which is more advanced than simple overlap.\n",
+    "- **Perplexity**: Measures how well a language model (here, Llama-2-7B) predicts the generated text. Lower is better.\n",
+    "- **Diversity**: Calculates the variety of tokens in the response.\n",
+    "- **Racial Bias**: Uses a hate speech detection model to check for biased content.\n",
+    "\n",
+    "**Note**: The first time you run this, it will download and convert the necessary evaluation models (Llama-2-7B and a hate speech model) to the OpenVINO format. This is a one-time setup."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1a8122e4-6602-4750-ad6e-c5cc599e0b0a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import openvino as ov\n",
+    "import numpy as np\n",
+    "import torch\n",
+    "from transformers import AutoTokenizer, AutoModelForCausalLM, AutoModelForSequenceClassification\n",
+    "from optimum.intel import OVModelForCausalLM, OVModelForSequenceClassification\n",
+    "from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction\n",
+    "from rouge_score import rouge_scorer\n",
+    "from bert_score import score\n",
+    "from nltk.util import ngrams\n",
+    "from typing import List\n",
+    "import os\n",
+    "\n",
+    "class OpenVINORAGEvaluator:\n",
+    "    \"\"\"An evaluator for RAG pipelines using OpenVINO-optimized models.\"\"\"\n",
+    "    \n",
+    "    def __init__(self, device=\"GPU\", models_dir=\"./openvino_models\"):\n",
+    "        self.device = device\n",
+    "        self.models_dir = models_dir\n",
+    "        os.makedirs(self.models_dir, exist_ok=True)\n",
+    "        \n",
+    "        # Initialize models and tokenizers for evaluation\n",
+    "        self.llama2_model, self.llama2_tokenizer = self._load_model(\n",
+    "            model_id=\"meta-llama/Llama-2-7b-hf\",\n",
+    "            ov_model_class=OVModelForCausalLM,\n",
+    "            subfolder=\"llama2-7b-openvino\"\n",
+    "        )\n",
+    "        self.bias_model, self.bias_tokenizer = self._load_model(\n",
+    "            model_id=\"Hate-speech-CNERG/dehatebert-mono-english\",\n",
+    "            ov_model_class=OVModelForSequenceClassification,\n",
+    "            subfolder=\"hate-speech-openvino\"\n",
+    "        )\n",
+    "        print(f\"OpenVINO RAG Evaluator initialized on {device}\")\n",
+    "\n",
+    "    def _load_model(self, model_id, ov_model_class, subfolder):\n",
+    "        \"\"\"Generic function to load or convert a model to OpenVINO format.\"\"\"\n",
+    "        model_path = os.path.join(self.models_dir, subfolder)\n",
+    "        \n",
+    "        if not os.path.exists(os.path.join(model_path, \"openvino_model.xml\")):\n",
+    "            print(f\"Converting {model_id} to OpenVINO format...\")\n",
+    "            ov_model = ov_model_class.from_pretrained(model_id, export=True, compile=False)\n",
+    "            ov_model.save_pretrained(model_path)\n",
+    "            print(f\"Model saved to {model_path}\")\n",
+    "        \n",
+    "        try:\n",
+    "            print(f\"Loading {model_id} from {model_path}...\")\n",
+    "            model = ov_model_class.from_pretrained(model_path, device=self.device)\n",
+    "            tokenizer = AutoTokenizer.from_pretrained(model_id)\n",
+    "            print(f\"{model_id} loaded successfully.\")\n",
+    "            return model, tokenizer\n",
+    "        except Exception as e:\n",
+    "            print(f\"Error loading {model_id}: {e}\")\n",
+    "            return None, None\n",
+    "\n",
+    "    def evaluate_bleu_rouge(self, candidates: List[str], references: List[str]):\n",
+    "        \"\"\"Calculates BLEU and ROUGE scores.\"\"\"\n",
+    "        candidate_tokens = [c.split() for c in candidates]\n",
+    "        reference_tokens = [[r.split()] for r in references]\n",
+    "        \n",
+    "        # BLEU with smoothing\n",
+    "        smoothing = SmoothingFunction().method1\n",
+    "        bleu_score = corpus_bleu(reference_tokens, candidate_tokens, smoothing_function=smoothing)\n",
+    "        \n",
+    "        # ROUGE\n",
+    "        scorer = rouge_scorer.RougeScorer(['rouge1', 'rougeL'], use_stemmer=True)\n",
+    "        rouge1_f1 = sum(scorer.score(ref, cand)['rouge1'].fmeasure for ref, cand in zip(references, candidates)) / len(candidates)\n",
+    "        return bleu_score, rouge1_f1\n",
+    "\n",
+    "    def evaluate_bert_score(self, candidates: List[str], references: List[str]):\n",
+    "        \"\"\"Calculates BERTScore.\"\"\"\n",
+    "        _, _, f1 = score(candidates, references, lang=\"en\", model_type='bert-base-multilingual-cased')\n",
+    "        return f1.mean().item()\n",
+    "\n",
+    "    def evaluate_perplexity(self, text: str):\n",
+    "        \"\"\"Calculates perplexity using the loaded Llama-2 model.\"\"\"\n",
+    "        if not self.llama2_model:\n",
+    "            return float('inf')\n",
+    "        \n",
+    "        try:\n",
+    "            encodings = self.llama2_tokenizer(text, return_tensors='pt', max_length=1024, truncation=True)\n",
+    "            input_ids = encodings.input_ids\n",
+    "            \n",
+    "            with torch.no_grad():\n",
+    "                outputs = self.llama2_model(input_ids)\n",
+    "                logits = outputs.logits\n",
+    "                \n",
+    "                # Manually calculate cross-entropy loss\n",
+    "                # Shift logits and labels for next-token prediction\n",
+    "                shift_logits = logits[..., :-1, :].contiguous()\n",
+    "                shift_labels = input_ids[..., 1:].contiguous()\n",
+    "                \n",
+    "                # Calculate loss\n",
+    "                loss_fct = torch.nn.CrossEntropyLoss()\n",
+    "                loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))\n",
+    "                perplexity = torch.exp(loss)\n",
+    "            \n",
+    "            return perplexity.item()\n",
+    "        except Exception as e:\n",
+    "            print(f\"Error calculating perplexity: {e}\")\n",
+    "            return float('inf')\n",
+    "\n",
+    "    def evaluate_racial_bias(self, text: str):\n",
+    "        \"\"\"Evaluates racial bias using a hate speech detection model.\"\"\"\n",
+    "        if not self.bias_model:\n",
+    "            return 0.0\n",
+    "\n",
+    "        try:\n",
+    "            inputs = self.bias_tokenizer(text, return_tensors=\"pt\", truncation=True, max_length=512)\n",
+    "            with torch.no_grad():\n",
+    "                logits = self.bias_model(**inputs).logits\n",
+    "                probabilities = torch.nn.functional.softmax(logits, dim=-1)\n",
+    "                # Return the probability of the 'hate speech' class (index 1)\n",
+    "                bias_score = probabilities[0][1].item()\n",
+    "            return bias_score\n",
+    "        except Exception as e:\n",
+    "            print(f\"Error calculating bias: {e}\")\n",
+    "            return 0.0\n",
+    "    \n",
+    "    def evaluate_all(self, response: str, reference: str):\n",
+    "        \"\"\"Runs a comprehensive evaluation and returns all metrics.\"\"\"\n",
+    "        candidates = [response]\n",
+    "        references = [reference]\n",
+    "        \n",
+    "        try:\n",
+    "            bleu, rouge1 = self.evaluate_bleu_rouge(candidates, references)\n",
+    "            bert_f1 = self.evaluate_bert_score(candidates, references)\n",
+    "            perplexity = self.evaluate_perplexity(response)\n",
+    "            racial_bias = self.evaluate_racial_bias(response)\n",
+    "            \n",
+    "            return {\n",
+    "                \"BLEU\": bleu,\n",
+    "                \"ROUGE-1\": rouge1,\n",
+    "                \"BERT F1\": bert_f1,\n",
+    "                \"Perplexity\": perplexity,\n",
+    "                \"Racial Bias\": racial_bias\n",
+    "            }\n",
+    "        except Exception as e:\n",
+    "            print(f\"An error occurred during evaluation: {e}\")\n",
+    "            return {k: 0.0 for k in [\"BLEU\", \"ROUGE-1\", \"BERT F1\", \"Perplexity\", \"Racial Bias\"]}"
+   ]
+  },
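+  {
+   "cell_type": "markdown",
+   "id": "diversity-sketch-note",
+   "metadata": {},
+   "source": [
+    "### (Optional) Diversity Score Sketch\n",
+    "\n",
+    "The diversity metric listed above is not computed by `evaluate_all`. Below is a minimal, self-contained sketch of a distinct-n style diversity score (the ratio of unique n-grams to total n-grams) built on `nltk.util.ngrams`. The function name `distinct_n` and the choice of n = 2 are illustrative, not part of the toolkit."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "diversity-sketch-code",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from nltk.util import ngrams\n",
+    "\n",
+    "def distinct_n(text: str, n: int = 2) -> float:\n",
+    "    \"\"\"Ratio of unique n-grams to total n-grams (0-1, higher means more varied wording).\"\"\"\n",
+    "    tokens = text.split()\n",
+    "    if len(tokens) < n:\n",
+    "        return 0.0\n",
+    "    all_ngrams = list(ngrams(tokens, n))\n",
+    "    return len(set(all_ngrams)) / len(all_ngrams)\n",
+    "\n",
+    "# Example: diversity of the generated answer from the RAG pipeline above\n",
+    "print(f\"Distinct-2 diversity of the answer: {distinct_n(answer, n=2):.4f}\")"
+   ]
+  },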
+  {
+   "cell_type": "markdown",
+   "id": "e9c19402",
+   "metadata": {},
+   "source": [
+    "### Run the Evaluation\n",
+    "\n",
+    "Finally, we initialize the `OpenVINORAGEvaluator` and call `evaluate_all` to get a dictionary of scores. This provides a quantitative look at the performance of our RAG pipeline for the given query."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3dedb93b-20e3-43c3-93a1-38cb3e114019",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Initialize the evaluator (this might take a moment on the first run)\n",
+    "evaluator = OpenVINORAGEvaluator(device=\"GPU\")\n",
+    "\n",
+    "# Prepare the data for evaluation\n",
+    "response_text = answer\n",
+    "reference_text = context\n",
+    "\n",
+    "# Get all evaluation metrics\n",
+    "metrics = evaluator.evaluate_all(response_text, reference_text)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8006c91c-1180-41c8-b04e-448e4131391f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print(\"--- Evaluation Metrics ---\")\n",
+    "for metric, value in metrics.items():\n",
+    "    print(f\"{metric}: {value:.4f}\")"
+   ]
+  },
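+  {
+   "cell_type": "markdown",
+   "id": "aggregate-eval-note",
+   "metadata": {},
+   "source": [
+    "### (Optional) Aggregate Metrics Over Multiple Queries\n",
+    "\n",
+    "A single query gives only a point estimate. The sketch below reuses `qa_chain` and `evaluator` from the earlier cells to evaluate a small list of questions and report the mean of each metric. The question list is a placeholder, and taking the plain mean of every metric (including perplexity and bias) is only one reasonable aggregation."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "aggregate-eval-code",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Minimal sketch: evaluate several queries and report the mean of each metric\n",
+    "questions = [\n",
+    "    \"What is a deep link?\",\n",
+    "    \"What is an ocean?\",  # placeholder questions; replace with queries relevant to your corpus\n",
+    "]\n",
+    "\n",
+    "all_metrics = []\n",
+    "for q in questions:\n",
+    "    res = qa_chain.invoke({\"query\": q})\n",
+    "    ans = res[\"result\"]\n",
+    "    ctx = \" \".join(d.page_content for d in res[\"source_documents\"])\n",
+    "    all_metrics.append(evaluator.evaluate_all(ans, ctx))\n",
+    "\n",
+    "print(f\"--- Mean metrics over {len(questions)} queries ---\")\n",
+    "for name in all_metrics[0]:\n",
+    "    mean_value = sum(m[name] for m in all_metrics) / len(all_metrics)\n",
+    "    print(f\"{name}: {mean_value:.4f}\")"
+   ]
+  },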
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f53cbb39-9967-4e4e-8e1d-588bf2aee390",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.11"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}