NVIDIA-AI-Blueprints · shubhadeepd · Jan 20, 2026 · Jan 16, 2026
diff --git a/docs/nvidia-rag-mcp.md b/docs/nvidia-rag-mcp.md
diff --git a/examples/README.md b/examples/README.md
@@ -0,0 +1,29 @@
+# NVIDIA RAG Examples
+
+This directory contains example integrations and extensions for NVIDIA RAG.
+
+## Examples
+
+| Example | Description | Documentation |
+|---------|-------------|---------------|
+| [rag_react_agent](./rag_react_agent/) | Integration with [NeMo Agent Toolkit (NAT)](https://github.com/NVIDIA/NeMo-Agent-Toolkit) providing RAG query and search capabilities for agent workflows | [README](./rag_react_agent/README.md) |
+| [nvidia_rag_mcp](./nvidia_rag_mcp/) | MCP (Model Context Protocol) server and client for exposing NVIDIA RAG capabilities to MCP-compatible applications | [Documentation](../docs/nvidia-rag-mcp.md) |
+
+## rag_react_agent
+
+This plugin integrates NVIDIA RAG with [NeMo Agent Toolkit](https://github.com/NVIDIA/NeMo-Agent-Toolkit), enabling intelligent agents to use RAG tools for document retrieval and question answering. It demonstrates:
+
+- Creating custom NAT tools that wrap NVIDIA RAG functionality
+- Using the ReAct Agent workflow for intelligent tool selection
+
+See the [rag_react_agent README](./rag_react_agent/README.md) for setup and usage instructions.
+
+## nvidia_rag_mcp
+
+This example provides an MCP server and client that exposes NVIDIA RAG and Ingestor capabilities as MCP tools. It supports multiple transport modes (SSE, streamable HTTP, stdio) and enables MCP-compatible applications to:
+
+- Generate answers using the RAG pipeline
+- Search the vector database for relevant documents
+- Manage collections and documents in the vector database
+
+See the [MCP documentation](../docs/nvidia-rag-mcp.md) for detailed setup and usage instructions.
diff --git a/nvidia_rag_mcp/__init__.py b/examples/nvidia_rag_mcp/__init__.py
diff --git a/nvidia_rag_mcp/mcp_client.py b/examples/nvidia_rag_mcp/mcp_client.py
@@ -300,16 +300,16 @@ def main() -> None:
     Main entry point for the MCP client CLI.
     Examples:
       List tools (SSE):
-        python nvidia_rag_mcp/mcp_client.py list --transport=sse --url=http://127.0.0.1:8000/sse
+        python examples/nvidia_rag_mcp/mcp_client.py list --transport=sse --url=http://127.0.0.1:8000/sse
       List tools (stdio):
-        python nvidia_rag_mcp/mcp_client.py list --transport=stdio --command=python \
-          --args="-m nvidia_rag_mcp.mcp_server --transport stdio"
+        python examples/nvidia_rag_mcp/mcp_client.py list --transport=stdio --command=python \
+          --args="examples/nvidia_rag_mcp/mcp_server.py --transport stdio"
       Call generate (streamable_http):
-        python nvidia_rag_mcp/mcp_client.py call --transport=streamable_http --url=http://127.0.0.1:8000/mcp \
+        python examples/nvidia_rag_mcp/mcp_client.py call --transport=streamable_http --url=http://127.0.0.1:8000/mcp \
           --tool=generate --json-args='{"messages":[{"role":"user","content":"Hi"}]}'
       Call upload_documents (stdio):
-        python nvidia_rag_mcp/mcp_client.py call --transport=stdio --command=python \
-          --args="-m nvidia_rag_mcp.mcp_server --transport stdio" \
+        python examples/nvidia_rag_mcp/mcp_client.py call --transport=stdio --command=python \
+          --args="examples/nvidia_rag_mcp/mcp_server.py --transport stdio" \
           --tool=upload_documents \
           --json-args='{"collection_name":"my_collection","file_paths":["/abs/path/file.pdf"]}'
     """

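The `call` examples in the docstring above pass tool arguments as a JSON string through `--json-args`, which is easy to break when quoting by hand. A minimal sketch, assuming you drive the client from Python; `build_call_argv` is a hypothetical helper, and it uses only the flags shown in the docstring:

```python
import json
import shlex

CLIENT = "examples/nvidia_rag_mcp/mcp_client.py"


def build_call_argv(tool: str, arguments: dict, transport: str, url: str) -> list[str]:
    """Build argv for `mcp_client.py call` (hypothetical helper).

    json.dumps handles all escaping, so passing the list to subprocess.run
    needs no manual shell quoting.
    """
    return [
        "python",
        CLIENT,
        "call",
        f"--transport={transport}",
        f"--url={url}",
        f"--tool={tool}",
        f"--json-args={json.dumps(arguments, separators=(',', ':'))}",
    ]


argv = build_call_argv(
    tool="generate",
    arguments={"messages": [{"role": "user", "content": "Hi"}]},
    transport="streamable_http",
    url="http://127.0.0.1:8000/mcp",
)
# Print a shell-safe version for copy/paste; run it with subprocess.run(argv) instead.
print(shlex.join(argv))
```

Building a list instead of a single string keeps the JSON payload as one argument regardless of spaces or quotes inside it.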
diff --git a/nvidia_rag_mcp/mcp_server.py b/examples/nvidia_rag_mcp/mcp_server.py
@@ -815,9 +815,9 @@ def main() -> None:
     Main entry point for the MCP server.
     Examples:
       SSE:
-        python nvidia_rag_mcp/mcp_server.py --transport sse
+        python examples/nvidia_rag_mcp/mcp_server.py --transport sse
       streamable_http:
-        python nvidia_rag_mcp/mcp_server.py --transport streamable_http
+        python examples/nvidia_rag_mcp/mcp_server.py --transport streamable_http
     """
     parser = argparse.ArgumentParser(description="NVIDIA RAG MCP server")
     parser.add_argument("--transport", choices=["sse", "streamable_http", "stdio"], help="Transport mode")

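Each server transport pairs with a different set of client flags. Going by the docstring examples in this PR, SSE clients connect to `/sse` and streamable HTTP clients to `/mcp` on `127.0.0.1:8000`, while stdio clients launch the server themselves. The sketch below encodes that pairing; the host, port, and the `client_flags` helper are assumptions taken from those examples, not a documented API:

```python
import argparse

# Transports accepted by mcp_server.py (mirrors the server's argparse choices).
TRANSPORTS = ["sse", "streamable_http", "stdio"]

# Host and port are assumptions copied from the docstring examples.
BASE_URL = "http://127.0.0.1:8000"


def client_flags(transport: str) -> list[str]:
    """Return mcp_client.py flags matching a server transport (hypothetical helper)."""
    if transport == "sse":
        return ["--transport=sse", f"--url={BASE_URL}/sse"]
    if transport == "streamable_http":
        return ["--transport=streamable_http", f"--url={BASE_URL}/mcp"]
    if transport == "stdio":
        # With stdio there is no URL: the client spawns the server as a subprocess.
        return [
            "--transport=stdio",
            "--command=python",
            "--args=examples/nvidia_rag_mcp/mcp_server.py --transport stdio",
        ]
    raise ValueError(f"unknown transport: {transport}")


# The same choices list makes argparse reject typos such as "--transport http".
parser = argparse.ArgumentParser(description="NVIDIA RAG MCP server")
parser.add_argument("--transport", choices=TRANSPORTS, help="Transport mode")
args = parser.parse_args(["--transport", "streamable_http"])
print(client_flags(args.transport))
```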
diff --git a/nvidia_rag_mcp/requirements.txt b/examples/nvidia_rag_mcp/requirements.txt
diff --git a/examples/rag_react_agent/README.md b/examples/rag_react_agent/README.md
@@ -0,0 +1,252 @@
+# Building Agentic RAG with NeMo Agent Toolkit
+
+This example demonstrates how to build intelligent agents that leverage **NVIDIA RAG** capabilities using [NeMo Agent Toolkit (NAT)](https://github.com/NVIDIA/NeMo-Agent-Toolkit). The agent can autonomously decide when and how to query your document knowledge base.
+
+## Overview
+
+This example shows how to:
+
+1. **Expose RAG as agent tools** - Wrap NVIDIA RAG query and search capabilities as tools that agents can use
+2. **Build a ReAct agent** - Use NAT's ReAct workflow to create an agent that reasons about when to use RAG
+
+The ReAct (Reason + Act) agent pattern enables the LLM to iteratively reason about which tools to use based on the user's query, making it ideal for building conversational AI applications with document retrieval capabilities.
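The reason/act cycle can be sketched in plain Python. This is not NAT's implementation: the scripted `fake_llm` stands in for the NIM model, the `rag_query` stub stands in for the real tool, and the `Action:` / `Action Input:` / `Final Answer:` line format is borrowed from the sample trace in this README.

```python
import ast
import re
from typing import Callable


def rag_query(query: str) -> str:
    # Stub standing in for the real rag_query tool (hypothetical).
    return "Driving a car at the beach"


TOOLS: dict[str, Callable[..., str]] = {"rag_query": rag_query}

# Scripted replies standing in for the LLM; a real agent calls the model here.
SCRIPT = [
    "Thought: I need to look this up.\n"
    "Action: rag_query\n"
    "Action Input: {'query': 'giraffe current activity'}",
    "Thought: I now know the final answer\n"
    "Final Answer: Giraffe is driving a car at the beach.",
]


def fake_llm(transcript: str, step: int) -> str:
    return SCRIPT[step]


def react(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for step in range(max_steps):
        reply = fake_llm(transcript, step)  # Reason: the model proposes the next move
        transcript += reply + "\n"
        final = re.search(r"Final Answer:\s*(.*)", reply)
        if final:
            return final.group(1).strip()
        action = re.search(r"Action:\s*(\S+)", reply)
        action_input = re.search(r"Action Input:\s*(\{.*\})", reply)
        if not (action and action_input):
            raise ValueError(f"unparseable reply: {reply!r}")
        # Act: run the chosen tool with the parsed keyword arguments
        kwargs = ast.literal_eval(action_input.group(1))
        observation = TOOLS[action.group(1)](**kwargs)
        # Feed the observation back so the next reasoning step can use it
        transcript += f"Observation: {observation}\n"
    raise RuntimeError("no final answer within max_steps")


print(react("what is giraffe doing?"))
```

The loop stops as soon as the model emits a final answer, and `max_steps` bounds how many tool calls a confused model can make.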
+
+## Prerequisites
+
+- Python 3.11+
+- Access to NVIDIA AI endpoints (API key required)
+- **Data ingested into Milvus** - Complete the [rag_library_usage.ipynb](../../notebooks/rag_library_usage.ipynb) notebook to set up Milvus and ingest documents before running this example
+
+## Quick Start
+
+All commands should be run from the `examples/rag_react_agent/` directory.
+
+### 1. Set Environment Variables
+
+```bash
+# Required: NVIDIA API key for embeddings, reranking, and LLM
+export NVIDIA_API_KEY="your-nvidia-api-key"
+
+# Optional: If using custom endpoints
+# export NVIDIA_BASE_URL="https://integrate.api.nvidia.com/v1"
+```
+
+### 2. Configure Vector Database Endpoint
+
+By default, the example connects to Milvus at `http://localhost:19530`. You can configure this in two ways:
+
+**Option A: Environment Variable (takes precedence)**
+
+```bash
+# For a standard local Milvus server
+export APP_VECTORSTORE_URL="http://localhost:19530"
+
+# Or, for a remote Milvus instance
+# export APP_VECTORSTORE_URL="http://milvus-host:19530"
+```
+
+**Option B: Update config.yml**
+
+Edit `src/rag_react_agent/configs/config.yml` and update the `vdb_endpoint` field:
+
+```yaml
+functions:
+  rag_query:
+    vdb_endpoint: "http://localhost:19530"  # Your Milvus endpoint
+  rag_search:
+    vdb_endpoint: "http://localhost:19530"  # Your Milvus endpoint
+```
+
+> **Note**: If you followed the `rag_library_lite_usage.ipynb` notebook and are using Milvus Lite, set `vdb_endpoint` to the absolute path of the `.db` file (e.g., `/home/user/data/milvus.db`).
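The precedence rule above (environment variable first, then `vdb_endpoint` from `config.yml`) and the Milvus Lite absolute-path requirement can be sketched as follows. The function name `resolve_vdb_endpoint` and its validation rules are assumptions for illustration, not the toolkit's actual code:

```python
import os
from collections.abc import Mapping
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse


def resolve_vdb_endpoint(config_endpoint: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the vector DB endpoint (hypothetical helper).

    APP_VECTORSTORE_URL, when set and non-empty, overrides the config value.
    """
    env = os.environ if env is None else env
    endpoint = env.get("APP_VECTORSTORE_URL") or config_endpoint

    if urlparse(endpoint).scheme in ("http", "https"):
        return endpoint  # Standard or remote Milvus server
    if endpoint.endswith(".db"):
        # Milvus Lite: a relative path would silently depend on the working directory.
        if not Path(endpoint).is_absolute():
            raise ValueError(f"Milvus Lite path must be absolute: {endpoint}")
        return endpoint
    raise ValueError(f"unrecognized vdb endpoint: {endpoint}")


print(resolve_vdb_endpoint("http://localhost:19530", env={}))
print(resolve_vdb_endpoint("http://localhost:19530",
                           env={"APP_VECTORSTORE_URL": "http://milvus-host:19530"}))
```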
+
+### 3. Install Dependencies
+
+```bash
+# From examples/rag_react_agent/ directory
+# Install all dependencies including nvidia-rag and NeMo Agent Toolkit
+uv sync
+
+# Activate the virtual environment
+source .venv/bin/activate
+```
+
+## Usage
+
+### Running the RAG Agent
+
+The example uses NAT's **ReAct Agent** workflow, which enables the LLM to reason about which RAG tools to use based on the user's query.
+
+```bash
+# From examples/rag_react_agent/ directory with .venv activated
+nat run --config_file src/rag_react_agent/configs/config.yml --input "what is giraffe doing?"
+```
+
+### Example Queries
+
+Try different queries to see how the agent decides which tool to use:
+
+```bash
+# Query likely to trigger rag_query (generates a response using retrieved documents)
+nat run --config_file src/rag_react_agent/configs/config.yml --input "Summarize the main themes of the documents"
+
+# Query likely to trigger rag_search (returns relevant document chunks)
+nat run --config_file src/rag_react_agent/configs/config.yml --input "Find all animals mentioned in documents"
+```
+
+### Expected Output
+
+When the run succeeds, you should see output similar to the following, including the agent's reasoning steps:
+
+```
+Configuration Summary:
+--------------------
+Workflow Type: react_agent
+Number of Functions: 3
+Number of LLMs: 1
+
+------------------------------
+[AGENT]
+Agent input: what is giraffe doing?
+Agent's thoughts: 
+Thought: I don't have any information about what giraffe is doing. 
+
+Action: rag_query 
+Action Input: {'query': 'giraffe current activity'}
+------------------------------
+
+------------------------------
+[AGENT]
+Calling tools: rag_query
+Tool's input: {'query': 'giraffe current activity'}
+Tool's response: 
+Driving a car at the beach
+------------------------------
+
+------------------------------
+[AGENT]
+Agent input: what is giraffe doing?
+Agent's thoughts: 
+Thought: I now know the final answer 
+Final Answer: Giraffe is driving a car at the beach.
+------------------------------
+
+--------------------------------------------------
+Workflow Result:
+['Giraffe is driving a car at the beach.']
+--------------------------------------------------
+```
+
+## Configuration
+
+The configuration file at `src/rag_react_agent/configs/config.yml` defines the RAG tools and agent workflow:
+
+```yaml
+functions:
+  # RAG Query Tool - Queries documents and returns an LLM- or VLM-generated response
+  rag_query:
+    _type: nvidia_rag_query
+    # Ensure collection_names matches the collection name used in the RAG library notebook.
+    collection_names: ["test_library"]    # Milvus collection names
+    vdb_endpoint: "http://localhost:19530" # Milvus endpoint URL
+
+  # RAG Search Tool - Searches for relevant document chunks
+  rag_search:
+    _type: nvidia_rag_search
+    collection_names: ["test_library"]
+    vdb_endpoint: "http://localhost:19530"
+    reranker_top_k: 3                     # Number of results after reranking
+    vdb_top_k: 20                         # Number of results from vector search
+
+  # Utility tool for date/time queries
+  current_datetime:
+    _type: current_datetime
+
+llms:
+  nim_llm:
+    _type: nim
+    model_name: meta/llama-3.1-70b-instruct
+    temperature: 0.0
+
+# ReAct Agent workflow - enables the LLM to reason about tool usage
+workflow:
+  _type: react_agent
+  tool_names:
+    - rag_query
+    - rag_search
+    - current_datetime
+  llm_name: nim_llm
+  verbose: true                           # Shows agent reasoning process
+```
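In this config, `workflow.tool_names` and `workflow.llm_name` refer to keys under `functions` and `llms`, so a typo in either place only surfaces when the workflow is built. A minimal consistency check, assuming the YAML has already been loaded into a dict (for example with PyYAML's `yaml.safe_load`); `check_workflow_refs` is a hypothetical helper, and the `reranker_top_k <= vdb_top_k` rule is an assumption based on reranking selecting from the vector-search results:

```python
def check_workflow_refs(config: dict) -> list[str]:
    """Return problems found in workflow references (hypothetical helper)."""
    functions = config.get("functions", {})
    llms = config.get("llms", {})
    workflow = config.get("workflow", {})
    problems = []
    for name in workflow.get("tool_names", []):
        if name not in functions:
            problems.append(f"tool '{name}' is not defined under functions")
    llm_name = workflow.get("llm_name")
    if llm_name not in llms:
        problems.append(f"llm '{llm_name}' is not defined under llms")
    # The reranker can only keep results that vector search returned.
    for fname, fcfg in functions.items():
        rk, vk = fcfg.get("reranker_top_k"), fcfg.get("vdb_top_k")
        if rk is not None and vk is not None and rk > vk:
            problems.append(f"{fname}: reranker_top_k ({rk}) exceeds vdb_top_k ({vk})")
    return problems


# Trimmed copy of config.yml above, as yaml.safe_load would return it.
config = {
    "functions": {
        "rag_query": {"_type": "nvidia_rag_query", "collection_names": ["test_library"]},
        "rag_search": {"_type": "nvidia_rag_search", "reranker_top_k": 3, "vdb_top_k": 20},
        "current_datetime": {"_type": "current_datetime"},
    },
    "llms": {"nim_llm": {"_type": "nim", "model_name": "meta/llama-3.1-70b-instruct"}},
    "workflow": {
        "_type": "react_agent",
        "tool_names": ["rag_query", "rag_search", "current_datetime"],
        "llm_name": "nim_llm",
    },
}

print(check_workflow_refs(config))  # an empty list means every reference resolves
```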
+
+### RAG Tools
+
+| Tool | Type | Description |
+|------|------|-------------|
+| `rag_query` | `nvidia_rag_query` | Queries documents and returns an AI-generated response based on retrieved context |
+| `rag_search` | `nvidia_rag_search` | Searches for relevant document chunks without generating a response |
+
+### Tool Configuration Options
+
+#### `nvidia_rag_query`
+
+| Parameter | Description | Default |
+|-----------|-------------|---------|
+| `collection_names` | List of Milvus collection names to query | `[]` |
+| `vdb_endpoint` | Vector database endpoint URL or absolute path to Milvus Lite `.db` file | `"http://localhost:19530"` |
+
+#### `nvidia_rag_search`
+
+| Parameter | Description | Default |
+|-----------|-------------|---------|
+| `collection_names` | List of Milvus collection names to search | `[]` |
+| `vdb_endpoint` | Vector database endpoint URL or absolute path to Milvus Lite `.db` file | `"http://localhost:19530"` |
+| `reranker_top_k` | Number of results to return after reranking | `10` |
+| `vdb_top_k` | Number of results to retrieve before reranking | `100` |
+
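The two parameters describe a two-stage pipeline: vector search returns `vdb_top_k` candidates, and the reranker keeps the best `reranker_top_k` of them. A toy sketch of that flow, assuming cosine similarity for the first stage and a simple word-overlap score in place of the real reranker model (the documents, vectors, and `keyword_overlap` scorer are made up for illustration):

```python
import math
from typing import Callable


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def two_stage_search(
    query: str,
    query_vec: list[float],
    corpus: list[tuple[str, list[float]]],
    rerank_score: Callable[[str, str], float],
    vdb_top_k: int = 100,
    reranker_top_k: int = 10,
) -> list[str]:
    # Stage 1: cheap vector search over the whole collection.
    candidates = sorted(corpus, key=lambda doc: cosine(query_vec, doc[1]), reverse=True)[:vdb_top_k]
    # Stage 2: a more expensive reranker scores only those candidates.
    reranked = sorted(candidates, key=lambda doc: rerank_score(query, doc[0]), reverse=True)
    return [text for text, _ in reranked[:reranker_top_k]]


def keyword_overlap(query: str, text: str) -> float:
    # Stand-in for a real reranker model: counts shared words.
    return len(set(query.lower().split()) & set(text.lower().split()))


corpus = [
    ("the giraffe is driving a car", [0.9, 0.1]),
    ("a car parked at the beach", [0.8, 0.3]),
    ("elephants drink water", [0.1, 0.9]),
]
print(two_stage_search("giraffe driving", [1.0, 0.0], corpus, keyword_overlap,
                       vdb_top_k=2, reranker_top_k=1))
```

Raising `vdb_top_k` gives the reranker more candidates to choose from at the cost of more reranking work; `reranker_top_k` controls how much text ends up in the LLM prompt.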
+## Troubleshooting
+
+### Error: Function type `nvidia_rag_query` not found
+
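+This error means the RAG tools are not registered. The package exposes them through the `nat.components` entry point declared in `pyproject.toml`, so it must be installed in the active environment: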
+
+```bash
+# From examples/rag_react_agent/ directory
+uv sync
+source .venv/bin/activate
+```
+
+### Error: Token limit exceeded
+
+If you encounter token limit errors, reduce the number of results:
+
+```yaml
+functions:
+  rag_search:
+    _type: nvidia_rag_search
+    reranker_top_k: 1    # Reduce from 3
+    vdb_top_k: 10        # Reduce from 20
+```
+
+This commonly occurs when documents contain large base64-encoded images.
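Base64 encodes every 3 bytes as 4 characters, so embedded images inflate chunk size by about a third before tokenization. The rough estimate below assumes about 4 characters per token, a common rule of thumb for English text; base64 strings likely tokenize less efficiently, so treat the result as an order-of-magnitude lower bound. The image and chunk sizes are hypothetical:

```python
import base64
import math


def base64_len(num_bytes: int) -> int:
    # Each 3-byte group becomes 4 characters; the final group is padded.
    return 4 * math.ceil(num_bytes / 3)


def estimate_prompt_tokens(chunk_chars: list[int], chars_per_token: float = 4.0) -> int:
    # Rough heuristic only; real counts depend on the model's tokenizer.
    return math.ceil(sum(chunk_chars) / chars_per_token)


image_bytes = 300_000  # a hypothetical 300 KB image embedded in one chunk
assert base64_len(image_bytes) == len(base64.b64encode(bytes(image_bytes)))

image_chunk = base64_len(image_bytes)
text_chunk = 2_000  # hypothetical plain-text chunk size in characters

# Three results (reranker_top_k: 3) including one image chunk, vs. one text-only result.
print(estimate_prompt_tokens([image_chunk, text_chunk, text_chunk]))
print(estimate_prompt_tokens([text_chunk]))
```

A single embedded image can dominate the prompt, which is why lowering `reranker_top_k` is often the quickest fix.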
+
+### Error: NVIDIA API key not set
+
+```bash
+export NVIDIA_API_KEY="your-api-key"
+```
+
+### Error: Connection to Milvus failed
+
+Ensure Milvus is running and accessible at the configured endpoint. If you followed the [rag_library_usage.ipynb](../../notebooks/rag_library_usage.ipynb) notebook, Milvus should be running at `http://localhost:19530`.
+
+```bash
+# Check if Milvus is running
+docker ps | grep milvus
+```
+
+## Learn More
+
+- [NeMo Agent Toolkit Documentation](https://docs.nvidia.com/nemo/agent-toolkit/latest/)
+
+## License
+
+Apache-2.0
diff --git a/examples/rag_react_agent/pyproject.toml b/examples/rag_react_agent/pyproject.toml
@@ -0,0 +1,58 @@
+[build-system]
+build-backend = "setuptools.build_meta"
+requires = ["setuptools >= 64", "setuptools-scm>=8"]
+
+
+[tool.setuptools.packages.find]
+where = ["src"]
+include = ["rag_react_agent*"]
+
+
+[tool.setuptools_scm]
+git_describe_command = "git describe --long --first-parent"
+root = "../.."
+
+
+[project]
+name = "rag-react-agent"
+dynamic = ["version"]
+dependencies = [
+  # Keep package version constraints as open as possible to avoid conflicts with other packages. Always define a minimum
+  # version when adding a new package. If unsure, default to using `~=` instead of `==`. Does not apply to nvidia-nat packages.
+  # Keep sorted!!!
+  "langgraph>=0.2",  # Required for react_agent workflow
+  "langchain_classic",
+  "nvidia-nat>=1.5.0a0,<2.0",  # Allow pre-release versions
+  "nvidia-nat-langchain>=1.5.0a0,<2.0",  # Allow pre-release versions
+  "nvidia-rag[rag]~=2.4",
+]
+requires-python = ">=3.11,<3.14"
+description = "RAG React Agent example using NVIDIA RAG with NeMo Agent Toolkit"
+keywords = ["ai", "rag", "agents"]
+license = { text = "Apache-2.0" }
+authors = [{ name = "NVIDIA Corporation" }]
+maintainers = [{ name = "NVIDIA Corporation" }]
+classifiers = [
+  "Programming Language :: Python",
+  "Programming Language :: Python :: 3.11",
+  "Programming Language :: Python :: 3.12",
+  "Programming Language :: Python :: 3.13",
+]
+
+[project.urls]
+documentation = "https://docs.nvidia.com/nemo/agent-toolkit/latest/"
+source = "https://github.com/NVIDIA/NeMo-Agent-Toolkit"
+
+
+[tool.uv]
+managed = true
+config-settings = { editable_mode = "compat" }
+prerelease = "allow"  # nvidia-nat packages are currently pre-release only
+
+
+[tool.uv.sources]
+nvidia-rag = { path = "../..", editable = true }
+
+
+[project.entry-points.'nat.components']
+nat_rag = "rag_react_agent.register"
diff --git a/examples/rag_react_agent/src/rag_react_agent/__init__.py b/examples/rag_react_agent/src/rag_react_agent/__init__.py
@@ -0,0 +1,16 @@
+# SPDX-FileCopyrightText: Copyright (c) 2025-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""NVIDIA RAG integration for NeMo Agent Toolkit."""