From e731c9712f248e46a12eae2409915a0676d5f615 Mon Sep 17 00:00:00 2001 From: Ofer Mendelevitch Date: Tue, 3 Feb 2026 07:06:27 -0800 Subject: [PATCH 1/6] added example with pandas --- CLAUDE.md | 43 + .../7-lambda-tools-data-analysis.ipynb | 1051 +++++++++++++++++ notebooks/api-examples/README.md | 93 +- 3 files changed, 1182 insertions(+), 5 deletions(-) create mode 100644 CLAUDE.md create mode 100644 notebooks/api-examples/7-lambda-tools-data-analysis.ipynb diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 0000000..5a2932b --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,43 @@ +# CLAUDE.md + +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. + +## Repository Overview + +This is a collection of Jupyter notebooks demonstrating how to use the Vectara RAG (Retrieval Augmented Generation) platform with various integrations including LangChain, LlamaIndex, and DSPy. The notebooks are designed to run in Google Colab. + +## Key Technologies + +- **Vectara**: RAG-as-a-service platform providing text extraction, ML-based chunking, Boomerang embeddings, hybrid search, and LLM summarization (Mockingbird) +- **LangChain**: Use `langchain-vectara` package for integration +- **LlamaIndex**: Use `llama-index-indices-managed-vectara` package (v0.4.0+ uses API v2) +- **vectara-agentic**: Vectara's agentic RAG package built on LlamaIndex + +## Environment Variables + +Notebooks typically require these environment variables: +- `VECTARA_API_KEY`: Vectara API key +- `VECTARA_CORPUS_KEY`: Vectara corpus identifier +- `OPENAI_API_KEY`: Required for some notebooks that use OpenAI models + +## Running Notebooks + +Notebooks are designed to run in Google Colab. Each notebook includes a Colab badge link at the top. They can also be run locally with Jupyter: + +```bash +pip install jupyter +jupyter notebook notebooks/ +``` + +## Data Files + +Sample data for notebooks is in `data/`: +- PDF files (transformer papers, policy docs, etc.) 
+- Text files (state_of_the_union.txt, paul_graham_essay.txt) + +## Notebook Patterns + +1. **File upload to Vectara**: Use `add_files()` or `insert_file()` - Vectara handles chunking and embedding +2. **Querying**: Use `as_query_engine()` for retrieval, `as_chat_engine()` for conversational interfaces +3. **Streaming**: Set `streaming=True` in query engine for streamed responses +4. **Reranking options**: MMR (diversity), Slingshot (multilingual), UDF (custom functions), chain reranker diff --git a/notebooks/api-examples/7-lambda-tools-data-analysis.ipynb b/notebooks/api-examples/7-lambda-tools-data-analysis.ipynb new file mode 100644 index 0000000..2092361 --- /dev/null +++ b/notebooks/api-examples/7-lambda-tools-data-analysis.ipynb @@ -0,0 +1,1051 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "cell-0", + "metadata": {}, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "id": "cell-1", + "metadata": {}, + "source": [ + "# Lambda Tools for Data Analysis with NumPy and Pandas\n", + "\n", + "This notebook demonstrates how to create **Lambda Tools** that leverage NumPy and Pandas for data analysis tasks. Lambda tools enable agents to perform computations that would be difficult or impossible with pure LLM reasoning.\n", + "\n", + "You'll learn how to:\n", + "1. Create Lambda tools that use NumPy and Pandas\n", + "2. Pass structured data (JSON/CSV) to tools and receive computed results\n", + "3. Build a Statistical Analyzer for descriptive statistics and correlations\n", + "4. Build a Trend Analyzer for time-series analysis\n", + "5. 
Combine these tools with an agent for comprehensive data analysis" + ] + }, + { + "cell_type": "markdown", + "id": "cell-2", + "metadata": {}, + "source": [ + "## Why Lambda Tools for Data Analysis?\n", + "\n", + "LLMs are powerful at reasoning and language tasks, but they have limitations:\n", + "\n", + "- **Numerical precision**: LLMs can make arithmetic errors, especially with large datasets\n", + "- **Statistical computations**: Calculating correlations, standard deviations, or regressions requires exact math\n", + "- **Data transformations**: Normalizing, pivoting, or cleaning data needs deterministic operations\n", + "- **Scalability**: Processing thousands of rows requires actual computation, not token prediction\n", + "\n", + "Lambda tools solve this by executing real Python code with libraries like NumPy and Pandas, giving agents access to precise, scalable data analysis capabilities." + ] + }, + { + "cell_type": "markdown", + "id": "cell-3", + "metadata": {}, + "source": [ + "## Getting Started\n", + "\n", + "This notebook assumes you've completed Notebooks 1-6:\n", + "- Notebook 1: Created corpora\n", + "- Notebook 2: Ingested data\n", + "- Notebook 3: Queried data\n", + "- Notebook 4: Created agents and sessions\n", + "- Notebook 5: Built multi-agent workflows with Lambda tools\n", + "- Notebook 6: Worked with file artifacts\n", + "\n", + "Now we'll create sophisticated Lambda tools for data analysis using NumPy and Pandas." 
+ ] + }, + { + "cell_type": "markdown", + "id": "cell-4", + "metadata": {}, + "source": [ + "## Setup" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "cell-5", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Setup complete!\n" + ] + } + ], + "source": [ + "import os\n", + "import json\n", + "from datetime import datetime\n", + "\n", + "import requests\n", + "\n", + "# Get credentials from environment variables\n", + "api_key = os.environ['VECTARA_API_KEY']\n", + "\n", + "# Base API URL\n", + "BASE_URL = \"https://api.vectara.io/v2\"\n", + "\n", + "# Common headers\n", + "headers = {\n", + " \"x-api-key\": api_key,\n", + " \"Content-Type\": \"application/json\"\n", + "}\n", + "\n", + "print(\"Setup complete!\")" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "cell-6", + "metadata": {}, + "outputs": [], + "source": [ + "# Helper function to manage Lambda tools\n", + "def delete_and_create_tool(tool_config, tool_name):\n", + " \"\"\"Delete tool if it exists, then create a new one.\"\"\"\n", + " list_response = requests.get(f\"{BASE_URL}/tools\", headers=headers)\n", + " \n", + " if list_response.status_code == 200:\n", + " tools = list_response.json().get('tools', [])\n", + " for tool in tools:\n", + " if tool.get('name') == tool_name:\n", + " existing_id = tool['id']\n", + " print(f\"Deleting existing tool '{tool_name}' ({existing_id})\")\n", + " delete_response = requests.delete(f\"{BASE_URL}/tools/{existing_id}\", headers=headers)\n", + " if delete_response.status_code == 204:\n", + " print(f\"Deleted tool: {existing_id}\")\n", + " break\n", + " \n", + " response = requests.post(f\"{BASE_URL}/tools\", headers=headers, json=tool_config)\n", + " \n", + " if response.status_code == 201:\n", + " tool_data = response.json()\n", + " print(f\"Created tool '{tool_name}'\")\n", + " print(f\"Tool ID: {tool_data['id']}\")\n", + " return tool_data['id']\n", + " else:\n", + " raise 
RuntimeError(f\"Failed to create tool '{tool_name}': {response.status_code} - {response.text}\")\n", + "\n", + "\n", + "# Helper function to manage agents\n", + "def delete_and_create_agent(agent_config, agent_name):\n", + " \"\"\"Delete agent if it exists, then create a new one.\"\"\"\n", + " list_response = requests.get(f\"{BASE_URL}/agents\", headers=headers)\n", + "\n", + " if list_response.status_code == 200:\n", + " agents = list_response.json().get('agents', [])\n", + " for agent in agents:\n", + " if agent.get('name') == agent_name:\n", + " existing_key = agent['key']\n", + " print(f\"Deleting existing agent '{agent_name}' ({existing_key})\")\n", + " delete_response = requests.delete(f\"{BASE_URL}/agents/{existing_key}\", headers=headers)\n", + " if delete_response.status_code == 204:\n", + " print(f\"Deleted agent: {existing_key}\")\n", + " break\n", + "\n", + " response = requests.post(f\"{BASE_URL}/agents\", headers=headers, json=agent_config)\n", + "\n", + " if response.status_code == 201:\n", + " agent_data = response.json()\n", + " print(f\"Created agent '{agent_name}'\")\n", + " print(f\"Agent Key: {agent_data['key']}\")\n", + " return agent_data['key']\n", + " else:\n", + " raise RuntimeError(f\"Failed to create agent '{agent_name}': {response.status_code} - {response.text}\")" + ] + }, + { + "cell_type": "markdown", + "id": "cell-7", + "metadata": {}, + "source": [ + "## Step 1: Create a Statistical Analyzer Lambda Tool\n", + "\n", + "This tool uses Pandas to compute descriptive statistics on tabular data. It can calculate:\n", + "- Basic stats: mean, median, std, min, max, percentiles\n", + "- Correlations between numeric columns\n", + "- Value counts for categorical columns\n", + "\n", + "The tool accepts JSON data (as a string) and returns computed statistics." 
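To build intuition for what the tool's `process` function does with its input, here is a minimal local sketch of the same parse-then-describe flow. It runs in a plain Python session (not the Lambda runtime), and the `sample` payload is made up for illustration:

```python
import json

import numpy as np
import pandas as pd

# Hypothetical payload in the shape the tool expects:
# a JSON string encoding a list of dicts (one dict per row).
sample = json.dumps([
    {"x": 1.0, "y": 10.0},
    {"x": 2.0, "y": 8.0},
    {"x": 3.0, "y": 6.0},
])

# Mirror the tool's core steps: parse JSON, keep numeric columns, describe.
df = pd.DataFrame(json.loads(sample))
numeric = df.select_dtypes(include=[np.number])
stats = numeric.describe().to_dict()

print(stats["x"]["mean"])  # 2.0
print(stats["y"]["std"])   # 2.0 (sample standard deviation of 10, 8, 6)
```

The nested-dict shape of `stats` is the same shape the tool returns under `results["statistics"]["describe"]`, which is what lets the agent read individual metrics back out.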
+ ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "cell-8", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Created tool 'statistical_analyzer'\n", + "Tool ID: tol_3455\n" + ] + } + ], + "source": [ + "statistical_analyzer_code = '''\n", + "import json\n", + "import pandas as pd\n", + "import numpy as np\n", + "\n", + "def process(\n", + " data: str,\n", + " columns: str = \"\",\n", + " operations: str = \"describe\"\n", + ") -> dict:\n", + " \"\"\"\n", + " Compute statistical analysis on tabular data using Pandas.\n", + " \n", + " Args:\n", + " data: JSON string containing the data (list of dicts or dict of lists)\n", + " columns: Comma-separated column names to analyze (empty = all numeric columns)\n", + " operations: Comma-separated operations: describe, correlation, value_counts, percentiles\n", + " \n", + " Returns:\n", + " Dictionary with computed statistics\n", + " \"\"\"\n", + " results = {\"success\": True, \"statistics\": {}}\n", + " \n", + " # Parse the input data\n", + " try:\n", + " parsed_data = json.loads(data)\n", + " df = pd.DataFrame(parsed_data)\n", + " except Exception as e:\n", + " return {\"success\": False, \"error\": f\"Failed to parse data: {str(e)}\"}\n", + " \n", + " # Filter columns if specified\n", + " if columns:\n", + " col_list = [c.strip() for c in columns.split(\",\")]\n", + " valid_cols = [c for c in col_list if c in df.columns]\n", + " if not valid_cols:\n", + " return {\"success\": False, \"error\": f\"None of the specified columns found. 
Available: {list(df.columns)}\"}\n", + " df_analysis = df[valid_cols]\n", + " else:\n", + " df_analysis = df.select_dtypes(include=[np.number])\n", + " \n", + " # Parse operations\n", + " ops = [op.strip().lower() for op in operations.split(\",\")]\n", + " \n", + " # Execute requested operations\n", + " if \"describe\" in ops:\n", + " desc = df_analysis.describe()\n", + " results[\"statistics\"][\"describe\"] = desc.to_dict()\n", + " \n", + " if \"correlation\" in ops:\n", + " numeric_df = df_analysis.select_dtypes(include=[np.number])\n", + " if len(numeric_df.columns) >= 2:\n", + " corr = numeric_df.corr()\n", + " results[\"statistics\"][\"correlation\"] = corr.to_dict()\n", + " else:\n", + " results[\"statistics\"][\"correlation\"] = \"Need at least 2 numeric columns\"\n", + " \n", + " if \"value_counts\" in ops:\n", + " value_counts = {}\n", + " for col in df_analysis.columns:\n", + " vc = df_analysis[col].value_counts().head(10)\n", + " value_counts[col] = vc.to_dict()\n", + " results[\"statistics\"][\"value_counts\"] = value_counts\n", + " \n", + " if \"percentiles\" in ops:\n", + " percentiles = {}\n", + " numeric_df = df_analysis.select_dtypes(include=[np.number])\n", + " for col in numeric_df.columns:\n", + " percentiles[col] = {\n", + " \"p10\": float(numeric_df[col].quantile(0.10)),\n", + " \"p25\": float(numeric_df[col].quantile(0.25)),\n", + " \"p50\": float(numeric_df[col].quantile(0.50)),\n", + " \"p75\": float(numeric_df[col].quantile(0.75)),\n", + " \"p90\": float(numeric_df[col].quantile(0.90)),\n", + " \"p99\": float(numeric_df[col].quantile(0.99))\n", + " }\n", + " results[\"statistics\"][\"percentiles\"] = percentiles\n", + " \n", + " # Add metadata\n", + " results[\"metadata\"] = {\n", + " \"rows\": len(df),\n", + " \"columns_analyzed\": list(df_analysis.columns),\n", + " \"operations_performed\": ops\n", + " }\n", + " \n", + " return results\n", + "'''\n", + "\n", + "statistical_analyzer_config = {\n", + " \"type\": \"lambda\",\n", + " 
\"language\": \"python\",\n", + " \"name\": \"statistical_analyzer\",\n", + " \"title\": \"Statistical Analyzer\",\n", + " \"description\": \"\"\"Compute statistical analysis on tabular data using Pandas and NumPy.\n", + " \n", + "Pass data as a JSON string (list of dicts). Specify columns to analyze (comma-separated) or leave empty for all numeric columns.\n", + "\n", + "Available operations (comma-separated):\n", + "- describe: Basic stats (mean, std, min, max, quartiles)\n", + "- correlation: Correlation matrix between numeric columns \n", + "- value_counts: Top 10 most frequent values per column\n", + "- percentiles: p10, p25, p50, p75, p90, p99 for numeric columns\n", + "\n", + "Example: operations=\"describe,correlation\" will return both descriptive stats and correlations.\"\"\",\n", + " \"code\": statistical_analyzer_code\n", + "}\n", + "\n", + "statistical_analyzer_id = delete_and_create_tool(statistical_analyzer_config, \"statistical_analyzer\")" + ] + }, + { + "cell_type": "markdown", + "id": "cell-9", + "metadata": {}, + "source": [ + "## Step 2: Create a Trend Analyzer Lambda Tool\n", + "\n", + "This tool uses NumPy and Pandas for time-series analysis:\n", + "- Moving averages (3 and 7 period windows)\n", + "- Growth rates (period-over-period and total)\n", + "- Trend detection (linear regression slope and direction)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "cell-10", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Created tool 'trend_analyzer'\n", + "Tool ID: tol_3456\n" + ] + } + ], + "source": [ + "trend_analyzer_code = '''\n", + "import json\n", + "import pandas as pd\n", + "import numpy as np\n", + "\n", + "def process(\n", + " data: str,\n", + " date_column: str,\n", + " value_column: str,\n", + " analysis_type: str = \"all\"\n", + ") -> dict:\n", + " \"\"\"\n", + " Analyze trends in time-series data using NumPy and Pandas.\n", + "\n", + " Args:\n", + " data: JSON 
string containing the time-series data\n", + " date_column: Name of the date/time column (must be in ISO format, e.g. YYYY-MM-DD)\n", + " value_column: Name of the numeric value column to analyze\n", + " analysis_type: Comma-separated: moving_average, growth_rate, trend, all\n", + "\n", + " Returns:\n", + " Dictionary with trend analysis results\n", + " \"\"\"\n", + " results = {\"success\": True, \"analysis\": {}}\n", + "\n", + " try:\n", + " parsed_data = json.loads(data)\n", + " df = pd.DataFrame(parsed_data)\n", + " except Exception as e:\n", + " return {\"success\": False, \"error\": f\"Failed to parse data: {str(e)}\"}\n", + "\n", + " if date_column not in df.columns:\n", + " return {\"success\": False, \"error\": f\"Date column '{date_column}' not found. Available: {list(df.columns)}\"}\n", + " if value_column not in df.columns:\n", + " return {\"success\": False, \"error\": f\"Value column '{value_column}' not found. Available: {list(df.columns)}\"}\n", + "\n", + " # Sort by date column as string (works for ISO-format dates).\n", + " df = df.sort_values(date_column).reset_index(drop=True)\n", + " \n", + " values = df[value_column].astype(float)\n", + " analyses = [a.strip().lower() for a in analysis_type.split(\",\")]\n", + " do_all = \"all\" in analyses\n", + "\n", + " # Moving Averages\n", + " if do_all or \"moving_average\" in analyses:\n", + " ma_results = {}\n", + " for window in [3, 7]:\n", + " if len(values) >= window:\n", + " ma = values.rolling(window=window).mean()\n", + " ma_results[f\"ma_{window}\"] = {\n", + " \"latest\": float(ma.iloc[-1]) if not pd.isna(ma.iloc[-1]) else None,\n", + " \"values\": [float(v) if not pd.isna(v) else None for v in ma.tail(10)]\n", + " }\n", + " results[\"analysis\"][\"moving_averages\"] = ma_results\n", + "\n", + " # Growth Rates\n", + " if do_all or \"growth_rate\" in analyses:\n", + " pct_change = values.pct_change()\n", + " growth = {\n", + " \"period_over_period\": {\n", + " \"latest\": 
float(pct_change.iloc[-1]) if not pd.isna(pct_change.iloc[-1]) else None,\n", + " \"mean\": float(pct_change.mean()) if not pd.isna(pct_change.mean()) else None,\n", + " }\n", + " }\n", + " if values.iloc[0] != 0:\n", + " growth[\"total_growth\"] = float((values.iloc[-1] - values.iloc[0]) / values.iloc[0])\n", + " results[\"analysis\"][\"growth_rates\"] = growth\n", + "\n", + " # Trend Detection (Linear Regression)\n", + " if do_all or \"trend\" in analyses:\n", + " x = np.arange(len(values))\n", + " y = values.values\n", + " mask = ~np.isnan(y)\n", + " if mask.sum() > 1:\n", + " x_clean, y_clean = x[mask], y[mask]\n", + " slope, intercept = np.polyfit(x_clean, y_clean, 1)\n", + " y_pred = slope * x_clean + intercept\n", + " ss_res = np.sum((y_clean - y_pred) ** 2)\n", + " ss_tot = np.sum((y_clean - np.mean(y_clean)) ** 2)\n", + " r_squared = 1 - (ss_res / ss_tot) if ss_tot != 0 else 0\n", + "\n", + " mean_y = np.mean(y_clean)\n", + " if slope > 0.01 * mean_y:\n", + " direction = \"upward\"\n", + " elif slope < -0.01 * mean_y:\n", + " direction = \"downward\"\n", + " else:\n", + " direction = \"flat\"\n", + "\n", + " results[\"analysis\"][\"trend\"] = {\n", + " \"slope\": float(slope),\n", + " \"intercept\": float(intercept),\n", + " \"r_squared\": float(r_squared),\n", + " \"direction\": direction,\n", + " \"slope_per_period_pct\": float(slope / mean_y * 100) if mean_y != 0 else 0\n", + " }\n", + "\n", + " results[\"summary\"] = {\n", + " \"start_date\": str(df[date_column].iloc[0]),\n", + " \"end_date\": str(df[date_column].iloc[-1]),\n", + " \"n_periods\": len(values),\n", + " \"start_value\": float(values.iloc[0]),\n", + " \"end_value\": float(values.iloc[-1]),\n", + " \"min\": float(values.min()),\n", + " \"max\": float(values.max()),\n", + " \"mean\": float(values.mean())\n", + " }\n", + "\n", + " return results\n", + "'''\n", + "\n", + "trend_analyzer_config = {\n", + " \"type\": \"lambda\",\n", + " \"language\": \"python\",\n", + " \"name\": 
\"trend_analyzer\",\n", + " \"title\": \"Trend Analyzer\",\n", + " \"description\": \"\"\"Analyze trends in time-series data using NumPy and Pandas.\n", + "\n", + "Pass time-series data as JSON, specifying the date column and value column to analyze.\n", + "Date column must be in ISO format (YYYY-MM-DD) for correct sorting.\n", + "\n", + "Available analysis types (comma-separated):\n", + "- moving_average: Simple moving averages (3 and 7 period windows)\n", + "- growth_rate: Period-over-period and total growth\n", + "- trend: Linear regression to detect trend direction, slope, and R-squared\n", + "- all: Perform all analyses (default)\n", + "\n", + "Returns trend direction (upward/downward/flat), growth metrics, and moving averages.\"\"\",\n", + " \"code\": trend_analyzer_code\n", + "}\n", + "\n", + "trend_analyzer_id = delete_and_create_tool(trend_analyzer_config, \"trend_analyzer\")" + ] + }, + { + "cell_type": "markdown", + "id": "cell-13", + "metadata": {}, + "source": [ + "## Step 3: Create a Data Analyst Agent\n", + "\n", + "Now we'll create an agent equipped with both data analysis Lambda tools. 
This agent can:\n", + "- Analyze datasets with statistical methods\n", + "- Identify trends in time-series data" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "cell-14", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Created agent 'Data Analyst'\n", + "Agent Key: agt_data_analyst_39dd\n" + ] + } + ], + "source": [ + "# Delete previous agent/session if re-running this cell\n", + "if 'session_key' in dir() and session_key and 'data_analyst_key' in dir() and data_analyst_key:\n", + " requests.delete(f\"{BASE_URL}/agents/{data_analyst_key}/sessions/{session_key}\", headers=headers)\n", + " print(f\"Deleted previous session: {session_key}\")\n", + " session_key = None\n", + "\n", + "if 'data_analyst_key' in dir() and data_analyst_key:\n", + " resp = requests.delete(f\"{BASE_URL}/agents/{data_analyst_key}\", headers=headers)\n", + " if resp.status_code == 204:\n", + " print(f\"Deleted previous agent: {data_analyst_key}\")\n", + " data_analyst_key = None\n", + "\n", + "data_analyst_config = {\n", + " \"name\": \"Data Analyst\",\n", + " \"description\": \"Agent specialized in data analysis using NumPy and Pandas-powered Lambda tools\",\n", + " \"model\": {\"name\": \"gpt-4o\"},\n", + " \"first_step\": {\n", + " \"type\": \"conversational\",\n", + " \"instructions\": [\n", + " {\n", + " \"type\": \"inline\",\n", + " \"name\": \"data_analyst_instructions\",\n", + " \"template\": \"\"\"You are an expert data analyst with access to powerful data analysis tools.\n", + "\n", + "Your capabilities:\n", + "1. **Statistical Analysis**: Use the statistical_analyzer tool to compute descriptive statistics, correlations, and percentiles\n", + "2. **Trend Analysis**: Use the trend_analyzer tool to identify patterns, growth rates, and trends in time-series data\n", + "\n", + "When analyzing data:\n", + "1. First understand the data structure and what the user wants to learn\n", + "2. 
Choose the appropriate tool(s) for the analysis\n", + "3. Pass data as a JSON string to the tools\n", + "4. Interpret the results and explain insights in plain language\n", + "5. Suggest follow-up analyses if relevant\n", + "\n", + "IMPORTANT:\n", + "- Always use tools for numerical computations - don't try to calculate statistics manually\n", + "- Explain what each metric means in the context of the user's data\n", + "- Highlight actionable insights and anomalies\"\"\"\n", + " }\n", + " ],\n", + " \"output_parser\": {\"type\": \"default\"}\n", + " },\n", + " \"tool_configurations\": {\n", + " \"statistical_analyzer\": {\n", + " \"type\": \"lambda\",\n", + " \"tool_id\": statistical_analyzer_id\n", + " },\n", + " \"trend_analyzer\": {\n", + " \"type\": \"lambda\",\n", + " \"tool_id\": trend_analyzer_id\n", + " }\n", + " }\n", + "}\n", + "\n", + "data_analyst_key = delete_and_create_agent(data_analyst_config, \"Data Analyst\")" + ] + }, + { + "cell_type": "markdown", + "id": "cell-15", + "metadata": {}, + "source": [ + "## Step 4: Create a Session and Test the Agent\n", + "\n", + "Let's create a session and test the data analyst agent with sample datasets."
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "cell-16", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Session Created: ase_data_analysis_demo_20260130-054132_66b7\n" + ] + } + ], + "source": [ + "# Delete previous session if re-running this cell\n", + "if 'session_key' in dir() and session_key:\n", + " requests.delete(f\"{BASE_URL}/agents/{data_analyst_key}/sessions/{session_key}\", headers=headers)\n", + " print(f\"Deleted previous session: {session_key}\")\n", + " session_key = None\n", + "\n", + "# Create a session\n", + "session_name = f\"Data Analysis Demo {datetime.now().strftime('%Y%m%d-%H%M%S')}\"\n", + "session_config = {\n", + " \"name\": session_name,\n", + " \"metadata\": {\"purpose\": \"lambda_tools_demo\"}\n", + "}\n", + "\n", + "response = requests.post(\n", + " f\"{BASE_URL}/agents/{data_analyst_key}/sessions\",\n", + " headers=headers,\n", + " json=session_config\n", + ")\n", + "\n", + "if response.status_code == 201:\n", + " session_data = response.json()\n", + " session_key = session_data[\"key\"]\n", + " print(f\"Session Created: {session_key}\")\n", + "else:\n", + " raise RuntimeError(f\"Failed to create session: {response.status_code} - {response.text}\")" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "cell-17", + "metadata": {}, + "outputs": [], + "source": [ + "# Helper function to chat with the agent\n", + "def chat_with_agent(agent_key, session_key, message, show_events=False):\n", + " \"\"\"Send a message to an agent and return the response.\"\"\"\n", + " message_data = {\n", + " \"messages\": [{\"type\": \"text\", \"content\": message}],\n", + " \"stream_response\": False\n", + " }\n", + " \n", + " url = f\"{BASE_URL}/agents/{agent_key}/sessions/{session_key}/events\"\n", + " response = requests.post(url, headers=headers, json=message_data)\n", + " \n", + " if response.status_code == 201:\n", + " event_data = response.json()\n", + " \n", + " 
if show_events:\n", + " print(\"\\n------ Agent Events ------\")\n", + " for event in event_data.get('events', []):\n", + " event_type = event.get('type', 'unknown')\n", + " if event_type == 'tool_input':\n", + " tool_name = event.get('tool_configuration_name', 'N/A')\n", + " print(f\"Tool Called: {tool_name}\")\n", + " tool_input = event.get('tool_input', {})\n", + " # Show key parameters (not the full data)\n", + " params = {k: v[:100] + '...' if isinstance(v, str) and len(v) > 100 else v \n", + " for k, v in tool_input.items() if k != 'data'}\n", + " if params:\n", + " print(f\" Params: {params}\")\n", + " elif event_type == 'tool_output':\n", + " tool_name = event.get('tool_configuration_name', 'N/A')\n", + " tool_output = event.get('tool_output', {})\n", + " output_str = json.dumps(tool_output)\n", + " if len(output_str) > 500:\n", + " output_str = output_str[:500] + '...'\n", + " print(f\" Output from {tool_name}: {output_str}\")\n", + " print(\"-\" * 25 + \"\\n\")\n", + " \n", + " # Extract agent output\n", + " for event in event_data.get('events', []):\n", + " if event.get('type') == 'agent_output':\n", + " return event.get('content', 'No content')\n", + " return \"No agent output found\"\n", + " else:\n", + " return f\"Error: {response.status_code} - {response.text}\"" + ] + }, + { + "cell_type": "markdown", + "id": "cell-18", + "metadata": {}, + "source": [ + "### Example 1: Statistical Analysis\n", + "\n", + "Let's analyze a sales dataset with the statistical analyzer." 
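For a hand-checkable view of the correlation operation the tool performs, the following local sketch uses illustrative rows in which `sales` is exactly 100 × `units`, so the Pearson correlation comes out to 1.0:

```python
import pandas as pd

# Tiny illustrative dataset: sales is perfectly proportional to units.
rows = [
    {"sales": 15000, "units": 150},
    {"sales": 12000, "units": 120},
    {"sales": 18000, "units": 180},
]

# DataFrame.corr() is the same call statistical_analyzer makes internally.
corr = pd.DataFrame(rows).corr()
print(corr.loc["sales", "units"])  # 1.0
```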
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "cell-19", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "User: Analyzing sales data...\n", + "================================================================================\n", + "\n", + "------ Agent Events ------\n", + "Tool Called: statistical_analyzer\n", + " Params: {'columns': 'sales,units,profit_margin', 'operations': 'describe'}\n", + "Tool Called: statistical_analyzer\n", + " Params: {'columns': 'sales,units,profit_margin', 'operations': 'correlation'}\n", + " Output from statistical_analyzer: {\"success\": true, \"statistics\": {\"correlation\": {\"sales\": {\"sales\": 1.0, \"units\": -0.39664343302135424, \"profit_margin\": 0.9896428771138392}, \"units\": {\"sales\": -0.39664343302135424, \"units\": 1.0, \"profit_margin\": -0.48295207604868146}, \"profit_margin\": {\"sales\": 0.9896428771138392, \"units\": -0.48295207604868146, \"profit_margin\": 1.0}}}, \"metadata\": {\"rows\": 12, \"columns_analyzed\": [\"sales\", \"units\", \"profit_margin\"], \"operations_performed\": [\"correlation\"]}}\n", + " Output from statistical_analyzer: {\"success\": true, \"statistics\": {\"describe\": {\"sales\": {\"count\": 12.0, \"mean\": 14916.666666666666, \"std\": 7412.622322563703, \"min\": 6000.0, \"25%\": 8750.0, \"50%\": 13500.0, \"75%\": 19750.0, \"max\": 28000.0}, \"units\": {\"count\": 12.0, \"mean\": 148.83333333333334, \"std\": 45.14790508244356, \"min\": 90.0, \"25%\": 117.5, \"50%\": 145.0, \"75%\": 182.0, \"max\": 238.0}, \"profit_margin\": {\"count\": 12.0, \"mean\": 0.24916666666666668, \"std\": 0.09652680582317144, \"min\": 0.12, \"25%\": 0.1725, \"50%\": 0.235, \"75%\": 0.3275, ...\n", + "-------------------------\n", + "\n", + "Agent Response:\n", + "\n", + "### Basic Statistics for Sales and Profit Margins\n", + "\n", + "Here's a summary of the descriptive statistics for sales, units, and profit margins across all products and 
regions:\n", + "\n", + "1. **Sales**:\n", + " - **Average (Mean)**: $14,916.67\n", + " - **Standard Deviation**: $7,412.62\n", + " - **Minimum**: $6,000\n", + " - **25th Percentile**: $8,750\n", + " - **Median (50th Percentile)**: $13,500\n", + " - **75th Percentile**: $19,750\n", + " - **Maximum**: $28,000\n", + "\n", + "2. **Units Sold**:\n", + " - **Average (Mean)**: 148.83 units\n", + " - **Standard Deviation**: 45.15 units\n", + " - **Minimum**: 90 units\n", + " - **25th Percentile**: 117.5 units\n", + " - **Median (50th Percentile)**: 145 units\n", + " - **75th Percentile**: 182 units\n", + " - **Maximum**: 238 units\n", + "\n", + "3. **Profit Margin**:\n", + " - **Average (Mean)**: 0.25 (25%)\n", + " - **Standard Deviation**: 0.097\n", + " - **Minimum**: 0.12 (12%)\n", + " - **25th Percentile**: 0.1725 (17.25%)\n", + " - **Median (50th Percentile)**: 0.235 (23.5%)\n", + " - **75th Percentile**: 0.3275 (32.75%)\n", + " - **Maximum**: 0.4 (40%)\n", + "\n", + "### Correlation Between Sales, Units, and Profit Margins\n", + "\n", + "Here are the correlation coefficients, which indicate how these metrics relate to each other:\n", + "\n", + "1. **Sales and Units**: \n", + " - Correlation: -0.40\n", + " - Interpretation: A slight negative correlation, indicating that as sales increase, units might slightly decrease or vice versa, which could suggest pricing strategies or different sales strategies.\n", + "\n", + "2. **Sales and Profit Margin**: \n", + " - Correlation: 0.99\n", + " - Interpretation: A very strong positive correlation, meaning that higher sales tend to be associated with higher profit margins. This may suggest effective cost management and profit maximization on higher sales volumes.\n", + "\n", + "3. 
**Units and Profit Margin**: \n", + " - Correlation: -0.48\n", + " - Interpretation: A moderate negative correlation, indicating that higher unit sales are somewhat associated with lower profit margins, possibly suggesting economies of scale or volume discounting.\n", + "\n", + "### Insights\n", + "\n", + "- **Strong Sales and Profit Margin Correlation**: The strong relationship suggests that boosting sales can significantly enhance profit margins, perhaps due to better utilization of resources or more effective sales strategies at larger volumes.\n", + " \n", + "- **Negative Correlation between Units and Profit Margin**: This could suggest that higher selling volumes might be achieved with discounts or lower pricing, which affects profit margins negatively.\n", + "\n", + "### Suggested Follow-ups\n", + "- Analyze each product individually to understand if the trends are consistent across all products or if specific products drive these correlations.\n", + "- Look into regional differences to see if certain regions show stronger or weaker correlations, which could inform targeted strategies.\n" + ] + } + ], + "source": [ + "# Sample sales data\n", + "sales_data = [\n", + " {\"product\": \"Widget A\", \"region\": \"North\", \"sales\": 15000, \"units\": 150, \"profit_margin\": 0.25},\n", + " {\"product\": \"Widget A\", \"region\": \"South\", \"sales\": 12000, \"units\": 120, \"profit_margin\": 0.22},\n", + " {\"product\": \"Widget A\", \"region\": \"East\", \"sales\": 18000, \"units\": 180, \"profit_margin\": 0.28},\n", + " {\"product\": \"Widget A\", \"region\": \"West\", \"sales\": 9000, \"units\": 90, \"profit_margin\": 0.20},\n", + " {\"product\": \"Widget B\", \"region\": \"North\", \"sales\": 22000, \"units\": 110, \"profit_margin\": 0.35},\n", + " {\"product\": \"Widget B\", \"region\": \"South\", \"sales\": 25000, \"units\": 125, \"profit_margin\": 0.38},\n", + " {\"product\": \"Widget B\", \"region\": \"East\", \"sales\": 19000, \"units\": 95, 
\"profit_margin\": 0.32},\n", + " {\"product\": \"Widget B\", \"region\": \"West\", \"sales\": 28000, \"units\": 140, \"profit_margin\": 0.40},\n", + " {\"product\": \"Widget C\", \"region\": \"North\", \"sales\": 8000, \"units\": 200, \"profit_margin\": 0.15},\n", + " {\"product\": \"Widget C\", \"region\": \"South\", \"sales\": 7500, \"units\": 188, \"profit_margin\": 0.14},\n", + " {\"product\": \"Widget C\", \"region\": \"East\", \"sales\": 9500, \"units\": 238, \"profit_margin\": 0.18},\n", + " {\"product\": \"Widget C\", \"region\": \"West\", \"sales\": 6000, \"units\": 150, \"profit_margin\": 0.12},\n", + "]\n", + "\n", + "query = f\"\"\"I have sales data for three products across four regions. \n", + "Please analyze this data and tell me:\n", + "1. Basic statistics for sales and profit margins\n", + "2. Correlation between sales, units, and profit margin\n", + "\n", + "Here's the data:\n", + "{json.dumps(sales_data)}\"\"\"\n", + "\n", + "print(\"User: Analyzing sales data...\")\n", + "print(\"=\" * 80)\n", + "\n", + "response = chat_with_agent(data_analyst_key, session_key, query, show_events=True)\n", + "print(f\"Agent Response:\\n\\n{response}\")" + ] + }, + { + "cell_type": "markdown", + "id": "cell-20", + "metadata": {}, + "source": [ + "### Example 2: Trend Analysis\n", + "\n", + "Let's analyze monthly revenue trends." 
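The growth-rate and moving-average math that trend_analyzer applies can be reproduced locally in a few lines (illustrative figures, run outside the Lambda runtime):

```python
import pandas as pd

# The growth-rate and moving-average pieces of trend_analyzer, in isolation.
revenue = pd.Series([100000, 105000, 98000, 112000, 118000, 125000])

pct = revenue.pct_change()                         # period-over-period growth
total_growth = (revenue.iloc[-1] - revenue.iloc[0]) / revenue.iloc[0]
ma3 = revenue.rolling(window=3).mean()             # 3-period simple moving average

print(round(float(pct.iloc[1]), 2))    # 0.05 (100k -> 105k)
print(round(float(total_growth), 2))   # 0.25 (100k -> 125k)
print(float(ma3.iloc[-1]))             # mean of the last three values
```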
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "cell-21", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "User: Analyzing revenue trends...\n", + "================================================================================\n", + "\n", + "------ Agent Events ------\n", + "Tool Called: trend_analyzer\n", + " Params: {'date_column': 'month', 'value_column': 'revenue', 'analysis_type': 'all'}\n", + " Output from trend_analyzer: {\"error\": \"All retry attempts failed\"}\n", + "Tool Called: trend_analyzer\n", + " Params: {'date_column': 'month', 'value_column': 'revenue', 'analysis_type': 'all'}\n", + " Output from trend_analyzer: {\"error\": \"All retry attempts failed\"}\n", + "Tool Called: trend_analyzer\n", + " Params: {'date_column': 'month', 'value_column': 'revenue', 'analysis_type': 'all'}\n", + " Output from trend_analyzer: {\"error\": \"All retry attempts failed\"}\n", + "-------------------------\n", + "\n", + "Agent Response:\n", + "\n", + "It seems there was an error while trying to analyze the revenue trend using the tool. Let's try analyzing it again. 
Please hold on for a moment.\n" + ] + } + ], + "source": [ + "# Sample time-series data\n", + "revenue_data = [\n", + " {\"month\": \"2024-01-01\", \"revenue\": 100000},\n", + " {\"month\": \"2024-02-01\", \"revenue\": 105000},\n", + " {\"month\": \"2024-03-01\", \"revenue\": 98000},\n", + " {\"month\": \"2024-04-01\", \"revenue\": 112000},\n", + " {\"month\": \"2024-05-01\", \"revenue\": 118000},\n", + " {\"month\": \"2024-06-01\", \"revenue\": 125000},\n", + " {\"month\": \"2024-07-01\", \"revenue\": 122000},\n", + " {\"month\": \"2024-08-01\", \"revenue\": 135000},\n", + " {\"month\": \"2024-09-01\", \"revenue\": 142000},\n", + " {\"month\": \"2024-10-01\", \"revenue\": 138000},\n", + " {\"month\": \"2024-11-01\", \"revenue\": 155000},\n", + " {\"month\": \"2024-12-01\", \"revenue\": 168000},\n", + "]\n", + "\n", + "query = f\"\"\"Analyze the revenue trend for this year. I want to know:\n", + "1. Is revenue trending up or down?\n", + "2. What's the growth rate?\n", + "3. What are the moving averages?\n", + "\n", + "Here's the monthly data:\n", + "{json.dumps(revenue_data)}\"\"\"\n", + "\n", + "print(\"User: Analyzing revenue trends...\")\n", + "print(\"=\" * 80)\n", + "\n", + "response = chat_with_agent(data_analyst_key, session_key, query, show_events=True)\n", + "print(f\"Agent Response:\\n\\n{response}\")" + ] + }, + { + "cell_type": "markdown", + "id": "cell-24", + "metadata": {}, + "source": [ + "### Example 3: Multi-Tool Analysis\n", + "\n", + "Let's ask for a comprehensive analysis that requires both tools." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "cell-25", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "User: Comprehensive business analysis...\n", + "================================================================================\n", + "\n", + "------ Agent Events ------\n", + "Tool Called: statistical_analyzer\n", + " Params: {'columns': 'revenue,costs,customers', 'operations': 'describe'}\n", + "Tool Called: statistical_analyzer\n", + " Params: {'columns': 'revenue,customers', 'operations': 'correlation'}\n", + "Tool Called: trend_analyzer\n", + " Params: {'date_column': 'date', 'value_column': 'revenue', 'analysis_type': 'all'}\n", + " Output from statistical_analyzer: {\"success\": true, \"statistics\": {\"describe\": {\"revenue\": {\"count\": 8.0, \"mean\": 601250.0, \"std\": 119933.01701962046, \"min\": 450000.0, \"25%\": 510000.0, \"50%\": 595000.0, \"75%\": 667500.0, \"max\": 800000.0}, \"costs\": {\"count\": 8.0, \"mean\": 394375.0, \"std\": 55255.09026325086, \"min\": 320000.0, \"25%\": 353750.0, \"50%\": 395000.0, \"75%\": 427500.0, \"max\": 480000.0}, \"customers\": {\"count\": 8.0, \"mean\": 1818.75, \"std\": 487.66023007827897, \"min\": 1200.0, \"25%\": 1462.5, \"50%\": 1775.0, \"75%\": 2075.0, \"max\": 2650...\n", + " Output from statistical_analyzer: {\"success\": true, \"statistics\": {\"correlation\": {\"revenue\": {\"revenue\": 1.0, \"customers\": 0.9985504882556115}, \"customers\": {\"revenue\": 0.9985504882556115, \"customers\": 1.0}}}, \"metadata\": {\"rows\": 8, \"columns_analyzed\": [\"revenue\", \"customers\"], \"operations_performed\": [\"correlation\"]}}\n", + " Output from trend_analyzer: {\"error\": \"All retry attempts failed\"}\n", + "-------------------------\n", + "\n", + "Agent Response:\n", + "\n", + "### Statistical Summary of Revenue, Costs, and Customers\n", + "\n", + "Here's a summary of the descriptive statistics for revenue, costs, and customers 
from the quarterly data:\n", + "\n", + "1. **Revenue**:\n", + " - **Average (Mean)**: \\$601,250\n", + " - **Standard Deviation**: \\$119,933\n", + " - **Minimum**: \\$450,000\n", + " - **25th Percentile**: \\$510,000\n", + " - **Median (50th Percentile)**: \\$595,000\n", + " - **75th Percentile**: \\$667,500\n", + " - **Maximum**: \\$800,000\n", + "\n", + "2. **Costs**:\n", + " - **Average (Mean)**: \\$394,375\n", + " - **Standard Deviation**: \\$55,255\n", + " - **Minimum**: \\$320,000\n", + " - **25th Percentile**: \\$353,750\n", + " - **Median (50th Percentile)**: \\$395,000\n", + " - **75th Percentile**: \\$427,500\n", + " - **Maximum**: \\$480,000\n", + "\n", + "3. **Customers**:\n", + " - **Average (Mean)**: 1,818.75 customers\n", + " - **Standard Deviation**: 487.66 customers\n", + " - **Minimum**: 1,200 customers\n", + " - **25th Percentile**: 1,462.5 customers\n", + " - **Median (50th Percentile)**: 1,775 customers\n", + " - **75th Percentile**: 2,075 customers\n", + " - **Maximum**: 2,650 customers\n", + "\n", + "### Correlation Analysis\n", + "\n", + "The correlation between revenue and the number of customers is extremely strong:\n", + "\n", + "- **Correlation**: \\(0.9986\\)\n", + "\n", + "This indicates that revenue is almost directly proportional to the number of customers. 
More customers generally lead to a proportionate increase in revenue.\n", + "\n", + "### Trend Analysis on Revenue\n", + "\n", + "Unfortunately, there was an error while generating the trend analysis, so let's discuss the overall trajectory based on available data.\n", + "\n", + "**Observations**:\n", + "- From 2023 to 2024, there is a noticeable rise in revenue each quarter.\n", + "- The jump from \\$450,000 in 2023-Q1 to \\$800,000 in 2024-Q4 indicates robust growth.\n", + "\n", + "### Suggested Next Steps\n", + "\n", + "- **Visual Inspection**: Plotting the revenue over quarters would help visualize the trend and could confirm the upward trajectory.\n", + "- **Further Analysis**: Once the trend analysis tool is available, it can provide specific growth rates and moving averages for more precise interpretations.\n", + "- **Operational Analysis**: Investigate causes behind revenue growth — whether it's due to an increase in customer base, market expansion, pricing strategy changes, or operational efficiencies.\n" + ] + } + ], + "source": [ + "# Comprehensive dataset\n", + "quarterly_data = [\n", + " {\"quarter\": \"2023-Q1\", \"date\": \"2023-01-01\", \"revenue\": 450000, \"costs\": 320000, \"customers\": 1200},\n", + " {\"quarter\": \"2023-Q2\", \"date\": \"2023-04-01\", \"revenue\": 480000, \"costs\": 335000, \"customers\": 1350},\n", + " {\"quarter\": \"2023-Q3\", \"date\": \"2023-07-01\", \"revenue\": 520000, \"costs\": 360000, \"customers\": 1500},\n", + " {\"quarter\": \"2023-Q4\", \"date\": \"2023-10-01\", \"revenue\": 610000, \"costs\": 400000, \"customers\": 1800},\n", + " {\"quarter\": \"2024-Q1\", \"date\": \"2024-01-01\", \"revenue\": 580000, \"costs\": 390000, \"customers\": 1750},\n", + " {\"quarter\": \"2024-Q2\", \"date\": \"2024-04-01\", \"revenue\": 650000, \"costs\": 420000, \"customers\": 2000},\n", + " {\"quarter\": \"2024-Q3\", \"date\": \"2024-07-01\", \"revenue\": 720000, \"costs\": 450000, \"customers\": 2300},\n", + " {\"quarter\": 
\"2024-Q4\", \"date\": \"2024-10-01\", \"revenue\": 800000, \"costs\": 480000, \"customers\": 2650},\n", + "]\n", + "\n", + "query = f\"\"\"I need a comprehensive analysis of our quarterly business performance.\n", + "\n", + "Please provide:\n", + "1. Statistical summary of revenue, costs, and customers\n", + "2. Correlation analysis - are revenue and customers correlated?\n", + "3. Trend analysis on revenue - what's our growth trajectory?\n", + "\n", + "Here's the data:\n", + "{json.dumps(quarterly_data)}\"\"\"\n", + "\n", + "print(\"User: Comprehensive business analysis...\")\n", + "print(\"=\" * 80)\n", + "\n", + "response = chat_with_agent(data_analyst_key, session_key, query, show_events=True)\n", + "print(f\"Agent Response:\\n\\n{response}\")" + ] + }, + { + "cell_type": "markdown", + "id": "cell-27", + "metadata": {}, + "source": [ + "## Cleanup (Optional)\n", + "\n", + "Delete the resources created in this notebook." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "cell-28", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Deleted agent: agt_data_analyst_39dd\n", + "Deleted tool: statistical_analyzer (tol_3455)\n", + "Deleted tool: trend_analyzer (tol_3456)\n" + ] + } + ], + "source": [ + "# Delete the agent\n", + "if data_analyst_key:\n", + " response = requests.delete(f\"{BASE_URL}/agents/{data_analyst_key}\", headers=headers)\n", + " if response.status_code == 204:\n", + " print(f\"Deleted agent: {data_analyst_key}\")\n", + " else:\n", + " print(f\"Error deleting agent: {response.text}\")\n", + "\n", + "# Delete the Lambda tools\n", + "for tool_id, tool_name in [\n", + " (statistical_analyzer_id, \"statistical_analyzer\"),\n", + " (trend_analyzer_id, \"trend_analyzer\"),\n", + "]:\n", + " if tool_id:\n", + " response = requests.delete(f\"{BASE_URL}/tools/{tool_id}\", headers=headers)\n", + " if response.status_code == 204:\n", + " print(f\"Deleted tool: {tool_name} ({tool_id})\")\n", 
+ " else:\n", + " print(f\"Error deleting {tool_name}: {response.text}\")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.2" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/notebooks/api-examples/README.md b/notebooks/api-examples/README.md index b86c99b..c1cd674 100644 --- a/notebooks/api-examples/README.md +++ b/notebooks/api-examples/README.md @@ -1,6 +1,6 @@ # Vectara API Tutorial Series -This tutorial series provides a comprehensive, hands-on introduction to building RAG (Retrieval-Augmented Generation) applications using Vectara's REST API. Through five progressive notebooks, you'll learn to create corpora, ingest data, query information, build intelligent AI agents, and orchestrate multi-agent workflows. +This tutorial series provides a comprehensive, hands-on introduction to building RAG (Retrieval-Augmented Generation) applications using Vectara's REST API. Through seven progressive notebooks, you'll learn to create corpora, ingest data, query information, build intelligent AI agents, orchestrate multi-agent workflows, work with file artifacts, and create data analysis tools with NumPy and Pandas. 
## About Vectara @@ -258,6 +258,77 @@ orchestrator_config = { --- +### [Notebook 6: Artifacts](6-artifacts.ipynb) + +**What you'll learn:** +- Upload files (PDFs, images, documents) to agent sessions +- List and retrieve artifact details +- Create agents with artifact-processing tools +- Have agents analyze uploaded files and generate new artifacts + +**What you'll build:** +A **Document Analyst** agent that can: +- Read and analyze uploaded documents and images +- Search for patterns within artifact content +- Convert documents between formats +- Generate new artifacts (reports, summaries) + +**Key concepts:** +- **Artifacts**: Session-specific files that enable agents to work with files on-the-fly +- **Artifact tools**: `artifact_read`, `image_read`, `document_conversion`, `artifact_grep` +- **Two-way flow**: Users upload files, agents can generate new artifacts +- **Session scope**: Artifacts persist within a session without permanent indexing + +--- + +### [Notebook 7: Lambda Tools for Data Analysis](7-lambda-tools-data-analysis.ipynb) + +**What you'll learn:** +- Create Lambda tools that use NumPy and Pandas for data analysis +- Pass structured data (JSON) to tools and receive computed results +- Build tools for statistical analysis, trend detection, and data transformation +- Combine multiple data analysis tools in agent workflows + +**What you'll build:** +Three **Data Analysis Lambda Tools**: +1. **Statistical Analyzer**: Descriptive statistics, correlations, percentiles using Pandas +2. **Trend Analyzer**: Moving averages, growth rates, linear regression using NumPy +3. 
**Data Transformer**: Normalization, missing value handling, outlier removal, aggregation
+
+**Lambda tool configuration:**
+```python
+tool_config = {
+    "type": "lambda",
+    "language": "python",
+    "name": "statistical_analyzer",
+    "title": "Statistical Analyzer",
+    "description": "Compute statistics on tabular data using Pandas...",
+    "code": """
+import json
+import pandas as pd
+import numpy as np
+
+def process(data: str, columns: str = "", operations: str = "describe") -> dict:
+    df = pd.DataFrame(json.loads(data))
+    # ... compute statistics
+    return {"success": True, "statistics": {...}}
+"""
+}
+```
+
+**Key concepts:**
+- **Lambda tools execute real Python code** with NumPy and Pandas available
+- **Data passed as JSON strings** between agents and tools
+- **Precise numerical computation** that LLMs cannot reliably perform
+- **Multi-tool workflows** for comprehensive data analysis
+
+**Use cases:**
+- Financial analysis and reporting
+- Sales and marketing analytics
+- Scientific data processing
+- Time-series analysis
+
+---
+
 ## Tutorial Flow
 
 ```
@@ -280,6 +351,14 @@ orchestrator_config = {
 5. Sub-Agents
    ↓
    Create multi-agent workflows with specialized sub-agents
+
+6. Artifacts
+   ↓
+   Work with files in agent sessions
+
+7.
Lambda Tools for Data Analysis + ↓ + Build NumPy/Pandas-powered data analysis tools ``` ## Running the Notebooks @@ -322,12 +401,16 @@ jupyter notebook | `POST /v2/corpora/{key}/documents` | Index documents | 2 | | `GET /v2/corpora/{key}/documents` | List documents | 2 | | `POST /v2/query` | Query corpora | 3 | -| `POST /v2/agents` | Create agent | 4, 5 | -| `POST /v2/agents/{key}/sessions` | Create session | 4, 5 | -| `POST /v2/agents/{key}/sessions/{key}/events` | Send messages | 4, 5 | +| `POST /v2/agents` | Create agent | 4, 5, 6, 7 | +| `POST /v2/agents/{key}/sessions` | Create session | 4, 5, 6, 7 | +| `POST /v2/agents/{key}/sessions/{key}/events` | Send messages / Upload artifacts | 4, 5, 6, 7 | | `GET /v2/agents/{key}/sessions/{key}/events` | Get conversation history | 4 | +| `GET /v2/agents/{key}/sessions/{key}/artifacts` | List session artifacts | 6 | | `GET /v2/agents` | List agents | 5 | -| `DELETE /v2/agents/{key}` | Delete agent | 5 | +| `DELETE /v2/agents/{key}` | Delete agent | 5, 6, 7 | +| `POST /v2/tools` | Create Lambda tool | 5, 7 | +| `GET /v2/tools` | List Lambda tools | 5, 7 | +| `DELETE /v2/tools/{id}` | Delete Lambda tool | 5, 7 | ## Additional Resources From 195b91ed29ee3201e94c4f45ce097f4460111c1c Mon Sep 17 00:00:00 2001 From: Ofer Mendelevitch Date: Mon, 23 Feb 2026 08:58:33 -0800 Subject: [PATCH 2/6] updated lambda tools example --- .../7-lambda-tools-data-analysis.ipynb | 205 +++++++++--------- 1 file changed, 98 insertions(+), 107 deletions(-) diff --git a/notebooks/api-examples/7-lambda-tools-data-analysis.ipynb b/notebooks/api-examples/7-lambda-tools-data-analysis.ipynb index 2092361..a7b9813 100644 --- a/notebooks/api-examples/7-lambda-tools-data-analysis.ipynb +++ b/notebooks/api-examples/7-lambda-tools-data-analysis.ipynb @@ -191,7 +191,7 @@ "output_type": "stream", "text": [ "Created tool 'statistical_analyzer'\n", - "Tool ID: tol_3455\n" + "Tool ID: tol_3832\n" ] } ], @@ -329,7 +329,7 @@ "output_type": "stream", "text": [ 
"Created tool 'trend_analyzer'\n", - "Tool ID: tol_3456\n" + "Tool ID: tol_3833\n" ] } ], @@ -491,7 +491,7 @@ "output_type": "stream", "text": [ "Created agent 'Data Analyst'\n", - "Agent Key: agt_data_analyst_39dd\n" + "Agent Key: agt_data_analyst_9c24\n" ] } ], @@ -574,7 +574,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "Session Created: ase_data_analysis_demo_20260130-054132_66b7\n" + "Session Created: ase_data_analysis_demo_20260205-135955_e035\n" ] } ], @@ -683,71 +683,53 @@ "\n", "------ Agent Events ------\n", "Tool Called: statistical_analyzer\n", - " Params: {'columns': 'sales,units,profit_margin', 'operations': 'describe'}\n", + " Params: {'columns': 'sales,profit_margin', 'operations': 'describe'}\n", "Tool Called: statistical_analyzer\n", " Params: {'columns': 'sales,units,profit_margin', 'operations': 'correlation'}\n", + " Output from statistical_analyzer: {\"success\": true, \"statistics\": {\"describe\": {\"sales\": {\"count\": 12.0, \"mean\": 14916.666666666666, \"std\": 7412.622322563703, \"min\": 6000.0, \"25%\": 8750.0, \"50%\": 13500.0, \"75%\": 19750.0, \"max\": 28000.0}, \"profit_margin\": {\"count\": 12.0, \"mean\": 0.24916666666666668, \"std\": 0.09652680582317144, \"min\": 0.12, \"25%\": 0.1725, \"50%\": 0.235, \"75%\": 0.3275, \"max\": 0.4}}}, \"metadata\": {\"rows\": 12, \"columns_analyzed\": [\"sales\", \"profit_margin\"], \"operations_performed\": [\"describe\"]}}\n", " Output from statistical_analyzer: {\"success\": true, \"statistics\": {\"correlation\": {\"sales\": {\"sales\": 1.0, \"units\": -0.39664343302135424, \"profit_margin\": 0.9896428771138392}, \"units\": {\"sales\": -0.39664343302135424, \"units\": 1.0, \"profit_margin\": -0.48295207604868146}, \"profit_margin\": {\"sales\": 0.9896428771138392, \"units\": -0.48295207604868146, \"profit_margin\": 1.0}}}, \"metadata\": {\"rows\": 12, \"columns_analyzed\": [\"sales\", \"units\", \"profit_margin\"], \"operations_performed\": [\"correlation\"]}}\n", - " Output 
from statistical_analyzer: {\"success\": true, \"statistics\": {\"describe\": {\"sales\": {\"count\": 12.0, \"mean\": 14916.666666666666, \"std\": 7412.622322563703, \"min\": 6000.0, \"25%\": 8750.0, \"50%\": 13500.0, \"75%\": 19750.0, \"max\": 28000.0}, \"units\": {\"count\": 12.0, \"mean\": 148.83333333333334, \"std\": 45.14790508244356, \"min\": 90.0, \"25%\": 117.5, \"50%\": 145.0, \"75%\": 182.0, \"max\": 238.0}, \"profit_margin\": {\"count\": 12.0, \"mean\": 0.24916666666666668, \"std\": 0.09652680582317144, \"min\": 0.12, \"25%\": 0.1725, \"50%\": 0.235, \"75%\": 0.3275, ...\n", "-------------------------\n", "\n", "Agent Response:\n", "\n", "### Basic Statistics for Sales and Profit Margins\n", "\n", - "Here's a summary of the descriptive statistics for sales, units, and profit margins across all products and regions:\n", + "#### Sales\n", + "- **Count**: 12 entries\n", + "- **Mean**: $14,917\n", + "- **Standard Deviation**: $7,413\n", + "- **Minimum**: $6,000\n", + "- **25th Percentile**: $8,750\n", + "- **50th Percentile (Median)**: $13,500\n", + "- **75th Percentile**: $19,750\n", + "- **Maximum**: $28,000\n", "\n", - "1. **Sales**:\n", - " - **Average (Mean)**: $14,916.67\n", - " - **Standard Deviation**: $7,412.62\n", - " - **Minimum**: $6,000\n", - " - **25th Percentile**: $8,750\n", - " - **Median (50th Percentile)**: $13,500\n", - " - **75th Percentile**: $19,750\n", - " - **Maximum**: $28,000\n", + "#### Profit Margin\n", + "- **Count**: 12 entries\n", + "- **Mean**: 0.249 (or 24.9%)\n", + "- **Standard Deviation**: 0.097 (or 9.7%)\n", + "- **Minimum**: 0.12 (or 12%)\n", + "- **25th Percentile**: 0.1725 (or 17.25%)\n", + "- **50th Percentile (Median)**: 0.235 (or 23.5%)\n", + "- **75th Percentile**: 0.3275 (or 32.75%)\n", + "- **Maximum**: 0.4 (or 40%)\n", "\n", - "2. 
**Units Sold**:\n", - " - **Average (Mean)**: 148.83 units\n", - " - **Standard Deviation**: 45.15 units\n", - " - **Minimum**: 90 units\n", - " - **25th Percentile**: 117.5 units\n", - " - **Median (50th Percentile)**: 145 units\n", - " - **75th Percentile**: 182 units\n", - " - **Maximum**: 238 units\n", + "### Correlation between Sales, Units, and Profit Margin\n", "\n", - "3. **Profit Margin**:\n", - " - **Average (Mean)**: 0.25 (25%)\n", - " - **Standard Deviation**: 0.097\n", - " - **Minimum**: 0.12 (12%)\n", - " - **25th Percentile**: 0.1725 (17.25%)\n", - " - **Median (50th Percentile)**: 0.235 (23.5%)\n", - " - **75th Percentile**: 0.3275 (32.75%)\n", - " - **Maximum**: 0.4 (40%)\n", + "- **Sales and Profit Margin**: 0.9896\n", + " - **Interpretation**: There is a very strong positive correlation between sales and profit margin, indicating that as sales increase, profit margin tends to increase significantly.\n", "\n", - "### Correlation Between Sales, Units, and Profit Margins\n", + "- **Sales and Units**: -0.3966\n", + " - **Interpretation**: There is a moderate negative correlation between sales and the number of units sold, suggesting that as sales rise, the number of units tends to decrease, or vice versa.\n", "\n", - "Here are the correlation coefficients, which indicate how these metrics relate to each other:\n", - "\n", - "1. **Sales and Units**: \n", - " - Correlation: -0.40\n", - " - Interpretation: A slight negative correlation, indicating that as sales increase, units might slightly decrease or vice versa, which could suggest pricing strategies or different sales strategies.\n", - "\n", - "2. **Sales and Profit Margin**: \n", - " - Correlation: 0.99\n", - " - Interpretation: A very strong positive correlation, meaning that higher sales tend to be associated with higher profit margins. This may suggest effective cost management and profit maximization on higher sales volumes.\n", - "\n", - "3. 
**Units and Profit Margin**: \n", - " - Correlation: -0.48\n", - " - Interpretation: A moderate negative correlation, indicating that higher unit sales are somewhat associated with lower profit margins, possibly suggesting economies of scale or volume discounting.\n", + "- **Units and Profit Margin**: -0.4830\n", + " - **Interpretation**: There is a moderate negative correlation between the number of units sold and profit margin, indicating that higher numbers of units are associated with lower profit margins.\n", "\n", "### Insights\n", + "- The significant positive correlation between sales and profit margins suggests that higher sales are typically associated with higher profit margins. This could imply efficient cost management or pricing strategies for more expensive products.\n", + "- The negative correlations involving units suggest that volume of sales in terms of units might be inversely related to either the revenue per unit or cost management efficiency.\n", "\n", - "- **Strong Sales and Profit Margin Correlation**: The strong relationship suggests that boosting sales can significantly enhance profit margins, perhaps due to better utilization of resources or more effective sales strategies at larger volumes.\n", - " \n", - "- **Negative Correlation between Units and Profit Margin**: This could suggest that higher selling volumes might be achieved with discounts or lower pricing, which affects profit margins negatively.\n", - "\n", - "### Suggested Follow-ups\n", - "- Analyze each product individually to understand if the trends are consistent across all products or if specific products drive these correlations.\n", - "- Look into regional differences to see if certain regions show stronger or weaker correlations, which could inform targeted strategies.\n" + "These insights could guide strategies such as emphasizing high-margin products or evaluating pricing strategies for different product categories to improve profit margins. 
Further analysis could segment these insights by product or region for more targeted strategies.\n" ] } ], @@ -810,17 +792,34 @@ "Tool Called: trend_analyzer\n", " Params: {'date_column': 'month', 'value_column': 'revenue', 'analysis_type': 'all'}\n", " Output from trend_analyzer: {\"error\": \"All retry attempts failed\"}\n", - "Tool Called: trend_analyzer\n", - " Params: {'date_column': 'month', 'value_column': 'revenue', 'analysis_type': 'all'}\n", - " Output from trend_analyzer: {\"error\": \"All retry attempts failed\"}\n", - "Tool Called: trend_analyzer\n", - " Params: {'date_column': 'month', 'value_column': 'revenue', 'analysis_type': 'all'}\n", - " Output from trend_analyzer: {\"error\": \"All retry attempts failed\"}\n", "-------------------------\n", "\n", "Agent Response:\n", "\n", - "It seems there was an error while trying to analyze the revenue trend using the tool. Let's try analyzing it again. Please hold on for a moment.\n" + "Unfortunately, it seems there was an issue accessing the trend analysis tool to evaluate the revenue data. Let's break down what each analysis would involve and what insights could be expected.\n", + "\n", + "### 1. Revenue Trend\n", + "To determine the trend direction (upward, downward, or flat), a linear regression on the revenue over time would be performed. This involves fitting a line through the revenue points and evaluating the slope:\n", + "- **Upward Trend**: Positive slope.\n", + "- **Downward Trend**: Negative slope.\n", + "- **Flat Trend**: Slope near zero.\n", + "\n", + "### 2. Growth Rate\n", + "- **Period-over-Period Growth**: The percentage change from month to month.\n", + "- **Total Growth**: The percentage change from the start of the period to the end.\n", + "\n", + "The growth rate would indicate how rapidly the revenue is increasing or decreasing month by month and over the year as a whole.\n", + "\n", + "### 3. 
Moving Averages\n", + "- **3-Month Moving Average**: This smooths fluctuations by taking the average of every three consecutive months.\n", + "- **7-Month Moving Average**: A longer-term moving average used for identifying more persistent trends.\n", + "\n", + "### Insights You'd Expect\n", + "- **Is Revenue Trending Up or Down?**: Based on the slope of the trend line, you'd confirm whether revenues are generally increasing or decreasing.\n", + "- **Growth Rates**: A numerical representation of how revenue is changing month-to-month and from the start to end of the year.\n", + "- **Moving Averages**: These offer a smoothed version of the revenue trend to filter out short-term volatility and focus on longer-term patterns. \n", + "\n", + "If you can try running the tool again or provide a different dataset structure or details, we can work to resolve this analysis. Alternatively, I can provide guidance on interpreting a trend once detail is available. Would you like to retry, or do you need help with something else?\n" ] } ], @@ -886,65 +885,57 @@ " Params: {'columns': 'revenue,customers', 'operations': 'correlation'}\n", "Tool Called: trend_analyzer\n", " Params: {'date_column': 'date', 'value_column': 'revenue', 'analysis_type': 'all'}\n", - " Output from statistical_analyzer: {\"success\": true, \"statistics\": {\"describe\": {\"revenue\": {\"count\": 8.0, \"mean\": 601250.0, \"std\": 119933.01701962046, \"min\": 450000.0, \"25%\": 510000.0, \"50%\": 595000.0, \"75%\": 667500.0, \"max\": 800000.0}, \"costs\": {\"count\": 8.0, \"mean\": 394375.0, \"std\": 55255.09026325086, \"min\": 320000.0, \"25%\": 353750.0, \"50%\": 395000.0, \"75%\": 427500.0, \"max\": 480000.0}, \"customers\": {\"count\": 8.0, \"mean\": 1818.75, \"std\": 487.66023007827897, \"min\": 1200.0, \"25%\": 1462.5, \"50%\": 1775.0, \"75%\": 2075.0, \"max\": 2650...\n", " Output from statistical_analyzer: {\"success\": true, \"statistics\": {\"correlation\": {\"revenue\": {\"revenue\": 1.0, 
\"customers\": 0.9985504882556115}, \"customers\": {\"revenue\": 0.9985504882556115, \"customers\": 1.0}}}, \"metadata\": {\"rows\": 8, \"columns_analyzed\": [\"revenue\", \"customers\"], \"operations_performed\": [\"correlation\"]}}\n", + " Output from statistical_analyzer: {\"success\": true, \"statistics\": {\"describe\": {\"revenue\": {\"count\": 8.0, \"mean\": 601250.0, \"std\": 119933.01701962046, \"min\": 450000.0, \"25%\": 510000.0, \"50%\": 595000.0, \"75%\": 667500.0, \"max\": 800000.0}, \"costs\": {\"count\": 8.0, \"mean\": 394375.0, \"std\": 55255.09026325086, \"min\": 320000.0, \"25%\": 353750.0, \"50%\": 395000.0, \"75%\": 427500.0, \"max\": 480000.0}, \"customers\": {\"count\": 8.0, \"mean\": 1818.75, \"std\": 487.66023007827897, \"min\": 1200.0, \"25%\": 1462.5, \"50%\": 1775.0, \"75%\": 2075.0, \"max\": 2650...\n", " Output from trend_analyzer: {\"error\": \"All retry attempts failed\"}\n", "-------------------------\n", "\n", "Agent Response:\n", "\n", - "### Statistical Summary of Revenue, Costs, and Customers\n", - "\n", - "Here's a summary of the descriptive statistics for revenue, costs, and customers from the quarterly data:\n", + "### Statistical Summary\n", "\n", - "1. **Revenue**:\n", - " - **Average (Mean)**: \\$601,250\n", - " - **Standard Deviation**: \\$119,933\n", - " - **Minimum**: \\$450,000\n", - " - **25th Percentile**: \\$510,000\n", - " - **Median (50th Percentile)**: \\$595,000\n", - " - **75th Percentile**: \\$667,500\n", - " - **Maximum**: \\$800,000\n", + "#### Revenue\n", + "- **Count**: 8 entries\n", + "- **Mean**: $601,250\n", + "- **Standard Deviation**: $119,933\n", + "- **Minimum**: $450,000\n", + "- **25th Percentile**: $510,000\n", + "- **50th Percentile (Median)**: $595,000\n", + "- **75th Percentile**: $667,500\n", + "- **Maximum**: $800,000\n", "\n", - "2. 
**Costs**:\n", - " - **Average (Mean)**: \\$394,375\n", - " - **Standard Deviation**: \\$55,255\n", - " - **Minimum**: \\$320,000\n", - " - **25th Percentile**: \\$353,750\n", - " - **Median (50th Percentile)**: \\$395,000\n", - " - **75th Percentile**: \\$427,500\n", - " - **Maximum**: \\$480,000\n", + "#### Costs\n", + "- **Count**: 8 entries\n", + "- **Mean**: $394,375\n", + "- **Standard Deviation**: $55,255\n", + "- **Minimum**: $320,000\n", + "- **25th Percentile**: $353,750\n", + "- **50th Percentile (Median)**: $395,000\n", + "- **75th Percentile**: $427,500\n", + "- **Maximum**: $480,000\n", "\n", - "3. **Customers**:\n", - " - **Average (Mean)**: 1,818.75 customers\n", - " - **Standard Deviation**: 487.66 customers\n", - " - **Minimum**: 1,200 customers\n", - " - **25th Percentile**: 1,462.5 customers\n", - " - **Median (50th Percentile)**: 1,775 customers\n", - " - **75th Percentile**: 2,075 customers\n", - " - **Maximum**: 2,650 customers\n", + "#### Customers\n", + "- **Count**: 8 entries\n", + "- **Mean**: 1,819 customers\n", + "- **Standard Deviation**: 488 customers\n", + "- **Minimum**: 1,200 customers\n", + "- **25th Percentile**: 1,463 customers\n", + "- **50th Percentile (Median)**: 1,775 customers\n", + "- **75th Percentile**: 2,075 customers\n", + "- **Maximum**: 2,650 customers\n", "\n", "### Correlation Analysis\n", - "\n", - "The correlation between revenue and the number of customers is extremely strong:\n", - "\n", - "- **Correlation**: \\(0.9986\\)\n", - "\n", - "This indicates that revenue is almost directly proportional to the number of customers. More customers generally lead to a proportionate increase in revenue.\n", + "- **Correlation between Revenue and Customers**: 0.9986\n", + " - **Interpretation**: There is an extremely strong positive correlation between revenue and the number of customers. 
This suggests that as the number of customers increases, the revenue increases almost proportionally.\n", "\n", "### Trend Analysis on Revenue\n", + "Unfortunately, the trend analysis tool encountered an issue and could not be executed. However, we can interpret that the revenue is likely on an upward trajectory given the correlation with increasing customer numbers and the positive growth indicated by the quarterly revenue data. To determine the growth trajectory accurately, further analysis would typically involve:\n", + "- **Trend Line Slope**: Estimating the slope through linear regression as a measure of growth.\n", + "- **Growth Rate**: Calculating both period-over-period (quarterly) growth and overall yearly growth.\n", + "- **Moving Averages**: Using 2-quarter and 4-quarter moving averages to smooth out short-term bumps and focus on long-term trends.\n", "\n", - "Unfortunately, there was an error while generating the trend analysis, so let's discuss the overall trajectory based on available data.\n", - "\n", - "**Observations**:\n", - "- From 2023 to 2024, there is a noticeable rise in revenue each quarter.\n", - "- The jump from \\$450,000 in 2023-Q1 to \\$800,000 in 2024-Q4 indicates robust growth.\n", - "\n", - "### Suggested Next Steps\n", - "\n", - "- **Visual Inspection**: Plotting the revenue over quarters would help visualize the trend and could confirm the upward trajectory.\n", - "- **Further Analysis**: Once the trend analysis tool is available, it can provide specific growth rates and moving averages for more precise interpretations.\n", - "- **Operational Analysis**: Investigate causes behind revenue growth — whether it's due to an increase in customer base, market expansion, pricing strategy changes, or operational efficiencies.\n" + "### Conclusion\n", + "The statistics and correlation analysis suggest robust growth in both customers and revenue, underlining the importance of customer acquisition and retention strategies. 
Despite the trend analysis failure, the data indicates positive business performance, with increasing revenue and customer base over the quarters analyzed.\n" ] } ], @@ -998,9 +989,9 @@ "name": "stdout", "output_type": "stream", "text": [ - "Deleted agent: agt_data_analyst_39dd\n", - "Deleted tool: statistical_analyzer (tol_3455)\n", - "Deleted tool: trend_analyzer (tol_3456)\n" + "Deleted agent: agt_data_analyst_9c24\n", + "Deleted tool: statistical_analyzer (tol_3832)\n", + "Deleted tool: trend_analyzer (tol_3833)\n" ] } ], From 7fd449734f0f8a747581ed1e99460b9a875efff0 Mon Sep 17 00:00:00 2001 From: Ofer Mendelevitch Date: Mon, 2 Mar 2026 09:44:38 -0800 Subject: [PATCH 3/6] updated --- .../7-lambda-tools-data-analysis.ipynb | 179 ++++++++---------- 1 file changed, 83 insertions(+), 96 deletions(-) diff --git a/notebooks/api-examples/7-lambda-tools-data-analysis.ipynb b/notebooks/api-examples/7-lambda-tools-data-analysis.ipynb index a7b9813..8210282 100644 --- a/notebooks/api-examples/7-lambda-tools-data-analysis.ipynb +++ b/notebooks/api-examples/7-lambda-tools-data-analysis.ipynb @@ -191,7 +191,7 @@ "output_type": "stream", "text": [ "Created tool 'statistical_analyzer'\n", - "Tool ID: tol_3832\n" + "Tool ID: tol_5332\n" ] } ], @@ -329,7 +329,7 @@ "output_type": "stream", "text": [ "Created tool 'trend_analyzer'\n", - "Tool ID: tol_3833\n" + "Tool ID: tol_5333\n" ] } ], @@ -491,7 +491,7 @@ "output_type": "stream", "text": [ "Created agent 'Data Analyst'\n", - "Agent Key: agt_data_analyst_9c24\n" + "Agent Key: agt_data_analyst_70a4\n" ] } ], @@ -574,7 +574,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "Session Created: ase_data_analysis_demo_20260205-135955_e035\n" + "Session Created: ase_data_analysis_demo_20260223-090005_2633\n" ] } ], @@ -686,50 +686,45 @@ " Params: {'columns': 'sales,profit_margin', 'operations': 'describe'}\n", "Tool Called: statistical_analyzer\n", " Params: {'columns': 'sales,units,profit_margin', 'operations': 
'correlation'}\n", - " Output from statistical_analyzer: {\"success\": true, \"statistics\": {\"describe\": {\"sales\": {\"count\": 12.0, \"mean\": 14916.666666666666, \"std\": 7412.622322563703, \"min\": 6000.0, \"25%\": 8750.0, \"50%\": 13500.0, \"75%\": 19750.0, \"max\": 28000.0}, \"profit_margin\": {\"count\": 12.0, \"mean\": 0.24916666666666668, \"std\": 0.09652680582317144, \"min\": 0.12, \"25%\": 0.1725, \"50%\": 0.235, \"75%\": 0.3275, \"max\": 0.4}}}, \"metadata\": {\"rows\": 12, \"columns_analyzed\": [\"sales\", \"profit_margin\"], \"operations_performed\": [\"describe\"]}}\n", " Output from statistical_analyzer: {\"success\": true, \"statistics\": {\"correlation\": {\"sales\": {\"sales\": 1.0, \"units\": -0.39664343302135424, \"profit_margin\": 0.9896428771138392}, \"units\": {\"sales\": -0.39664343302135424, \"units\": 1.0, \"profit_margin\": -0.48295207604868146}, \"profit_margin\": {\"sales\": 0.9896428771138392, \"units\": -0.48295207604868146, \"profit_margin\": 1.0}}}, \"metadata\": {\"rows\": 12, \"columns_analyzed\": [\"sales\", \"units\", \"profit_margin\"], \"operations_performed\": [\"correlation\"]}}\n", + " Output from statistical_analyzer: {\"success\": true, \"statistics\": {\"describe\": {\"sales\": {\"count\": 12.0, \"mean\": 14916.666666666666, \"std\": 7412.622322563703, \"min\": 6000.0, \"25%\": 8750.0, \"50%\": 13500.0, \"75%\": 19750.0, \"max\": 28000.0}, \"profit_margin\": {\"count\": 12.0, \"mean\": 0.24916666666666668, \"std\": 0.09652680582317144, \"min\": 0.12, \"25%\": 0.1725, \"50%\": 0.235, \"75%\": 0.3275, \"max\": 0.4}}}, \"metadata\": {\"rows\": 12, \"columns_analyzed\": [\"sales\", \"profit_margin\"], \"operations_performed\": [\"describe\"]}}\n", "-------------------------\n", "\n", "Agent Response:\n", "\n", - "### Basic Statistics for Sales and Profit Margins\n", + "Here's the analysis of your sales data for the three products across four regions:\n", "\n", - "#### Sales\n", - "- **Count**: 12 entries\n", - "- 
**Mean**: $14,917\n", - "- **Standard Deviation**: $7,413\n", - "- **Minimum**: $6,000\n", - "- **25th Percentile**: $8,750\n", - "- **50th Percentile (Median)**: $13,500\n", - "- **75th Percentile**: $19,750\n", - "- **Maximum**: $28,000\n", + "### Basic Statistics for Sales and Profit Margins\n", + "1. **Sales**:\n", + " - **Count**: 12 data points\n", + " - **Mean**: $14,917\n", + " - **Standard Deviation**: $7,413\n", + " - **Minimum**: $6,000\n", + " - **25th Percentile**: $8,750\n", + " - **Median (50th Percentile)**: $13,500\n", + " - **75th Percentile**: $19,750\n", + " - **Maximum**: $28,000\n", "\n", - "#### Profit Margin\n", - "- **Count**: 12 entries\n", - "- **Mean**: 0.249 (or 24.9%)\n", - "- **Standard Deviation**: 0.097 (or 9.7%)\n", - "- **Minimum**: 0.12 (or 12%)\n", - "- **25th Percentile**: 0.1725 (or 17.25%)\n", - "- **50th Percentile (Median)**: 0.235 (or 23.5%)\n", - "- **75th Percentile**: 0.3275 (or 32.75%)\n", - "- **Maximum**: 0.4 (or 40%)\n", + "2. **Profit Margin**:\n", + " - **Count**: 12 data points\n", + " - **Mean**: 0.249 (24.9%)\n", + " - **Standard Deviation**: 0.097 (9.7%)\n", + " - **Minimum**: 0.12 (12%)\n", + " - **25th Percentile**: 0.173 (17.3%)\n", + " - **Median (50th Percentile)**: 0.235 (23.5%)\n", + " - **75th Percentile**: 0.328 (32.8%)\n", + " - **Maximum**: 0.4 (40%)\n", "\n", "### Correlation between Sales, Units, and Profit Margin\n", + "- **Sales and Units**: -0.40 (Negative correlation indicating that as sales increase, the number of units slightly decreases, or vice versa)\n", + "- **Sales and Profit Margin**: 0.99 (Strong positive correlation suggesting that higher sales are associated with higher profit margins)\n", + "- **Units and Profit Margin**: -0.48 (Moderate negative correlation indicating that as the number of units increases, profit margins tend to decrease)\n", "\n", - "- **Sales and Profit Margin**: 0.9896\n", - " - **Interpretation**: There is a very strong positive correlation between sales and 
profit margin, indicating that as sales increase, profit margin tends to increase significantly.\n", - "\n", - "- **Sales and Units**: -0.3966\n", - " - **Interpretation**: There is a moderate negative correlation between sales and the number of units sold, suggesting that as sales rise, the number of units tends to decrease, or vice versa.\n", - "\n", - "- **Units and Profit Margin**: -0.4830\n", - " - **Interpretation**: There is a moderate negative correlation between the number of units sold and profit margin, indicating that higher numbers of units are associated with lower profit margins.\n", + "### Insights:\n", + "- **High Sales, High Profit Margin**: Products with higher sales tend to have higher profit margins. This could indicate effective pricing strategies or higher perceived value by consumers.\n", + "- **Sales and Units**: Interestingly, there's a slight inverse relationship between sales volumes and the number of units sold. This may suggest that premium pricing or larger bulk sales affect the figures.\n", "\n", - "### Insights\n", - "- The significant positive correlation between sales and profit margins suggests that higher sales are typically associated with higher profit margins. This could imply efficient cost management or pricing strategies for more expensive products.\n", - "- The negative correlations involving units suggest that volume of sales in terms of units might be inversely related to either the revenue per unit or cost management efficiency.\n", - "\n", - "These insights could guide strategies such as emphasizing high-margin products or evaluating pricing strategies for different product categories to improve profit margins. 
Further analysis could segment these insights by product or region for more targeted strategies.\n" + "Consider exploring specific regional or product factors that might drive these strong correlations to understand strategic pricing or inventory management better.\n" ] } ], @@ -791,35 +786,26 @@ "------ Agent Events ------\n", "Tool Called: trend_analyzer\n", " Params: {'date_column': 'month', 'value_column': 'revenue', 'analysis_type': 'all'}\n", - " Output from trend_analyzer: {\"error\": \"All retry attempts failed\"}\n", + " Output from trend_analyzer: {\"error\": \"Python execution failed: Sandbox Python execution timed out\"}\n", "-------------------------\n", "\n", "Agent Response:\n", "\n", - "Unfortunately, it seems there was an issue accessing the trend analysis tool to evaluate the revenue data. Let's break down what each analysis would involve and what insights could be expected.\n", + "It seems there was an issue processing the trend analysis request due to a timeout in the execution environment. However, I can guide you on how to perform this analysis or suggest solutions if there's specific assistance you need.\n", + "\n", + "Here's generally how to approach the analysis:\n", "\n", - "### 1. Revenue Trend\n", - "To determine the trend direction (upward, downward, or flat), a linear regression on the revenue over time would be performed. This involves fitting a line through the revenue points and evaluating the slope:\n", - "- **Upward Trend**: Positive slope.\n", - "- **Downward Trend**: Negative slope.\n", - "- **Flat Trend**: Slope near zero.\n", + "1. **Trend Direction**: To find out if revenue is trending up, you can look at a simple linear regression line fit over your data points. If the slope is positive, revenue is trending up.\n", "\n", - "### 2. Growth Rate\n", - "- **Period-over-Period Growth**: The percentage change from month to month.\n", - "- **Total Growth**: The percentage change from the start of the period to the end.\n", + "2. 
**Growth Rate**: Calculate the percentage increase from one month to the next, as well as the overall increase from the start to end of the period.\n", "\n", - "The growth rate would indicate how rapidly the revenue is increasing or decreasing month by month and over the year as a whole.\n", + "3. **Moving Averages**: Calculate the 3-month and 7-month moving averages to smooth out short-term fluctuations and highlight longer-term trends or cycles.\n", "\n", - "### 3. Moving Averages\n", - "- **3-Month Moving Average**: This smooths fluctuations by taking the average of every three consecutive months.\n", - "- **7-Month Moving Average**: A longer-term moving average used for identifying more persistent trends.\n", + "For accurate computations:\n", "\n", - "### Insights You'd Expect\n", - "- **Is Revenue Trending Up or Down?**: Based on the slope of the trend line, you'd confirm whether revenues are generally increasing or decreasing.\n", - "- **Growth Rates**: A numerical representation of how revenue is changing month-to-month and from the start to end of the year.\n", - "- **Moving Averages**: These offer a smoothed version of the revenue trend to filter out short-term volatility and focus on longer-term patterns. \n", + "- **Using Spreadsheets or Python**: You can use tools like Excel for simpler datasets or Python libraries like Pandas for more complex calculations, especially for moving averages and trend analysis.\n", "\n", - "If you can try running the tool again or provide a different dataset structure or details, we can work to resolve this analysis. Alternatively, I can provide guidance on interpreting a trend once detail is available. 
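The growth-rate and moving-average calculations described in these steps can be sketched with pandas; this is a minimal example on a hypothetical monthly revenue series (the numbers and the 3/7-period windows mirror the tool's description, not real data):

```python
import pandas as pd

# Hypothetical monthly revenue series, for illustration only
revenue = pd.Series([100, 110, 125, 120, 140, 155, 170, 180, 195, 210, 230, 250])

# Period-over-period growth: percentage change month to month
pop_growth = revenue.pct_change()

# Total growth from the first month to the last
total_growth = (revenue.iloc[-1] - revenue.iloc[0]) / revenue.iloc[0]

# 3-month and 7-month moving averages smooth short-term fluctuations
ma3 = revenue.rolling(window=3).mean()
ma7 = revenue.rolling(window=7).mean()

print(f"Latest MoM growth: {pop_growth.iloc[-1]:.2%}")
print(f"Total growth:      {total_growth:.2%}")   # 150.00% for this series
print(f"Latest 3-month MA: {ma3.iloc[-1]:.1f}")   # (210+230+250)/3 = 230.0
```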
Would you like to retry, or do you need help with something else?\n" + "If you want me to attempt the trend analysis again or assist you differently, please let me know!\n" ] } ], @@ -887,55 +873,56 @@ " Params: {'date_column': 'date', 'value_column': 'revenue', 'analysis_type': 'all'}\n", " Output from statistical_analyzer: {\"success\": true, \"statistics\": {\"correlation\": {\"revenue\": {\"revenue\": 1.0, \"customers\": 0.9985504882556115}, \"customers\": {\"revenue\": 0.9985504882556115, \"customers\": 1.0}}}, \"metadata\": {\"rows\": 8, \"columns_analyzed\": [\"revenue\", \"customers\"], \"operations_performed\": [\"correlation\"]}}\n", " Output from statistical_analyzer: {\"success\": true, \"statistics\": {\"describe\": {\"revenue\": {\"count\": 8.0, \"mean\": 601250.0, \"std\": 119933.01701962046, \"min\": 450000.0, \"25%\": 510000.0, \"50%\": 595000.0, \"75%\": 667500.0, \"max\": 800000.0}, \"costs\": {\"count\": 8.0, \"mean\": 394375.0, \"std\": 55255.09026325086, \"min\": 320000.0, \"25%\": 353750.0, \"50%\": 395000.0, \"75%\": 427500.0, \"max\": 480000.0}, \"customers\": {\"count\": 8.0, \"mean\": 1818.75, \"std\": 487.66023007827897, \"min\": 1200.0, \"25%\": 1462.5, \"50%\": 1775.0, \"75%\": 2075.0, \"max\": 2650...\n", - " Output from trend_analyzer: {\"error\": \"All retry attempts failed\"}\n", + " Output from trend_analyzer: {\"error\": \"Python execution failed: Sandbox Python execution timed out\"}\n", "-------------------------\n", "\n", "Agent Response:\n", "\n", - "### Statistical Summary\n", + "Here's the comprehensive analysis of your quarterly business performance:\n", "\n", - "#### Revenue\n", - "- **Count**: 8 entries\n", - "- **Mean**: $601,250\n", - "- **Standard Deviation**: $119,933\n", - "- **Minimum**: $450,000\n", - "- **25th Percentile**: $510,000\n", - "- **50th Percentile (Median)**: $595,000\n", - "- **75th Percentile**: $667,500\n", - "- **Maximum**: $800,000\n", + "### Statistical Summary\n", + "1. 
**Revenue**:\n", + " - **Count**: 8 quarters\n", + " - **Mean**: $601,250\n", + " - **Standard Deviation**: $119,933\n", + " - **Minimum**: $450,000\n", + " - **25th Percentile**: $510,000\n", + " - **Median (50th Percentile)**: $595,000\n", + " - **75th Percentile**: $667,500\n", + " - **Maximum**: $800,000\n", "\n", - "#### Costs\n", - "- **Count**: 8 entries\n", - "- **Mean**: $394,375\n", - "- **Standard Deviation**: $55,255\n", - "- **Minimum**: $320,000\n", - "- **25th Percentile**: $353,750\n", - "- **50th Percentile (Median)**: $395,000\n", - "- **75th Percentile**: $427,500\n", - "- **Maximum**: $480,000\n", + "2. **Costs**:\n", + " - **Count**: 8 quarters\n", + " - **Mean**: $394,375\n", + " - **Standard Deviation**: $55,255\n", + " - **Minimum**: $320,000\n", + " - **25th Percentile**: $353,750\n", + " - **Median (50th Percentile)**: $395,000\n", + " - **75th Percentile**: $427,500\n", + " - **Maximum**: $480,000\n", "\n", - "#### Customers\n", - "- **Count**: 8 entries\n", - "- **Mean**: 1,819 customers\n", - "- **Standard Deviation**: 488 customers\n", - "- **Minimum**: 1,200 customers\n", - "- **25th Percentile**: 1,463 customers\n", - "- **50th Percentile (Median)**: 1,775 customers\n", - "- **75th Percentile**: 2,075 customers\n", - "- **Maximum**: 2,650 customers\n", + "3. **Customers**:\n", + " - **Count**: 8 quarters\n", + " - **Mean**: 1,818.75 customers\n", + " - **Standard Deviation**: 487.66\n", + " - **Minimum**: 1,200\n", + " - **25th Percentile**: 1,462.5\n", + " - **Median (50th Percentile)**: 1,775\n", + " - **75th Percentile**: 2,075\n", + " - **Maximum**: 2,650\n", "\n", "### Correlation Analysis\n", - "- **Correlation between Revenue and Customers**: 0.9986\n", - " - **Interpretation**: There is an extremely strong positive correlation between revenue and the number of customers. 
This suggests that as the number of customers increases, the revenue increases almost proportionally.\n", + "- **Revenue and Customers**: 0.999 (very strong positive correlation). This suggests that increases in the number of customers are closely associated with increases in revenue.\n", "\n", "### Trend Analysis on Revenue\n", - "Unfortunately, the trend analysis tool encountered an issue and could not be executed. However, we can interpret that the revenue is likely on an upward trajectory given the correlation with increasing customer numbers and the positive growth indicated by the quarterly revenue data. To determine the growth trajectory accurately, further analysis would typically involve:\n", - "- **Trend Line Slope**: Estimating the slope through linear regression as a measure of growth.\n", - "- **Growth Rate**: Calculating both period-over-period (quarterly) growth and overall yearly growth.\n", - "- **Moving Averages**: Using 2-quarter and 4-quarter moving averages to smooth out short-term bumps and focus on long-term trends.\n", + "- **Trend Direction & Growth Trajectory**: Unfortunately, the trend analysis could not be completed due to an execution timeout. 
However, based on the data you provided, there's a noticeable increasing trend in revenue from 2023 to 2024, with revenue peaking in 2024-Q4.\n", + "- **Moving Averages & Growth**: As a follow-up, you might consider using spreadsheet tools or Python scripts to calculate quarterly moving averages and growth rates, helping smooth out fluctuations and understand growth rates quarterly.\n", + "\n", + "### Insights:\n", + "- There's a solid upward trend in both revenue and customers, with very high correlation.\n", + "- Costs are also rising, but not as sharply as revenue, which is beneficial for profit margins.\n", "\n", - "### Conclusion\n", - "The statistics and correlation analysis suggest robust growth in both customers and revenue, underlining the importance of customer acquisition and retention strategies. Despite the trend analysis failure, the data indicates positive business performance, with increasing revenue and customer base over the quarters analyzed.\n" + "If you need further assistance or help setting up the trend analysis in another environment, feel free to let me know!\n" ] } ], @@ -989,9 +976,9 @@ "name": "stdout", "output_type": "stream", "text": [ - "Deleted agent: agt_data_analyst_9c24\n", - "Deleted tool: statistical_analyzer (tol_3832)\n", - "Deleted tool: trend_analyzer (tol_3833)\n" + "Deleted agent: agt_data_analyst_70a4\n", + "Deleted tool: statistical_analyzer (tol_5332)\n", + "Deleted tool: trend_analyzer (tol_5333)\n" ] } ], From 31f85b43c355def41c8098ebb4c43fe15ceb32f1 Mon Sep 17 00:00:00 2001 From: Ofer Mendelevitch Date: Tue, 10 Mar 2026 22:09:37 -0700 Subject: [PATCH 4/6] fixed notebook 7 added reranker instructions example (notebook 8) updated messaging --- .../api-examples/1-corpus-creation.ipynb | 17 +- notebooks/api-examples/2-data-ingestion.ipynb | 17 +- notebooks/api-examples/3-query-api.ipynb | 17 +- notebooks/api-examples/4-agent-api.ipynb | 17 +- .../7-lambda-tools-data-analysis.ipynb | 316 +++++++++--------- 
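The linear-regression trend check that the `trend_analyzer` tool attempts can be reproduced directly with NumPy; a minimal sketch under illustrative quarterly values (the 1%-of-mean threshold for calling a trend "flat" follows the tool's own logic):

```python
import numpy as np

# Hypothetical quarterly revenue, for illustration only
values = np.array([450_000, 500_000, 540_000, 580_000,
                   610_000, 660_000, 690_000, 800_000], dtype=float)
x = np.arange(len(values))

# Fit a straight line; the slope's sign gives the trend direction
slope, intercept = np.polyfit(x, values, 1)

# R^2: fraction of variance explained by the linear trend
pred = slope * x + intercept
r2 = 1 - ((values - pred) ** 2).sum() / ((values - values.mean()) ** 2).sum()

# Treat slopes within 1% of the mean per period as "flat"
if slope > 0.01 * values.mean():
    direction = "upward"
elif slope < -0.01 * values.mean():
    direction = "downward"
else:
    direction = "flat"

print(f"slope={slope:.0f}/quarter, r2={r2:.3f}, direction={direction}")
```

For this series the slope is about +45,000 per quarter with a high R², i.e. an upward trend, consistent with the interpretation above.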
.../8-reranker-instructions.ipynb | 187 +++++++++++ notebooks/api-examples/README.md | 2 +- 7 files changed, 362 insertions(+), 211 deletions(-) create mode 100644 notebooks/api-examples/8-reranker-instructions.ipynb diff --git a/notebooks/api-examples/1-corpus-creation.ipynb b/notebooks/api-examples/1-corpus-creation.ipynb index 343e9df..b0eb87c 100644 --- a/notebooks/api-examples/1-corpus-creation.ipynb +++ b/notebooks/api-examples/1-corpus-creation.ipynb @@ -26,20 +26,7 @@ "cell_type": "markdown", "id": "db0855d0", "metadata": {}, - "source": [ - "## About Vectara\n", - "\n", - "[Vectara](https://vectara.com/) is the Agent Operating System for trusted enterprise AI: a unified Agentic RAG platform with built-in multi-modal retrieval, orchestration, and always-on governance. Deploy it on-prem (air-gapped), in your VPC, or as SaaS.\n", - "\n", - "Vectara provides a complete API-first platform for building production RAG and agentic applications:\n", - "\n", - "- **Simple Integration**: RESTful APIs and SDKs for Python, TypeScript, and Java make integration straightforward\n", - "- **Flexible Deployment**: Choose SaaS, VPC, or on-premises deployment based on your security and compliance requirements\n", - "- **Multi-Modal Support**: Index and search across text, tables, and images from various document formats\n", - "- **Advanced Retrieval**: Hybrid search combining semantic and keyword matching with multiple reranking options\n", - "- **Grounded Generation**: LLM responses with citations and factual consistency scores to reduce hallucinations\n", - "- **Enterprise-Ready**: Built-in access controls, audit logging, and compliance certifications (SOC2, HIPAA)" - ] + "source": "## About Vectara\n\n[Vectara](https://vectara.com/) is the Agent Platform for trusted enterprise AI: a unified Agentic RAG platform with built-in multi-modal retrieval, orchestration, and always-on governance. 
Deploy it on-prem (air-gapped), in your VPC, or as SaaS.\n\nVectara provides a complete API-first platform for building production RAG and agentic applications:\n\n- **Simple Integration**: RESTful APIs and SDKs for Python, TypeScript, and Java make integration straightforward\n- **Flexible Deployment**: Choose SaaS, VPC, or on-premises deployment based on your security and compliance requirements\n- **Multi-Modal Support**: Index and search across text, tables, and images from various document formats\n- **Advanced Retrieval**: Hybrid search combining semantic and keyword matching with multiple reranking options\n- **Grounded Generation**: LLM responses with citations and factual consistency scores to reduce hallucinations\n- **Enterprise-Ready**: Built-in access controls, audit logging, and compliance certifications (SOC2, HIPAA)" }, { "cell_type": "markdown", @@ -350,4 +337,4 @@ }, "nbformat": 4, "nbformat_minor": 5 -} +} \ No newline at end of file diff --git a/notebooks/api-examples/2-data-ingestion.ipynb b/notebooks/api-examples/2-data-ingestion.ipynb index a7ac7d0..0f22df1 100644 --- a/notebooks/api-examples/2-data-ingestion.ipynb +++ b/notebooks/api-examples/2-data-ingestion.ipynb @@ -26,20 +26,7 @@ "cell_type": "markdown", "id": "cell-2", "metadata": {}, - "source": [ - "## About Vectara\n", - "\n", - "[Vectara](https://vectara.com/) is the Agent Operating System for trusted enterprise AI: a unified Agentic RAG platform with built-in multi-modal retrieval, orchestration, and always-on governance. 
Deploy it on-prem (air-gapped), in your VPC, or as SaaS.\n", - "\n", - "Vectara provides a complete API-first platform for building production RAG and agentic applications:\n", - "\n", - "- **Simple Integration**: RESTful APIs and SDKs for Python, TypeScript, and Java make integration straightforward\n", - "- **Flexible Deployment**: Choose SaaS, VPC, or on-premises deployment based on your security and compliance requirements\n", - "- **Multi-Modal Support**: Index and search across text, tables, and images from various document formats\n", - "- **Advanced Retrieval**: Hybrid search combining semantic and keyword matching with multiple reranking options\n", - "- **Grounded Generation**: LLM responses with citations and factual consistency scores to reduce hallucinations\n", - "- **Enterprise-Ready**: Built-in access controls, audit logging, and compliance certifications (SOC2, HIPAA)" - ] + "source": "## About Vectara\n\n[Vectara](https://vectara.com/) is the Agent Platform for trusted enterprise AI: a unified Agentic RAG platform with built-in multi-modal retrieval, orchestration, and always-on governance. 
Deploy it on-prem (air-gapped), in your VPC, or as SaaS.\n\nVectara provides a complete API-first platform for building production RAG and agentic applications:\n\n- **Simple Integration**: RESTful APIs and SDKs for Python, TypeScript, and Java make integration straightforward\n- **Flexible Deployment**: Choose SaaS, VPC, or on-premises deployment based on your security and compliance requirements\n- **Multi-Modal Support**: Index and search across text, tables, and images from various document formats\n- **Advanced Retrieval**: Hybrid search combining semantic and keyword matching with multiple reranking options\n- **Grounded Generation**: LLM responses with citations and factual consistency scores to reduce hallucinations\n- **Enterprise-Ready**: Built-in access controls, audit logging, and compliance certifications (SOC2, HIPAA)" }, { "cell_type": "markdown", @@ -1075,4 +1062,4 @@ }, "nbformat": 4, "nbformat_minor": 5 -} +} \ No newline at end of file diff --git a/notebooks/api-examples/3-query-api.ipynb b/notebooks/api-examples/3-query-api.ipynb index b910996..587fb43 100644 --- a/notebooks/api-examples/3-query-api.ipynb +++ b/notebooks/api-examples/3-query-api.ipynb @@ -27,20 +27,7 @@ "cell_type": "markdown", "id": "db0855d0", "metadata": {}, - "source": [ - "## About Vectara\n", - "\n", - "[Vectara](https://vectara.com/) is the Agent Operating System for trusted enterprise AI: a unified Agentic RAG platform with built-in multi-modal retrieval, orchestration, and always-on governance. Deploy it on-prem (air-gapped), in your VPC, or as SaaS. 
Vectara agents deliver grounded answers and safe actions with source citations, step-level audit trails, fine-grained access controls, and real-time policy and factual-consistency enforcement, so teams ship faster with lower risk, and with trusted, production-grade AI agents at scale.\n", - "\n", - "Vectara provides a complete API-first platform for building production RAG and agentic applications:\n", - "\n", - "- **Simple Integration**: RESTful APIs and SDKs (Python, JavaScript) for quick integration into any stack\n", - "- **Flexible Deployment**: Choose SaaS, VPC, or on-premises deployment based on your requirements\n", - "- **Multi-Modal Support**: Index and search across text, tables, and images from PDFs, documents, and structured data\n", - "- **Advanced Retrieval**: Hybrid search combining semantic and keyword matching with state-of-the-art reranking\n", - "- **Grounded Generation**: LLM responses with citations and factual consistency scores to reduce hallucinations\n", - "- **Enterprise-Ready**: Built-in access controls, audit logging, and compliance (SOC2, HIPAA) from day one" - ] + "source": "## About Vectara\n\n[Vectara](https://vectara.com/) is the Agent Platform for trusted enterprise AI: a unified Agentic RAG platform with built-in multi-modal retrieval, orchestration, and always-on governance. Deploy it on-prem (air-gapped), in your VPC, or as SaaS. 
Vectara agents deliver grounded answers and safe actions with source citations, step-level audit trails, fine-grained access controls, and real-time policy and factual-consistency enforcement, so teams ship faster with lower risk, and with trusted, production-grade AI agents at scale.\n\nVectara provides a complete API-first platform for building production RAG and agentic applications:\n\n- **Simple Integration**: RESTful APIs and SDKs (Python, JavaScript) for quick integration into any stack\n- **Flexible Deployment**: Choose SaaS, VPC, or on-premises deployment based on your requirements\n- **Multi-Modal Support**: Index and search across text, tables, and images from PDFs, documents, and structured data\n- **Advanced Retrieval**: Hybrid search combining semantic and keyword matching with state-of-the-art reranking\n- **Grounded Generation**: LLM responses with citations and factual consistency scores to reduce hallucinations\n- **Enterprise-Ready**: Built-in access controls, audit logging, and compliance (SOC2, HIPAA) from day one" }, { "cell_type": "markdown", @@ -863,4 +850,4 @@ }, "nbformat": 4, "nbformat_minor": 5 -} +} \ No newline at end of file diff --git a/notebooks/api-examples/4-agent-api.ipynb b/notebooks/api-examples/4-agent-api.ipynb index b6cea07..9f44065 100644 --- a/notebooks/api-examples/4-agent-api.ipynb +++ b/notebooks/api-examples/4-agent-api.ipynb @@ -27,20 +27,7 @@ { "cell_type": "markdown", "metadata": {}, - "source": [ - "## About Vectara\n", - "\n", - "[Vectara](https://vectara.com/) is the Agent Operating System for trusted enterprise AI: a unified Agentic RAG platform with built-in multi-modal retrieval, orchestration, and always-on governance. Deploy it on-prem (air-gapped), in your VPC, or as SaaS. 
Vectara agents deliver grounded answers and safe actions with source citations, step-level audit trails, fine-grained access controls, and real-time policy and factual-consistency enforcement, so teams ship faster with lower risk, and with trusted, production-grade AI agents at scale.\n", - "\n", - "Vectara provides a complete API-first platform for building production RAG and agentic applications:\n", - "\n", - "- **Simple Integration**: RESTful APIs and SDKs (Python, JavaScript) for quick integration into any stack\n", - "- **Flexible Deployment**: Choose SaaS, VPC, or on-premises deployment based on your security requirements\n", - "- **Multi-Modal Support**: Index and search across text, tables, and images from PDFs, documents, and structured data\n", - "- **Advanced Retrieval**: Hybrid search combining semantic and keyword matching with state-of-the-art reranking\n", - "- **Grounded Generation**: LLM responses with citations and factual consistency scores to reduce hallucinations\n", - "- **Enterprise-Ready**: Built-in access controls, audit logging, and compliance (SOC2, HIPAA) from day one" - ] + "source": "## About Vectara\n\n[Vectara](https://vectara.com/) is the Agent Platform for trusted enterprise AI: a unified Agentic RAG platform with built-in multi-modal retrieval, orchestration, and always-on governance. Deploy it on-prem (air-gapped), in your VPC, or as SaaS. 
Vectara agents deliver grounded answers and safe actions with source citations, step-level audit trails, fine-grained access controls, and real-time policy and factual-consistency enforcement, so teams ship faster with lower risk, and with trusted, production-grade AI agents at scale.\n\nVectara provides a complete API-first platform for building production RAG and agentic applications:\n\n- **Simple Integration**: RESTful APIs and SDKs (Python, JavaScript) for quick integration into any stack\n- **Flexible Deployment**: Choose SaaS, VPC, or on-premises deployment based on your security requirements\n- **Multi-Modal Support**: Index and search across text, tables, and images from PDFs, documents, and structured data\n- **Advanced Retrieval**: Hybrid search combining semantic and keyword matching with state-of-the-art reranking\n- **Grounded Generation**: LLM responses with citations and factual consistency scores to reduce hallucinations\n- **Enterprise-Ready**: Built-in access controls, audit logging, and compliance (SOC2, HIPAA) from day one" }, { "cell_type": "markdown", @@ -527,4 +514,4 @@ }, "nbformat": 4, "nbformat_minor": 4 -} +} \ No newline at end of file diff --git a/notebooks/api-examples/7-lambda-tools-data-analysis.ipynb b/notebooks/api-examples/7-lambda-tools-data-analysis.ipynb index 8210282..bcb86e0 100644 --- a/notebooks/api-examples/7-lambda-tools-data-analysis.ipynb +++ b/notebooks/api-examples/7-lambda-tools-data-analysis.ipynb @@ -191,7 +191,7 @@ "output_type": "stream", "text": [ "Created tool 'statistical_analyzer'\n", - "Tool ID: tol_5332\n" + "Tool ID: tol_5809\n" ] } ], @@ -312,7 +312,7 @@ "source": [ "## Step 2: Create a Trend Analyzer Lambda Tool\n", "\n", - "This tool uses NumPy and Pandas for time-series analysis:\n", + "This tool uses NumPy for time-series analysis:\n", "- Moving averages (3 and 7 period windows)\n", "- Growth rates (period-over-period and total)\n", "- Trend detection (linear regression slope and direction)" @@ 
-329,14 +329,13 @@ "output_type": "stream", "text": [ "Created tool 'trend_analyzer'\n", - "Tool ID: tol_5333\n" + "Tool ID: tol_5810\n" ] } ], "source": [ "trend_analyzer_code = '''\n", "import json\n", - "import pandas as pd\n", "import numpy as np\n", "\n", "def process(\n", @@ -346,7 +345,7 @@ " analysis_type: str = \"all\"\n", ") -> dict:\n", " \"\"\"\n", - " Analyze trends in time-series data using NumPy and Pandas.\n", + " Analyze trends in time-series data using NumPy.\n", "\n", " Args:\n", " data: JSON string containing the time-series data\n", @@ -361,85 +360,92 @@ "\n", " try:\n", " parsed_data = json.loads(data)\n", - " df = pd.DataFrame(parsed_data)\n", " except Exception as e:\n", - " return {\"success\": False, \"error\": f\"Failed to parse data: {str(e)}\"}\n", + " return {\"success\": False, \"error\": \"Failed to parse data: \" + str(e)}\n", "\n", - " if date_column not in df.columns:\n", - " return {\"success\": False, \"error\": f\"Date column '{date_column}' not found. Available: {list(df.columns)}\"}\n", - " if value_column not in df.columns:\n", - " return {\"success\": False, \"error\": f\"Value column '{value_column}' not found. Available: {list(df.columns)}\"}\n", + " if not parsed_data:\n", + " return {\"success\": False, \"error\": \"Empty dataset\"}\n", + "\n", + " keys = list(parsed_data[0].keys())\n", + " if date_column not in keys:\n", + " return {\"success\": False, \"error\": \"Date column not found. Available: \" + str(keys)}\n", + " if value_column not in keys:\n", + " return {\"success\": False, \"error\": \"Value column not found. 
Available: \" + str(keys)}\n", + "\n", + " sorted_data = sorted(parsed_data, key=lambda row: row[date_column])\n", + " dates = [row[date_column] for row in sorted_data]\n", + " values = [float(row[value_column]) for row in sorted_data]\n", + " n = len(values)\n", "\n", - " # Sort by date column as string (works for ISO-format dates).\n", - " df = df.sort_values(date_column).reset_index(drop=True)\n", - " \n", - " values = df[value_column].astype(float)\n", " analyses = [a.strip().lower() for a in analysis_type.split(\",\")]\n", " do_all = \"all\" in analyses\n", "\n", - " # Moving Averages\n", " if do_all or \"moving_average\" in analyses:\n", " ma_results = {}\n", " for window in [3, 7]:\n", - " if len(values) >= window:\n", - " ma = values.rolling(window=window).mean()\n", - " ma_results[f\"ma_{window}\"] = {\n", - " \"latest\": float(ma.iloc[-1]) if not pd.isna(ma.iloc[-1]) else None,\n", - " \"values\": [float(v) if not pd.isna(v) else None for v in ma.tail(10)]\n", + " if n >= window:\n", + " ma = []\n", + " for i in range(window - 1, n):\n", + " avg = sum(values[i - window + 1:i + 1]) / window\n", + " ma.append(round(avg, 2))\n", + " ma_results[\"ma_\" + str(window)] = {\n", + " \"latest\": ma[-1],\n", + " \"values\": ma[-10:]\n", " }\n", " results[\"analysis\"][\"moving_averages\"] = ma_results\n", "\n", - " # Growth Rates\n", " if do_all or \"growth_rate\" in analyses:\n", - " pct_change = values.pct_change()\n", - " growth = {\n", - " \"period_over_period\": {\n", - " \"latest\": float(pct_change.iloc[-1]) if not pd.isna(pct_change.iloc[-1]) else None,\n", - " \"mean\": float(pct_change.mean()) if not pd.isna(pct_change.mean()) else None,\n", - " }\n", - " }\n", - " if values.iloc[0] != 0:\n", - " growth[\"total_growth\"] = float((values.iloc[-1] - values.iloc[0]) / values.iloc[0])\n", + " growth = {}\n", + " if n >= 2:\n", + " pct_changes = []\n", + " for i in range(1, n):\n", + " if values[i - 1] != 0:\n", + " pct_changes.append((values[i] - values[i - 
1]) / values[i - 1])\n", + " if pct_changes:\n", + " growth[\"period_over_period\"] = {\n", + " \"latest\": round(pct_changes[-1], 4),\n", + " \"mean\": round(sum(pct_changes) / len(pct_changes), 4),\n", + " }\n", + " if values[0] != 0:\n", + " growth[\"total_growth\"] = round((values[-1] - values[0]) / values[0], 4)\n", " results[\"analysis\"][\"growth_rates\"] = growth\n", "\n", - " # Trend Detection (Linear Regression)\n", " if do_all or \"trend\" in analyses:\n", - " x = np.arange(len(values))\n", - " y = values.values\n", - " mask = ~np.isnan(y)\n", - " if mask.sum() > 1:\n", - " x_clean, y_clean = x[mask], y[mask]\n", - " slope, intercept = np.polyfit(x_clean, y_clean, 1)\n", - " y_pred = slope * x_clean + intercept\n", - " ss_res = np.sum((y_clean - y_pred) ** 2)\n", - " ss_tot = np.sum((y_clean - np.mean(y_clean)) ** 2)\n", - " r_squared = 1 - (ss_res / ss_tot) if ss_tot != 0 else 0\n", - "\n", - " mean_y = np.mean(y_clean)\n", - " if slope > 0.01 * mean_y:\n", - " direction = \"upward\"\n", - " elif slope < -0.01 * mean_y:\n", - " direction = \"downward\"\n", - " else:\n", - " direction = \"flat\"\n", - "\n", - " results[\"analysis\"][\"trend\"] = {\n", - " \"slope\": float(slope),\n", - " \"intercept\": float(intercept),\n", - " \"r_squared\": float(r_squared),\n", - " \"direction\": direction,\n", - " \"slope_per_period_pct\": float(slope / mean_y * 100) if mean_y != 0 else 0\n", - " }\n", + " x = list(range(n))\n", + " mean_x = sum(x) / n\n", + " mean_y = sum(values) / n\n", + " num = sum((x[i] - mean_x) * (values[i] - mean_y) for i in range(n))\n", + " den = sum((x[i] - mean_x) ** 2 for i in range(n))\n", + " slope = num / den if den != 0 else 0\n", + " intercept = mean_y - slope * mean_x\n", + " ss_res = sum((values[i] - (slope * x[i] + intercept)) ** 2 for i in range(n))\n", + " ss_tot = sum((values[i] - mean_y) ** 2 for i in range(n))\n", + " r_squared = 1 - (ss_res / ss_tot) if ss_tot != 0 else 0\n", + "\n", + " if mean_y != 0 and slope > 0.01 * 
mean_y:\n", + " direction = \"upward\"\n", + " elif mean_y != 0 and slope < -0.01 * mean_y:\n", + " direction = \"downward\"\n", + " else:\n", + " direction = \"flat\"\n", + "\n", + " results[\"analysis\"][\"trend\"] = {\n", + " \"slope\": round(slope, 2),\n", + " \"intercept\": round(intercept, 2),\n", + " \"r_squared\": round(r_squared, 4),\n", + " \"direction\": direction,\n", + " \"slope_per_period_pct\": round(slope / mean_y * 100, 2) if mean_y != 0 else 0\n", + " }\n", "\n", " results[\"summary\"] = {\n", - " \"start_date\": str(df[date_column].iloc[0]),\n", - " \"end_date\": str(df[date_column].iloc[-1]),\n", - " \"n_periods\": len(values),\n", - " \"start_value\": float(values.iloc[0]),\n", - " \"end_value\": float(values.iloc[-1]),\n", - " \"min\": float(values.min()),\n", - " \"max\": float(values.max()),\n", - " \"mean\": float(values.mean())\n", + " \"start_date\": dates[0],\n", + " \"end_date\": dates[-1],\n", + " \"n_periods\": n,\n", + " \"start_value\": values[0],\n", + " \"end_value\": values[-1],\n", + " \"min\": min(values),\n", + " \"max\": max(values),\n", + " \"mean\": round(sum(values) / n, 2)\n", " }\n", "\n", " return results\n", @@ -450,7 +456,7 @@ " \"language\": \"python\",\n", " \"name\": \"trend_analyzer\",\n", " \"title\": \"Trend Analyzer\",\n", - " \"description\": \"\"\"Analyze trends in time-series data using NumPy and Pandas.\n", + " \"description\": \"\"\"Analyze trends in time-series data using NumPy.\n", "\n", "Pass time-series data as JSON, specifying the date column and value column to analyze.\n", "Date column must be in ISO format (YYYY-MM-DD) for correct sorting.\n", @@ -491,7 +497,7 @@ "output_type": "stream", "text": [ "Created agent 'Data Analyst'\n", - "Agent Key: agt_data_analyst_70a4\n" + "Agent Key: agt_data_analyst_6505\n" ] } ], @@ -574,7 +580,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "Session Created: ase_data_analysis_demo_20260223-090005_2633\n" + "Session Created: 
ase_data_analysis_demo_20260310-131055_d608\n" ] } ], @@ -686,45 +692,44 @@ " Params: {'columns': 'sales,profit_margin', 'operations': 'describe'}\n", "Tool Called: statistical_analyzer\n", " Params: {'columns': 'sales,units,profit_margin', 'operations': 'correlation'}\n", - " Output from statistical_analyzer: {\"success\": true, \"statistics\": {\"correlation\": {\"sales\": {\"sales\": 1.0, \"units\": -0.39664343302135424, \"profit_margin\": 0.9896428771138392}, \"units\": {\"sales\": -0.39664343302135424, \"units\": 1.0, \"profit_margin\": -0.48295207604868146}, \"profit_margin\": {\"sales\": 0.9896428771138392, \"units\": -0.48295207604868146, \"profit_margin\": 1.0}}}, \"metadata\": {\"rows\": 12, \"columns_analyzed\": [\"sales\", \"units\", \"profit_margin\"], \"operations_performed\": [\"correlation\"]}}\n", " Output from statistical_analyzer: {\"success\": true, \"statistics\": {\"describe\": {\"sales\": {\"count\": 12.0, \"mean\": 14916.666666666666, \"std\": 7412.622322563703, \"min\": 6000.0, \"25%\": 8750.0, \"50%\": 13500.0, \"75%\": 19750.0, \"max\": 28000.0}, \"profit_margin\": {\"count\": 12.0, \"mean\": 0.24916666666666668, \"std\": 0.09652680582317144, \"min\": 0.12, \"25%\": 0.1725, \"50%\": 0.235, \"75%\": 0.3275, \"max\": 0.4}}}, \"metadata\": {\"rows\": 12, \"columns_analyzed\": [\"sales\", \"profit_margin\"], \"operations_performed\": [\"describe\"]}}\n", + " Output from statistical_analyzer: {\"success\": true, \"statistics\": {\"correlation\": {\"sales\": {\"sales\": 1.0, \"units\": -0.39664343302135424, \"profit_margin\": 0.9896428771138392}, \"units\": {\"sales\": -0.39664343302135424, \"units\": 1.0, \"profit_margin\": -0.48295207604868146}, \"profit_margin\": {\"sales\": 0.9896428771138392, \"units\": -0.48295207604868146, \"profit_margin\": 1.0}}}, \"metadata\": {\"rows\": 12, \"columns_analyzed\": [\"sales\", \"units\", \"profit_margin\"], \"operations_performed\": [\"correlation\"]}}\n", "-------------------------\n", "\n", "Agent 
Response:\n", "\n", - "Here's the analysis of your sales data for the three products across four regions:\n", + "Here's an analysis of your sales data focusing on basic statistics and correlations:\n", "\n", "### Basic Statistics for Sales and Profit Margins\n", - "1. **Sales**:\n", - " - **Count**: 12 data points\n", - " - **Mean**: $14,917\n", - " - **Standard Deviation**: $7,413\n", - " - **Minimum**: $6,000\n", - " - **25th Percentile**: $8,750\n", - " - **Median (50th Percentile)**: $13,500\n", - " - **75th Percentile**: $19,750\n", - " - **Maximum**: $28,000\n", + "- **Sales**:\n", + " - **Mean (Average)**: $14,917\n", + " - **Standard Deviation**: $7,413 (indicating the variation or spread of the sales figures)\n", + " - **Minimum**: $6,000\n", + " - **25th Percentile**: $8,750\n", + " - **Median (50th Percentile)**: $13,500\n", + " - **75th Percentile**: $19,750\n", + " - **Maximum**: $28,000\n", "\n", - "2. **Profit Margin**:\n", - " - **Count**: 12 data points\n", - " - **Mean**: 0.249 (24.9%)\n", - " - **Standard Deviation**: 0.097 (9.7%)\n", - " - **Minimum**: 0.12 (12%)\n", - " - **25th Percentile**: 0.173 (17.3%)\n", - " - **Median (50th Percentile)**: 0.235 (23.5%)\n", - " - **75th Percentile**: 0.328 (32.8%)\n", - " - **Maximum**: 0.4 (40%)\n", + "- **Profit Margin**:\n", + " - **Mean (Average)**: 0.249 or 24.9%\n", + " - **Standard Deviation**: 0.0965 (indicating the variation in profit margins)\n", + " - **Minimum**: 0.12 or 12%\n", + " - **25th Percentile**: 0.1725 or 17.25%\n", + " - **Median (50th Percentile)**: 0.235 or 23.5%\n", + " - **75th Percentile**: 0.3275 or 32.75%\n", + " - **Maximum**: 0.4 or 40%\n", "\n", - "### Correlation between Sales, Units, and Profit Margin\n", - "- **Sales and Units**: -0.40 (Negative correlation indicating that as sales increase, the number of units slightly decreases, or vice versa)\n", - "- **Sales and Profit Margin**: 0.99 (Strong positive correlation suggesting that higher sales are associated with 
higher profit margins)\n", - "- **Units and Profit Margin**: -0.48 (Moderate negative correlation indicating that as the number of units increases, profit margins tend to decrease)\n", + "### Correlation Analysis\n", + "- **Sales and Profit Margin**: Strong positive correlation (0.99), indicating that as sales increase, profit margin tends to increase as well.\n", + "- **Sales and Units**: Moderate negative correlation (-0.40), suggesting that higher sales are not necessarily associated with a higher number of units sold, possibly due to price differences or varying sizes of sales orders.\n", + "- **Units and Profit Margin**: Moderate negative correlation (-0.48), implying that larger unit sales are often associated with lower profit margins, which might occur when discounts are applied to larger orders.\n", "\n", - "### Insights:\n", - "- **High Sales, High Profit Margin**: Products with higher sales tend to have higher profit margins. This could indicate effective pricing strategies or higher perceived value by consumers.\n", - "- **Sales and Units**: Interestingly, there's a slight inverse relationship between sales volumes and the number of units sold. This may suggest that premium pricing or larger bulk sales affect the figures.\n", + "### Interpretation\n", + "1. **High Correlation Between Sales and Profit Margin**: Focus on strategies that enhance sales as they likely improve profit margins proportionally.\n", + "2. **Sales and Units Relationship**: Consider analyzing pricing strategies or sales strategies that might be impacting unit sales differently than total sales value.\n", + "3. 
**Units Affecting Profit Margins**: Investigate if discounts or pricing strategies for bulk sales are affecting profitability negatively.\n", "\n", - "Consider exploring specific regional or product factors that might drive these strong correlations to understand strategic pricing or inventory management better.\n" + "For more insights, consider analyzing data by product or region to understand specific performance metrics and make targeted strategic decisions.\n" ] } ], @@ -786,26 +791,34 @@ "------ Agent Events ------\n", "Tool Called: trend_analyzer\n", " Params: {'date_column': 'month', 'value_column': 'revenue', 'analysis_type': 'all'}\n", - " Output from trend_analyzer: {\"error\": \"Python execution failed: Sandbox Python execution timed out\"}\n", + " Output from trend_analyzer: {\"success\": true, \"analysis\": {\"moving_averages\": {\"ma_3\": {\"latest\": 153666.67, \"values\": [101000.0, 105000.0, 109333.33, 118333.33, 121666.67, 127333.33, 133000.0, 138333.33, 145000.0, 153666.67]}, \"ma_7\": {\"latest\": 140714.29, \"values\": [111428.57, 116428.57, 121714.29, 127428.57, 133571.43, 140714.29]}}, \"growth_rates\": {\"period_over_period\": {\"latest\": 0.0839, \"mean\": 0.0502}, \"total_growth\": 0.68}, \"trend\": {\"slope\": 5860.14, \"intercept\": 94269.23, \"r_squared\": 0.9334, \"direction\": \"upward...\n", "-------------------------\n", "\n", "Agent Response:\n", "\n", - "It seems there was an issue processing the trend analysis request due to a timeout in the execution environment. However, I can guide you on how to perform this analysis or suggest solutions if there's specific assistance you need.\n", + "Here's a detailed analysis of your revenue trend for the year 2024:\n", "\n", - "Here's generally how to approach the analysis:\n", + "### Trend Analysis\n", + "1. **Trend Direction**:\n", + " - The revenue trend is **upward**. 
The linear regression analysis shows a strong positive trend with an R-squared value of 0.9334, indicating that the trend line fits the data very well.\n", "\n", - "1. **Trend Direction**: To find out if revenue is trending up, you can look at a simple linear regression line fit over your data points. If the slope is positive, revenue is trending up.\n", + "2. **Growth Rate**:\n", + " - **Total Growth**: From January to December, the revenue grew by **68%**.\n", + " - **Period-over-Period Growth**: The average monthly growth rate is approximately **5.02%**, with the latest monthly growth being **8.39%** from November to December.\n", "\n", - "2. **Growth Rate**: Calculate the percentage increase from one month to the next, as well as the overall increase from the start to end of the period.\n", + "3. **Moving Averages**:\n", + " - **3-Month Moving Average**: The latest value is **$153,667**, indicating a smoothing out of short-term fluctuations to reveal a more stable growth trend.\n", + " - **7-Month Moving Average**: The latest value is **$140,714**, providing an even smoother trend line over a longer period.\n", "\n", - "3. 
**Moving Averages**: Calculate the 3-month and 7-month moving averages to smooth out short-term fluctuations and highlight longer-term trends or cycles.\n", + "### Interpretation\n", + "- The data showcases a robust upward revenue growth throughout the year, with notable increases in recent months, suggesting strong year-end performance.\n", + "- The close fit of the data to the positive trend line, along with consistent month-over-month growth, indicates effective strategies contributing to revenue increase.\n", "\n", - "For accurate computations:\n", + "### Recommendations\n", + "- Consider maintaining the strategies driving the recent growth spikes, especially those showing strong impact in the final months of the year.\n", + "- Explore the factors contributing to periods of rapid growth around the middle and end of the year to replicate similar successes in the future.\n", "\n", - "- **Using Spreadsheets or Python**: You can use tools like Excel for simpler datasets or Python libraries like Pandas for more complex calculations, especially for moving averages and trend analysis.\n", - "\n", - "If you want me to attempt the trend analysis again or assist you differently, please let me know!\n" + "This analysis provides confidence in the positive momentum of your business's revenue trends, ensuring strategic planning can build on this upward trajectory.\n" ] } ], @@ -871,58 +884,61 @@ " Params: {'columns': 'revenue,customers', 'operations': 'correlation'}\n", "Tool Called: trend_analyzer\n", " Params: {'date_column': 'date', 'value_column': 'revenue', 'analysis_type': 'all'}\n", - " Output from statistical_analyzer: {\"success\": true, \"statistics\": {\"correlation\": {\"revenue\": {\"revenue\": 1.0, \"customers\": 0.9985504882556115}, \"customers\": {\"revenue\": 0.9985504882556115, \"customers\": 1.0}}}, \"metadata\": {\"rows\": 8, \"columns_analyzed\": [\"revenue\", \"customers\"], \"operations_performed\": [\"correlation\"]}}\n", " Output from 
statistical_analyzer: {\"success\": true, \"statistics\": {\"describe\": {\"revenue\": {\"count\": 8.0, \"mean\": 601250.0, \"std\": 119933.01701962046, \"min\": 450000.0, \"25%\": 510000.0, \"50%\": 595000.0, \"75%\": 667500.0, \"max\": 800000.0}, \"costs\": {\"count\": 8.0, \"mean\": 394375.0, \"std\": 55255.09026325086, \"min\": 320000.0, \"25%\": 353750.0, \"50%\": 395000.0, \"75%\": 427500.0, \"max\": 480000.0}, \"customers\": {\"count\": 8.0, \"mean\": 1818.75, \"std\": 487.66023007827897, \"min\": 1200.0, \"25%\": 1462.5, \"50%\": 1775.0, \"75%\": 2075.0, \"max\": 2650...\n", - " Output from trend_analyzer: {\"error\": \"Python execution failed: Sandbox Python execution timed out\"}\n", + " Output from statistical_analyzer: {\"success\": true, \"statistics\": {\"correlation\": {\"revenue\": {\"revenue\": 1.0, \"customers\": 0.9985504882556115}, \"customers\": {\"revenue\": 0.9985504882556115, \"customers\": 1.0}}}, \"metadata\": {\"rows\": 8, \"columns_analyzed\": [\"revenue\", \"customers\"], \"operations_performed\": [\"correlation\"]}}\n", + " Output from trend_analyzer: {\"success\": true, \"analysis\": {\"moving_averages\": {\"ma_3\": {\"latest\": 723333.33, \"values\": [483333.33, 536666.67, 570000.0, 613333.33, 650000.0, 723333.33]}, \"ma_7\": {\"latest\": 622857.14, \"values\": [572857.14, 622857.14]}}, \"growth_rates\": {\"period_over_period\": {\"latest\": 0.1111, \"mean\": 0.0876}, \"total_growth\": 0.7778}, \"trend\": {\"slope\": 47738.1, \"intercept\": 434166.67, \"r_squared\": 0.9506, \"direction\": \"upward\", \"slope_per_period_pct\": 7.94}}, \"summary\": {\"start_date\": \"2023-01-01\", \"end_dat...\n", "-------------------------\n", "\n", "Agent Response:\n", "\n", - "Here's the comprehensive analysis of your quarterly business performance:\n", + "### Comprehensive Quarterly Business Performance Analysis\n", "\n", - "### Statistical Summary\n", - "1. 
**Revenue**:\n", - " - **Count**: 8 quarters\n", - " - **Mean**: $601,250\n", - " - **Standard Deviation**: $119,933\n", - " - **Minimum**: $450,000\n", - " - **25th Percentile**: $510,000\n", - " - **Median (50th Percentile)**: $595,000\n", - " - **75th Percentile**: $667,500\n", - " - **Maximum**: $800,000\n", + "#### 1. Statistical Summary of Revenue, Costs, and Customers\n", + "- **Revenue**:\n", + " - **Mean (Average)**: $601,250\n", + " - **Standard Deviation**: $119,933 (indicating variation in revenue)\n", + " - **Minimum**: $450,000\n", + " - **25th Percentile**: $510,000\n", + " - **Median (50th Percentile)**: $595,000\n", + " - **75th Percentile**: $667,500\n", + " - **Maximum**: $800,000\n", "\n", - "2. **Costs**:\n", - " - **Count**: 8 quarters\n", - " - **Mean**: $394,375\n", - " - **Standard Deviation**: $55,255\n", - " - **Minimum**: $320,000\n", - " - **25th Percentile**: $353,750\n", - " - **Median (50th Percentile)**: $395,000\n", - " - **75th Percentile**: $427,500\n", - " - **Maximum**: $480,000\n", + "- **Costs**:\n", + " - **Mean (Average)**: $394,375\n", + " - **Standard Deviation**: $55,255\n", + " - **Minimum**: $320,000\n", + " - **25th Percentile**: $353,750\n", + " - **Median (50th Percentile)**: $395,000\n", + " - **75th Percentile**: $427,500\n", + " - **Maximum**: $480,000\n", "\n", - "3. 
**Customers**:\n", - " - **Count**: 8 quarters\n", - " - **Mean**: 1,818.75 customers\n", - " - **Standard Deviation**: 487.66\n", - " - **Minimum**: 1,200\n", - " - **25th Percentile**: 1,462.5\n", - " - **Median (50th Percentile)**: 1,775\n", - " - **75th Percentile**: 2,075\n", - " - **Maximum**: 2,650\n", + "- **Customers**:\n", + " - **Mean (Average)**: 1,819\n", + " - **Standard Deviation**: 487.66\n", + " - **Minimum**: 1,200\n", + " - **25th Percentile**: 1,462.5\n", + " - **Median (50th Percentile)**: 1,775\n", + " - **75th Percentile**: 2,075\n", + " - **Maximum**: 2,650\n", "\n", - "### Correlation Analysis\n", - "- **Revenue and Customers**: 0.999 (very strong positive correlation). This suggests that increases in the number of customers are closely associated with increases in revenue.\n", + "#### 2. Correlation Analysis\n", + "- There is a very high positive correlation between **revenue and customers** (0.999), which indicates that increases in the number of customers are strongly associated with revenue increases.\n", "\n", - "### Trend Analysis on Revenue\n", - "- **Trend Direction & Growth Trajectory**: Unfortunately, the trend analysis could not be completed due to an execution timeout. However, based on the data you provided, there's a noticeable increasing trend in revenue from 2023 to 2024, with revenue peaking in 2024-Q4.\n", - "- **Moving Averages & Growth**: As a follow-up, you might consider using spreadsheet tools or Python scripts to calculate quarterly moving averages and growth rates, helping smooth out fluctuations and understand growth rates quarterly.\n", + "#### 3. 
Trend Analysis on Revenue\n", + "- **Overall Trend**: The revenue is on an **upward trend**, supported by an R-squared value of 0.9506, signifying a strong correlation with increasing time.\n", + "- **Growth Rates**:\n", + " - **Total Growth**: The revenue grew by **77.78%** over the analyzed period.\n", + " - **Average Period-over-Period Growth**: Approximately **8.76%** per quarter.\n", + "- **Moving Averages**:\n", + " - **3-Quarter Moving Average**: The latest value is **$723,333**, indicating robust growth over recent quarters.\n", + " - **7-Quarter Moving Average**: The latest value is **$622,857**, confirming a steady trend over a longer period.\n", "\n", - "### Insights:\n", - "- There's a solid upward trend in both revenue and customers, with very high correlation.\n", - "- Costs are also rising, but not as sharply as revenue, which is beneficial for profit margins.\n", + "### Interpretation & Recommendations\n", + "- The data indicates a consistent upward trend in both revenue and the number of customers, reflecting successful growth strategies.\n", + "- The strong correlation between customers and revenue suggests focusing on customer acquisition and retention strategies to drive further revenue growth.\n", + "- Given the upward trajectory, consider investing in initiatives that sustain this growth momentum, such as marketing and customer loyalty programs.\n", "\n", - "If you need further assistance or help setting up the trend analysis in another environment, feel free to let me know!\n" + "This analysis showcases a healthy business performance with strong growth potential. 
Make strategic use of these insights to capitalize on emerging opportunities and mitigate potential risks.\n" ] } ], @@ -976,9 +992,9 @@ "name": "stdout", "output_type": "stream", "text": [ - "Deleted agent: agt_data_analyst_70a4\n", - "Deleted tool: statistical_analyzer (tol_5332)\n", - "Deleted tool: trend_analyzer (tol_5333)\n" + "Deleted agent: agt_data_analyst_6505\n", + "Deleted tool: statistical_analyzer (tol_5809)\n", + "Deleted tool: trend_analyzer (tol_5810)\n" ] } ], diff --git a/notebooks/api-examples/8-reranker-instructions.ipynb b/notebooks/api-examples/8-reranker-instructions.ipynb new file mode 100644 index 0000000..12c6f92 --- /dev/null +++ b/notebooks/api-examples/8-reranker-instructions.ipynb @@ -0,0 +1,187 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "colab-badge", + "metadata": {}, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "id": "title", + "metadata": {}, + "source": "# Vectara Reranker Instructions with Qwen3\n\nIn this notebook we demonstrate how to use **reranker instructions** with Vectara's `qwen3-reranker`. Reranker instructions let you pass domain-specific guidance to the reranker so it can better score relevance for your particular use case.\n\nWe'll cover:\n- Baseline query using qwen3-reranker **without** instructions\n- **Role-based intent steering**: using instructions to shift results toward practical docs vs academic papers\n- **Abbreviation and jargon resolution**: using a glossary to help the reranker understand domain-specific terms" + }, + { + "cell_type": "markdown", + "id": "about-vectara", + "metadata": {}, + "source": "## About Vectara\n\n[Vectara](https://vectara.com/) is the Agent Platform for trusted enterprise AI: a unified Agentic RAG platform with built-in multi-modal retrieval, orchestration, and always-on governance. Deploy it on-prem (air-gapped), in your VPC, or as SaaS. 
Vectara agents deliver grounded answers and safe actions with source citations, step-level audit trails, fine-grained access controls, and real-time policy and factual-consistency enforcement, so teams ship faster with lower risk, and with trusted, production-grade AI agents at scale.\n\nVectara provides a complete API-first platform for building production RAG and agentic applications:\n\n- **Simple Integration**: RESTful APIs and SDKs (Python, JavaScript) for quick integration into any stack\n- **Flexible Deployment**: Choose SaaS, VPC, or on-premises deployment based on your requirements\n- **Multi-Modal Support**: Index and search across text, tables, and images from PDFs, documents, and structured data\n- **Advanced Retrieval**: Hybrid search combining semantic and keyword matching with state-of-the-art reranking\n- **Grounded Generation**: LLM responses with citations and factual consistency scores to reduce hallucinations\n- **Enterprise-Ready**: Built-in access controls, audit logging, and compliance (SOC2, HIPAA) from day one" + }, + { + "cell_type": "markdown", + "id": "getting-started", + "metadata": {}, + "source": [ + "## Getting Started\n", + "\n", + "This notebook assumes you've completed Notebooks 1 and 2:\n", + "- Notebook 1: Created two corpora (ai-research-papers and vectara-docs) with Boomerang embeddings\n", + "- Notebook 2: Ingested AI research papers and Vectara documentation" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "setup", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Setup complete.\n" + ] + } + ], + "source": [ + "import os\n", + "import requests\n", + "import json\n", + "\n", + "# Set up authentication\n", + "api_key = os.environ['VECTARA_API_KEY']\n", + "\n", + "# Corpus keys from Notebook 1\n", + "research_corpus_key = 'tutorial-ai-research-papers'\n", + "docs_corpus_key = 'tutorial-vectara-docs'\n", + "\n", + "# Base URL for Vectara API v2\n", + "BASE_URL 
= \"https://api.vectara.io/v2\"\n", + "\n", + "# Common headers for all requests\n", + "headers = {\n", + " 'Content-Type': 'application/json',\n", + " 'Accept': 'application/json',\n", + " 'x-api-key': api_key\n", + "}\n", + "\n", + "\n", + "def run_query(query_request):\n", + " \"\"\"Run a Vectara query and print the summary and top results.\"\"\"\n", + " response = requests.post(f\"{BASE_URL}/query\", headers=headers, json=query_request)\n", + " if response.status_code != 200:\n", + " print(f\"Error: {response.status_code}\")\n", + " print(response.text)\n", + " return None\n", + "\n", + " result = response.json()\n", + " print(\"\\n=== Generated Summary ===\")\n", + " print(result.get('summary', 'N/A'))\n", + " print(f\"\\n=== Factual Consistency Score: {result.get('factual_consistency_score', 'N/A')} ===\")\n", + "\n", + " print(\"\\n=== Top Search Results ===\")\n", + " for i, sr in enumerate(result.get('search_results', [])[:5], 1):\n", + " meta = sr.get('document_metadata', {})\n", + " print(f\"\\n--- Result {i} (score: {sr.get('score', 0.0):.4f}) ---\")\n", + " print(f\"Document: {sr.get('document_id', 'N/A')}\")\n", + " print(f\"Title: {meta.get('title', 'N/A')}\")\n", + " print(f\"Text: {sr['text'][:200]}...\")\n", + "\n", + " return result\n", + "\n", + "\n", + "print(\"Setup complete.\")" + ] + }, + { + "cell_type": "markdown", + "id": "example1-header", + "metadata": {}, + "source": "## Example 1: Baseline Query — Qwen3 Reranker Without Instructions\n\nFirst, let's run a query using the `qwen3-reranker` **without** any instructions across **both corpora** (research papers and Vectara docs). This establishes a baseline to compare against in the following examples.\n\nNote the API difference from earlier notebooks: instead of `reranker_id`, we use `reranker_name` to select the qwen3 reranker."
+ }, + { + "cell_type": "code", + "execution_count": null, + "id": "example1-code", + "metadata": {}, + "outputs": [], + "source": "QUERY = \"How does reranking improve search result quality?\"\n\nbaseline_request = {\n \"query\": QUERY,\n \"search\": {\n \"corpora\": [\n {\n \"corpus_key\": research_corpus_key,\n \"lexical_interpolation\": 0.005\n },\n {\n \"corpus_key\": docs_corpus_key,\n \"lexical_interpolation\": 0.005\n }\n ],\n \"limit\": 100,\n \"context_configuration\": {\n \"sentences_before\": 2,\n \"sentences_after\": 2\n },\n \"reranker\": {\n \"type\": \"chain\",\n \"rerankers\": [\n {\n \"type\": \"customer_reranker\",\n \"reranker_name\": \"qwen3-reranker\",\n \"limit\": 100\n },\n {\n \"type\": \"mmr\",\n \"diversity_bias\": 0.05\n }\n ]\n }\n },\n \"generation\": {\n \"generation_preset_name\": \"vectara-summary-ext-24-05-med-omni\",\n \"max_used_search_results\": 10,\n \"response_language\": \"eng\",\n \"enable_factual_consistency_score\": True\n }\n}\n\nprint(\"=\" * 60)\nprint(\"BASELINE: qwen3-reranker without instructions\")\nprint(\"=\" * 60)\nbaseline_result = run_query(baseline_request)" + }, + { + "cell_type": "markdown", + "id": "example2-header", + "metadata": {}, + "source": "## Example 2: Role-Based Intent Steering\n\nNow we use the **same query** with and without instructions, searching **both corpora**. The two corpus types (academic papers vs Vectara product docs) create a natural contrast.\n\nWithout instructions, the reranker returns a mix of results from both corpora. With instructions that describe the user as a developer building a production application, we expect the reranker to prioritize practical Vectara documentation over academic papers." 
+ }, + { + "cell_type": "code", + "execution_count": null, + "id": "example2-code", + "metadata": {}, + "outputs": [], + "source": "# First: query WITHOUT instructions (same query as baseline, same corpora)\nno_instructions_request = {\n \"query\": QUERY,\n \"search\": {\n \"corpora\": [\n {\n \"corpus_key\": research_corpus_key,\n \"lexical_interpolation\": 0.005\n },\n {\n \"corpus_key\": docs_corpus_key,\n \"lexical_interpolation\": 0.005\n }\n ],\n \"limit\": 100,\n \"context_configuration\": {\n \"sentences_before\": 2,\n \"sentences_after\": 2\n },\n \"reranker\": {\n \"type\": \"chain\",\n \"rerankers\": [\n {\n \"type\": \"customer_reranker\",\n \"reranker_name\": \"qwen3-reranker\",\n \"limit\": 100,\n \"cutoff\": 0.2\n },\n {\n \"type\": \"mmr\",\n \"diversity_bias\": 0.05\n }\n ]\n }\n },\n \"generation\": {\n \"generation_preset_name\": \"vectara-summary-ext-24-05-med-omni\",\n \"max_used_search_results\": 10,\n \"response_language\": \"eng\",\n \"enable_factual_consistency_score\": True\n }\n}\n\nprint(\"=\" * 60)\nprint(\"WITHOUT INSTRUCTIONS: mixed results from both corpora\")\nprint(\"=\" * 60)\nno_instructions_result = run_query(no_instructions_request)\n\n# Now: same query WITH role-based instructions\nROLE_INSTRUCTIONS = (\n \"The user is a software developer building a production search application. 
\"\n \"Prioritize practical implementation guides, API documentation, and \"\n \"configuration options over academic research papers and theoretical analysis.\"\n)\n\nwith_instructions_request = {\n \"query\": QUERY,\n \"search\": {\n \"corpora\": [\n {\n \"corpus_key\": research_corpus_key,\n \"lexical_interpolation\": 0.005\n },\n {\n \"corpus_key\": docs_corpus_key,\n \"lexical_interpolation\": 0.005\n }\n ],\n \"limit\": 100,\n \"context_configuration\": {\n \"sentences_before\": 2,\n \"sentences_after\": 2\n },\n \"reranker\": {\n \"type\": \"chain\",\n \"rerankers\": [\n {\n \"type\": \"customer_reranker\",\n \"reranker_name\": \"qwen3-reranker\",\n \"instructions\": ROLE_INSTRUCTIONS,\n \"limit\": 100,\n \"cutoff\": 0.2\n },\n {\n \"type\": \"mmr\",\n \"diversity_bias\": 0.05\n }\n ]\n }\n },\n \"generation\": {\n \"generation_preset_name\": \"vectara-summary-ext-24-05-med-omni\",\n \"max_used_search_results\": 10,\n \"response_language\": \"eng\",\n \"enable_factual_consistency_score\": True\n }\n}\n\nprint(\"\\n\" + \"=\" * 60)\nprint(\"WITH INSTRUCTIONS: prioritize practical docs for developers\")\nprint(\"=\" * 60)\nwith_instructions_result = run_query(with_instructions_request)" + }, + { + "cell_type": "code", + "execution_count": null, + "id": "example2-compare", + "metadata": {}, + "outputs": [], + "source": "# Compare: count papers vs docs in top-5 for both variants\ndef classify_result(doc_id):\n \"\"\"Classify a result as 'paper' or 'docs' based on document ID.\"\"\"\n if doc_id.endswith('.pdf'):\n return 'paper'\n return 'docs'\n\nif no_instructions_result and with_instructions_result:\n print(\"=== Role-Based Intent Steering Comparison ===\\n\")\n\n for label, result in [(\"Without instructions\", no_instructions_result),\n (\"With instructions\", with_instructions_result)]:\n top5 = result.get('search_results', [])[:5]\n papers = sum(1 for sr in top5 if classify_result(sr['document_id']) == 'paper')\n docs = sum(1 for sr in top5 if 
classify_result(sr['document_id']) == 'docs')\n print(f\"{label} — top-5 breakdown: {papers} papers, {docs} docs\")\n for i, sr in enumerate(top5, 1):\n kind = classify_result(sr['document_id'])\n print(f\" {i}. [{kind:5s}] {sr['document_id']} (score: {sr.get('score', 0):.4f})\")\n print()\n\n print(\"-> With developer-focused instructions, Vectara docs should dominate the top results.\")" + }, + { + "cell_type": "markdown", + "id": "example3-header", + "metadata": {}, + "source": "## Example 3: Abbreviation and Jargon Resolution\n\nOne of the most powerful uses of reranker instructions is **resolving domain-specific abbreviations and jargon**. In specialized domains, queries often use abbreviations that don't appear literally in the documents. Instructions help the reranker bridge this gap.\n\nThe key is using a **terse query with minimal context clues** — if the query itself explains the abbreviation (e.g., \"How does FCS help detect hallucinations in LLM outputs?\"), the reranker can figure it out from surrounding words alone. A short, opaque query like `\"HHEM accuracy and benchmarks\"` genuinely requires the glossary." 
+ }, + { + "cell_type": "code", + "execution_count": null, + "id": "example3-code", + "metadata": {}, + "outputs": [], + "source": "# A terse query using a domain abbreviation with NO context clues.\n# \"HHEM\" alone gives the reranker nothing to work with.\nABBREV_QUERY = \"HHEM accuracy and benchmarks\"\n\n# Without instructions — the reranker treats \"HHEM\" as an opaque string\nno_glossary_request = {\n \"query\": ABBREV_QUERY,\n \"search\": {\n \"corpora\": [\n {\n \"corpus_key\": research_corpus_key,\n \"lexical_interpolation\": 0.005\n },\n {\n \"corpus_key\": docs_corpus_key,\n \"lexical_interpolation\": 0.005\n }\n ],\n \"limit\": 100,\n \"context_configuration\": {\n \"sentences_before\": 2,\n \"sentences_after\": 2\n },\n \"reranker\": {\n \"type\": \"chain\",\n \"rerankers\": [\n {\n \"type\": \"customer_reranker\",\n \"reranker_name\": \"qwen3-reranker\",\n \"limit\": 100,\n \"cutoff\": 0.2\n },\n {\n \"type\": \"mmr\",\n \"diversity_bias\": 0.05\n }\n ]\n }\n },\n \"generation\": {\n \"generation_preset_name\": \"vectara-summary-ext-24-05-med-omni\",\n \"max_used_search_results\": 10,\n \"response_language\": \"eng\",\n \"enable_factual_consistency_score\": True\n }\n}\n\nprint(\"=\" * 60)\nprint(\"WITHOUT GLOSSARY: 'HHEM' is opaque to the reranker\")\nprint(\"=\" * 60)\nno_glossary_result = run_query(no_glossary_request)" + }, + { + "cell_type": "code", + "execution_count": null, + "id": "example3-with-glossary", + "metadata": {}, + "outputs": [], + "source": "# With glossary instructions — tell the reranker what HHEM means\nGLOSSARY_INSTRUCTIONS = (\n \"This corpus contains Vectara platform documentation and AI research papers. \"\n \"HHEM stands for Hughes Hallucination Evaluation Model, Vectara's model for \"\n \"evaluating factual consistency of LLM-generated text. 
\"\n \"Prioritize results about hallucination detection, factual grounding, \"\n \"and evaluation metrics.\"\n)\n\nwith_glossary_request = {\n \"query\": ABBREV_QUERY,\n \"search\": {\n \"corpora\": [\n {\n \"corpus_key\": research_corpus_key,\n \"lexical_interpolation\": 0.005\n },\n {\n \"corpus_key\": docs_corpus_key,\n \"lexical_interpolation\": 0.005\n }\n ],\n \"limit\": 100,\n \"context_configuration\": {\n \"sentences_before\": 2,\n \"sentences_after\": 2\n },\n \"reranker\": {\n \"type\": \"chain\",\n \"rerankers\": [\n {\n \"type\": \"customer_reranker\",\n \"reranker_name\": \"qwen3-reranker\",\n \"instructions\": GLOSSARY_INSTRUCTIONS,\n \"limit\": 100,\n \"cutoff\": 0.2\n },\n {\n \"type\": \"mmr\",\n \"diversity_bias\": 0.05\n }\n ]\n }\n },\n \"generation\": {\n \"generation_preset_name\": \"vectara-summary-ext-24-05-med-omni\",\n \"max_used_search_results\": 10,\n \"response_language\": \"eng\",\n \"enable_factual_consistency_score\": True\n }\n}\n\nprint(\"=\" * 60)\nprint(\"WITH GLOSSARY: reranker knows HHEM = Hughes Hallucination Evaluation Model\")\nprint(\"=\" * 60)\nwith_glossary_result = run_query(with_glossary_request)" + }, + { + "cell_type": "code", + "execution_count": null, + "id": "example3-compare", + "metadata": {}, + "outputs": [], + "source": "# Compare results: score changes and document reordering\nif no_glossary_result and with_glossary_result:\n print(\"=== Abbreviation Resolution Comparison ===\")\n print(f\"\\nQuery: \\\"{ABBREV_QUERY}\\\"\")\n print(f\"\\nWithout glossary — result count: {len(no_glossary_result.get('search_results', []))}\")\n print(f\"With glossary — result count: {len(with_glossary_result.get('search_results', []))}\")\n\n print(\"\\nWithout glossary — top-5 docs:\")\n for i, sr in enumerate(no_glossary_result.get('search_results', [])[:5], 1):\n meta = sr.get('document_metadata', {})\n print(f\" {i}. 
{sr['document_id']} (score: {sr.get('score', 0):.4f}) — {meta.get('title', 'N/A')}\")\n\n print(\"\\nWith glossary — top-5 docs:\")\n for i, sr in enumerate(with_glossary_result.get('search_results', [])[:5], 1):\n meta = sr.get('document_metadata', {})\n print(f\" {i}. {sr['document_id']} (score: {sr.get('score', 0):.4f}) — {meta.get('title', 'N/A')}\")\n\n # Check for hallucination-related content surfacing with glossary\n def has_hallucination_content(sr):\n doc_id = sr.get('document_id', '').lower()\n title = sr.get('document_metadata', {}).get('title', '').lower()\n text = sr.get('text', '').lower()\n keywords = ['hallucination', 'factual', 'consistency', 'hhem', 'grounding']\n return any(kw in doc_id or kw in title or kw in text for kw in keywords)\n\n no_gloss_relevant = sum(1 for sr in no_glossary_result.get('search_results', [])[:5]\n if has_hallucination_content(sr))\n with_gloss_relevant = sum(1 for sr in with_glossary_result.get('search_results', [])[:5]\n if has_hallucination_content(sr))\n\n print(f\"\\nHallucination-related results in top-5:\")\n print(f\" Without glossary: {no_gloss_relevant}\")\n print(f\" With glossary: {with_gloss_relevant}\")\n print(\"\\n-> The glossary helps the reranker connect 'HHEM' to hallucination evaluation content.\")" + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.2" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/notebooks/api-examples/README.md b/notebooks/api-examples/README.md index c1cd674..228292c 100644 --- a/notebooks/api-examples/README.md +++ b/notebooks/api-examples/README.md @@ -4,7 +4,7 @@ This tutorial series provides a comprehensive, hands-on introduction to 
building ## About Vectara -[Vectara](https://vectara.com/) is the Agent Operating System for trusted enterprise AI: a unified Agentic RAG platform with built-in multi-modal retrieval, orchestration, and always-on governance. Deploy it on-prem (air-gapped), in your VPC, or as SaaS. +[Vectara](https://vectara.com/) is the Agent Platform for trusted enterprise AI: a unified Agentic RAG platform with built-in multi-modal retrieval, orchestration, and always-on governance. Deploy it on-prem (air-gapped), in your VPC, or as SaaS. Key features: - **Simple Integration**: RESTful APIs and SDKs for Python, TypeScript, and Java From 4e17883e9d448e047953e1d295ff920e6a042857 Mon Sep 17 00:00:00 2001 From: Ofer Mendelevitch Date: Tue, 10 Mar 2026 22:53:44 -0700 Subject: [PATCH 5/6] minor updates --- README.md | 13 +++++++++++++ .../api-examples/7-lambda-tools-data-analysis.ipynb | 8 ++------ 2 files changed, 15 insertions(+), 6 deletions(-) diff --git a/README.md b/README.md index 382853f..8017dc8 100644 --- a/README.md +++ b/README.md @@ -2,3 +2,16 @@ This repository contains example code for Vectara. * Notebooks used in our blog posts * Examples for how to use Vectara with LlamaIndex, LangChain and DSPy. + +## API Examples Tutorial Series + +A step-by-step tutorial series in `notebooks/api-examples/` covering the Vectara API v2: + +1. [Corpus Creation](notebooks/api-examples/1-corpus-creation.ipynb) — Create and configure corpora +2. [Data Ingestion](notebooks/api-examples/2-data-ingestion.ipynb) — Upload and index documents +3. [Query API](notebooks/api-examples/3-query-api.ipynb) — Search, retrieval, and generation +4. [Agent API](notebooks/api-examples/4-agent-api.ipynb) — Build RAG agents +5. [Sub-Agents](notebooks/api-examples/5-sub-agents.ipynb) — Multi-agent orchestration +6. [Artifacts](notebooks/api-examples/6-artifacts.ipynb) — Working with artifacts +7. 
[Lambda Tools for Data Analysis](notebooks/api-examples/7-lambda-tools-data-analysis.ipynb) — NumPy/Pandas lambda tools for agent data analysis +8. [Reranker Instructions](notebooks/api-examples/8-reranker-instructions.ipynb) — Using reranker instructions with qwen3-reranker for role-based intent steering and jargon resolution diff --git a/notebooks/api-examples/7-lambda-tools-data-analysis.ipynb b/notebooks/api-examples/7-lambda-tools-data-analysis.ipynb index bcb86e0..d45e1c0 100644 --- a/notebooks/api-examples/7-lambda-tools-data-analysis.ipynb +++ b/notebooks/api-examples/7-lambda-tools-data-analysis.ipynb @@ -564,11 +564,7 @@ "cell_type": "markdown", "id": "cell-15", "metadata": {}, - "source": [ - "## Step 5: Create a Session and Test the Agent\n", - "\n", - "Let's create a session and test the data analyst agent with sample datasets." - ] + "source": "## Step 4: Create a Session and Test the Agent\n\nLet's create a session and test the data analyst agent with sample datasets." }, { "cell_type": "code", @@ -1042,4 +1038,4 @@ }, "nbformat": 4, "nbformat_minor": 5 -} +} \ No newline at end of file From c6e24202074d2e968bb6c68d84bbf622b73a7b46 Mon Sep 17 00:00:00 2001 From: Ofer Mendelevitch Date: Wed, 11 Mar 2026 06:14:29 -0700 Subject: [PATCH 6/6] updated README --- notebooks/api-examples/README.md | 34 ++++++++++++++++++++++++++++---- 1 file changed, 30 insertions(+), 4 deletions(-) diff --git a/notebooks/api-examples/README.md b/notebooks/api-examples/README.md index 228292c..968d57a 100644 --- a/notebooks/api-examples/README.md +++ b/notebooks/api-examples/README.md @@ -1,6 +1,6 @@ # Vectara API Tutorial Series -This tutorial series provides a comprehensive, hands-on introduction to building RAG (Retrieval-Augmented Generation) applications using Vectara's REST API. 
Through seven progressive notebooks, you'll learn to create corpora, ingest data, query information, build intelligent AI agents, orchestrate multi-agent workflows, work with file artifacts, and create data analysis tools with NumPy and Pandas. +This tutorial series provides a comprehensive, hands-on introduction to building RAG (Retrieval-Augmented Generation) applications using Vectara's REST API. Through eight progressive notebooks, you'll learn to create corpora, ingest data, query information, build intelligent AI agents, orchestrate multi-agent workflows, work with file artifacts, create data analysis tools with NumPy and Pandas, and use reranker instructions for domain-specific relevance tuning. ## About Vectara @@ -290,10 +290,9 @@ A **Document Analyst** agent that can: - Combine multiple data analysis tools in agent workflows **What you'll build:** -Three **Data Analysis Lambda Tools**: +Two **Data Analysis Lambda Tools**: 1. **Statistical Analyzer**: Descriptive statistics, correlations, percentiles using Pandas 2. **Trend Analyzer**: Moving averages, growth rates, linear regression using NumPy -3. 
**Data Transformer**: Normalization, missing value handling, outlier removal, aggregation **Lambda tool configuration:** ```python @@ -304,6 +303,7 @@ tool_config = { "title": "Statistical Analyzer", "description": "Compute statistics on tabular data using Pandas...", "code": """ +import json import pandas as pd import numpy as np @@ -329,6 +329,28 @@ def process(data: str, columns: str = "", operations: str = "describe") -> dict: --- +### [Notebook 8: Reranker Instructions](8-reranker-instructions.ipynb) + +**What you'll learn:** +- Use reranker instructions with `qwen3-reranker` to guide relevance scoring +- Implement role-based intent steering to prioritize practical docs over academic papers +- Create domain-specific glossaries to resolve abbreviations and jargon +- Compare baseline reranking with instruction-guided reranking + +**What you'll build:** +Three query examples demonstrating: +1. **Baseline**: `qwen3-reranker` without instructions across both corpora +2. **Role-based intent steering**: Instructions that prioritize practical Vectara docs for a developer audience +3. **Abbreviation resolution**: A glossary that helps the reranker understand "HHEM" means Hughes Hallucination Evaluation Model + +**Key concepts:** +- **Reranker instructions**: A text parameter that provides domain context to guide the reranker's scoring +- **`reranker_name` vs `reranker_id`**: Notebook 8 uses `reranker_name: "qwen3-reranker"` (by name) rather than `reranker_id` (by ID) as in earlier notebooks +- **Intent steering**: Shift result rankings toward a specific user persona without changing the query +- **Jargon resolution**: Help the reranker bridge the gap between abbreviations in queries and full terms in documents + +--- + ## Tutorial Flow ``` @@ -359,6 +381,10 @@ def process(data: str, columns: str = "", operations: str = "describe") -> dict: 7. Lambda Tools for Data Analysis ↓ Build NumPy/Pandas-powered data analysis tools + +8. 
Reranker Instructions + ↓ + Guide relevance scoring with domain-specific instructions ``` ## Running the Notebooks @@ -400,7 +426,7 @@ jupyter notebook | `POST /v2/corpora/{key}/upload_file` | Upload files | 2 | | `POST /v2/corpora/{key}/documents` | Index documents | 2 | | `GET /v2/corpora/{key}/documents` | List documents | 2 | -| `POST /v2/query` | Query corpora | 3 | +| `POST /v2/query` | Query corpora | 3, 8 | | `POST /v2/agents` | Create agent | 4, 5, 6, 7 | | `POST /v2/agents/{key}/sessions` | Create session | 4, 5, 6, 7 | | `POST /v2/agents/{key}/sessions/{key}/events` | Send messages / Upload artifacts | 4, 5, 6, 7 |
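
As a quick illustration of the pattern the Notebook 8 section above describes, here is a minimal sketch of a `POST /v2/query` body that passes `instructions` to `qwen3-reranker` by `reranker_name`. This is a simplified, hypothetical example: the corpus key and instruction text are placeholders (not values from the notebooks), and it uses a single `customer_reranker` rather than the chain reranker the notebook cells wrap it in.

```python
import json

# Placeholder values -- substitute your own corpus key and domain context.
CORPUS_KEY = "my-docs-corpus"
ROLE_INSTRUCTIONS = (
    "The user is a software developer; prioritize practical "
    "implementation guides over academic research papers."
)

def build_instructed_query(query: str, instructions: str) -> dict:
    """Build a query body that steers qwen3-reranker with instructions."""
    return {
        "query": query,
        "search": {
            "corpora": [
                {"corpus_key": CORPUS_KEY, "lexical_interpolation": 0.005}
            ],
            "limit": 100,
            "reranker": {
                "type": "customer_reranker",
                # Referenced by name (Notebook 8 style), not by reranker_id.
                "reranker_name": "qwen3-reranker",
                # Domain context that guides relevance scoring.
                "instructions": instructions,
                "limit": 100,
            },
        },
        "generation": {
            "max_used_search_results": 10,
            "response_language": "eng",
        },
    }

body = build_instructed_query("how do I tune hybrid search?", ROLE_INSTRUCTIONS)
print(json.dumps(body, indent=2))
```

In the notebooks this body would be sent to the query endpoint with the `x-api-key` header; here it is only constructed and printed so the request shape is easy to inspect.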