
DAT406 - Build Agentic AI-Powered Search with Amazon Aurora PostgreSQL

AWS re:Invent 2025 Workshop | Platform & Infrastructure: AWS, Aurora, pgvector, Bedrock | Languages & Frameworks: Python, React, TypeScript, FastAPI | Architecture & Capabilities: Search, AI, MCP | Strands SDK | MIT-0 License

⚠️ Educational Workshop: This repository contains demonstration code for AWS re:Invent 2025. Not intended for production deployment without proper security hardening and testing.


🚀 Quick Start

Workshop Duration: 2 hours | Hands-on: Parts 1 & 3 (50 min) | Guided Demo: Part 2 (20 min) | Optional: Part 4 (Self-paced)

Build enterprise-grade agentic AI applications with semantic search, multi-agent orchestration, and Model Context Protocol integration. Leverage Amazon Aurora PostgreSQL 17.5 with pgvector 0.8.0, Amazon Bedrock (Claude Sonnet 4 + Titan Text Embeddings v2), and modern full-stack technologies.

Pre-configured Workshop Environment

start-backend   # Terminal 1: FastAPI backend (port 8000)
start-frontend  # Terminal 2: React frontend (port 5173)

Access Points:

  • 🌐 Frontend: <CloudFront-URL>/ports/5173/
  • 🔌 API Docs: <CloudFront-URL>/ports/8000/docs
  • 📊 Health: <CloudFront-URL>/ports/8000/api/health

πŸ“ Repository Structure

├── notebooks/                      # Workshop Notebooks (Parts 1-4)
│   ├── Part_1_Semantic_Search_Foundations_Exercises.ipynb
│   ├── Part_1_Semantic_Search_Foundations_Solutions.ipynb
│   ├── Part_2_Context_Management_Custom_Tools_Exercises.ipynb
│   ├── Part_2_Context_Management_Custom_Tools_Solutions.ipynb
│   ├── Part_3_Multi_Agent_Orchestration_Exercises.ipynb
│   ├── Part_3_Multi_Agent_Orchestration_Solutions.ipynb
│   ├── Part_4_Advanced_Topics_Production_Patterns.ipynb
│   └── requirements.txt
├── blaize-bazaar/                  # Full-Stack Demo Application
│   ├── backend/                    # FastAPI + Multi-Agent System
│   │   ├── agents/                 # Orchestrator, Inventory, Pricing, Recommendation
│   │   ├── services/               # Search, MCP, Bedrock integration
│   │   ├── models/                 # Pydantic data models
│   │   └── app.py                  # FastAPI application
│   ├── frontend/                   # React + TypeScript UI
│   │   └── src/                    # Components, hooks, services
│   ├── config/                     # MCP server configuration
│   ├── start-backend.sh
│   └── start-frontend.sh
├── data/                           # Product catalog datasets
│   └── amazon-products-sample.csv
└── scripts/                        # Setup & bootstrap scripts
    ├── bootstrap-environment.sh
    ├── bootstrap-labs.sh
    └── load-database-fast.sh

🎯 Workshop Structure

Part 1: Semantic Search Foundations (25 min) - Hands-on Exercises

Building semantic search with pgvector 0.8.0 and Aurora PostgreSQL

  • Vector embeddings with Amazon Titan Text Embeddings v2 (1024 dimensions)
  • HNSW indexing for production-scale similarity search
  • Enterprise-tuned indexes (M=16, ef_construction=64)
  • Automatic iterative scanning for high recall on filtered queries
  • Session state management with Aurora PostgreSQL
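
The end-to-end flow covered in Part 1 (embed the query with Titan Text Embeddings v2, then rank rows by cosine similarity in Aurora) looks roughly like the sketch below. It is a minimal sketch, not the exact notebook code: it assumes the product_catalog schema shown later in this README, and the connection string and region are placeholders.

import json
import boto3
import psycopg

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def embed(text: str) -> list[float]:
    """Generate a 1024-dimension embedding with Amazon Titan Text Embeddings v2."""
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text, "dimensions": 1024, "normalize": True}),
    )
    return json.loads(response["body"].read())["embedding"]

query_vector = embed("wireless gaming headphones with noise cancellation")

# Placeholder DSN; the workshop environment injects real credentials
with psycopg.connect("postgresql://user:pass@aurora-endpoint:5432/postgres") as conn:
    rows = conn.execute(
        """
        SELECT "productId", product_description,
               1 - (embedding <=> %s::vector) AS similarity
        FROM bedrock_integration.product_catalog
        ORDER BY embedding <=> %s::vector
        LIMIT 10
        """,
        (json.dumps(query_vector), json.dumps(query_vector)),
    ).fetchall()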

Part 2: Context Management & Custom Agent Tools (20 min) - Interactive Guided Demo

Building custom tools for Aurora PostgreSQL data access with MCP

  • Custom tool creation with @tool decorator patterns
  • Trending products, inventory analytics, pricing insights
  • Intelligent token counting and context optimization
  • Model Context Protocol integration with Strands SDK
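
A hedged sketch of the @tool decorator pattern from Part 2, using the Strands SDK: the function's signature and docstring become the tool's interface. The SQL, table name, and connection handling here are illustrative; the workshop's actual tools live under blaize-bazaar/backend.

import psycopg
from strands import tool

@tool
def get_trending_products(limit: int = 10) -> list[dict]:
    """Return the top products by recent purchase volume."""
    # Placeholder DSN; production code would use a pooled connection
    with psycopg.connect("postgresql://user:pass@aurora-endpoint:5432/postgres") as conn:
        rows = conn.execute(
            """
            SELECT "productId", product_description, price, "boughtInLastMonth"
            FROM bedrock_integration.product_catalog
            WHERE quantity > 0
            ORDER BY "boughtInLastMonth" DESC
            LIMIT %s
            """,
            (limit,),
        ).fetchall()
    return [
        {"productId": r[0], "description": r[1], "price": float(r[2]), "bought_last_month": r[3]}
        for r in rows
    ]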

Part 3: Multi-Agent Orchestration (25 min) - Hands-on Exercises

Agents as Tools pattern with Strands SDK

  • Orchestrator + specialist agents (Inventory, Pricing, Recommendation)
  • Claude Sonnet 4 for intelligent query routing and agent coordination
  • Agent routing, coordination, and tool selection
  • OpenTelemetry distributed tracing

Part 4: Advanced Topics & Enterprise Patterns (Optional) - Self-paced

Production deployment patterns and optimization

  • Session management at enterprise scale
  • Vector quantization strategies (binary, scalar)
  • Resilience patterns and error handling
  • Cost optimization and performance tuning

💡 Key Technical Insights

Why pgvector 0.8.0?

Automatic iterative scanning removes per-query ef_search tuning and keeps scanning until the requested number of matching results is found, even under strict filters:

Before (pgvector 0.7.x):

SET hnsw.ef_search = 40;  -- Manual tuning required for each query
-- Risk: May miss relevant results with strict filters
-- Challenge: Different ef_search values needed per use case

After (pgvector 0.8.0):

SET hnsw.iterative_scan = 'relaxed_order';
-- Keeps scanning the index until enough matching rows are found
-- Dramatically improves recall for filtered queries, with modest extra latency
-- No per-query ef_search tuning needed for production deployment
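
From application code, the same setting is applied per session before running a filtered vector query. A minimal psycopg sketch, assuming the product_catalog schema from the Database Schema section and a pre-computed query embedding:

import psycopg

def filtered_vector_search(conn: psycopg.Connection, query_vector: list[float],
                           category: str, min_price: float, max_price: float,
                           limit: int = 10) -> list[tuple]:
    """Run a category/price-filtered cosine search with iterative scanning enabled."""
    conn.execute("SET hnsw.iterative_scan = 'relaxed_order'")
    vector_literal = "[" + ",".join(str(v) for v in query_vector) + "]"
    return conn.execute(
        """
        SELECT "productId", price, stars,
               1 - (embedding <=> %s::vector) AS similarity
        FROM bedrock_integration.product_catalog
        WHERE category_name = %s AND price BETWEEN %s AND %s
        ORDER BY embedding <=> %s::vector
        LIMIT %s
        """,
        (vector_literal, category, min_price, max_price, vector_literal, limit),
    ).fetchall()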

Why Agents as Tools Pattern?

| Traditional Monolithic Approach | Agents as Tools Pattern |
| --- | --- |
| Single agent handles all tasks | Orchestrator + specialized agents |
| All capabilities in one codebase | Focused expertise per agent domain |
| Hard to maintain and debug | Independent testing and updates |
| Sequential execution only | Parallel execution possible |
| Difficult to scale | Horizontal scaling per agent type |

Benefits:

  • 🎯 Domain expertise - Each agent masters specific capabilities
  • 🔄 Easy maintenance - Update agents independently
  • ⚡ Better performance - Optimized per agent type
  • 📈 Scalable architecture - Add new agents without refactoring
  • 🧪 Testability - Unit test agents in isolation

πŸ›οΈ Blaize Bazaar Demo Application

Full-stack e-commerce platform demonstrating enterprise-grade agentic AI

Quick Start Guide

Step 1: Split terminal into two panes (side-by-side)

Step 2: Navigate to blaize-bazaar directory in both panes

blaize-bazaar

Step 3: Start backend (Left Pane)

start-backend
# FastAPI server starts on port 8000
# Wait for "Application startup complete" message

Step 4: Start frontend (Right Pane)

start-frontend
# React dev server starts on port 5173
# Opens automatically in browser

Architecture Flow

React Frontend (TypeScript + Tailwind CSS)
              ↓
    FastAPI Backend (Python 3.13)
         ↓           ↓
   Orchestrator → Specialist Agents
         ↓           ↓           ↓
   Inventory     Pricing    Recommendation
         └────────────┴────────────┘
                    ↓
      Aurora PostgreSQL + pgvector
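
For orientation, a stripped-down version of how a search request could travel from the frontend through FastAPI to the orchestrator. The route shape mirrors the API Reference below, while the orchestrator call is a stub; see blaize-bazaar/backend/app.py for the real wiring.

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class SearchRequest(BaseModel):
    query: str
    limit: int = 10

def run_orchestrator(query: str, limit: int) -> list[dict]:
    """Stub for the multi-agent orchestrator described later in this README."""
    return []

@app.post("/api/search")
async def search(request: SearchRequest) -> dict:
    # The demo app hands the query to the orchestrator, which routes it to the
    # Inventory, Pricing, or Recommendation agent as needed.
    results = run_orchestrator(request.query, limit=request.limit)
    return {"results": results, "total": len(results)}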

Platform Features

Features

  • ✨ Semantic Search: Vector similarity with pgvector 0.8.0 HNSW indexes for natural language queries
  • 💬 Conversational AI: Claude Sonnet 4 for intelligent query understanding and agent routing
  • 🔧 MCP Context Manager: Custom tools for Aurora PostgreSQL data access
  • 🤖 Multi-Agent System: Orchestrator + 3 specialist agents (Agents as Tools)
  • 🔍 Smart Filters: Category, price, rating with real-time filtering
  • ⚡ Real-time: Autocomplete and quick search results
  • 📊 Agent Traces: OpenTelemetry observability for multi-agent workflows
  • 🎯 Enterprise-Ready: Cost analysis, security patterns, and monitoring

πŸ—„οΈ Database Schema

Table: bedrock_integration.product_catalog

| Column | Type | Index | Description |
| --- | --- | --- | --- |
| productId | CHAR(10) | PRIMARY KEY | Unique product identifier |
| product_description | VARCHAR(500) | GIN | Full product details for text search |
| imgUrl | VARCHAR(70) | — | Product image URL |
| productURL | VARCHAR(40) | — | Product page URL |
| stars | NUMERIC(2,1) | Partial | Rating (1.0-5.0) |
| reviews | INTEGER | — | Customer review count |
| price | NUMERIC(8,2) | Partial | Price in USD |
| category_id | SMALLINT | — | Category identifier |
| isBestSeller | BOOLEAN | Partial | Bestseller flag |
| boughtInLastMonth | INTEGER | — | Recent purchase count |
| category_name | VARCHAR(50) | B-tree | Product category |
| quantity | SMALLINT | — | Available stock (0-1000) |
| embedding | VECTOR(1024) | HNSW | Titan v2 semantic vector embedding |

Performance-Optimized Indexes

-- Vector similarity search (HNSW optimized for 21,704 products)
CREATE INDEX idx_product_embedding_hnsw 
ON product_catalog USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 128);

-- Full-text search (GIN for keyword matching)
CREATE INDEX idx_product_fts 
ON product_catalog USING GIN (to_tsvector('english', product_description));

-- Category and price filters
CREATE INDEX idx_product_category_name ON product_catalog(category_name);
CREATE INDEX idx_product_price ON product_catalog(price) WHERE price > 0;

-- Partial indexes for common filters
CREATE INDEX idx_product_stars ON product_catalog(stars) WHERE stars >= 4.0;
CREATE INDEX idx_product_bestseller ON product_catalog("isBestSeller") WHERE "isBestSeller" = TRUE;

-- Composite index for category + price queries
CREATE INDEX idx_product_category_price 
ON product_catalog(category_name, price) WHERE price > 0 AND quantity > 0;

🔌 API Reference

Search Endpoint

POST /api/search
Content-Type: application/json

{
  "query": "wireless gaming headphones noise cancellation",
  "limit": 10,
  "min_similarity": 0.3,
  "filters": {
    "category": "Electronics",
    "min_price": 50,
    "max_price": 200,
    "min_stars": 4.0
  }
}

Response:

{
  "results": [
    {
      "productId": "B08XYZ",
      "product_description": "Premium wireless gaming headset...",
      "price": 149.99,
      "stars": 4.5,
      "reviews": 1243,
      "similarity": 0.87
    }
  ],
  "total": 10,
  "query_time_ms": 45
}
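
An example client call against this endpoint, assuming the backend is reachable on localhost:8000 (in the workshop environment it sits behind the CloudFront ports proxy shown in the Quick Start):

import requests

payload = {
    "query": "wireless gaming headphones noise cancellation",
    "limit": 10,
    "min_similarity": 0.3,
    "filters": {"category": "Electronics", "min_price": 50, "max_price": 200, "min_stars": 4.0},
}

response = requests.post("http://localhost:8000/api/search", json=payload, timeout=30)
response.raise_for_status()
for product in response.json()["results"]:
    print(f'{product["productId"]}  similarity={product["similarity"]:.2f}  ${product["price"]}')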

🔧 Model Context Protocol (MCP)

Custom tools built with Strands SDK for Aurora PostgreSQL agent integration, enabling intelligent database access and business logic execution.

Custom Tools Implemented:

  • get_trending_products - Top products by popularity metrics
  • check_inventory - Real-time stock availability queries
  • analyze_pricing - Price trend analysis and insights
  • get_recommendations - Semantic similarity-based suggestions

Architecture Benefits:

  • 🔌 Standardized tool interface via MCP specification
  • 🔄 Reusable across multiple agents
  • 📊 Built-in token counting and context management
  • ⚡ Direct database access with connection pooling
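
For context on the protocol itself, here is one way to serve such a tool over MCP, sketched with the official MCP Python SDK (FastMCP). The workshop wires its tools through the Strands SDK and the config/ directory, so treat this as an illustration of the standard rather than the exact implementation:

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("blaize-bazaar-tools")

@mcp.tool()
def check_inventory(product_id: str) -> dict:
    """Return current stock for a product (stubbed here; the real tool queries Aurora)."""
    return {"productId": product_id, "quantity": 42}

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio to MCP-compatible agents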

🤖 Multi-Agent Architecture

🔧 Framework Agnostic Concepts: While this workshop uses Strands SDK for hands-on implementation, the multi-agent patterns and architectural concepts (Agents as Tools, orchestration, specialist agents) apply equally to other frameworks like LangGraph, LangChain, CrewAI, AutoGen, and more. Focus on understanding the patterns - the implementation details are transferable.

Orchestrator Agent (Claude Sonnet 4)

Capabilities:

  • 🧠 Intelligent query routing and agent coordination (supports extended thinking with interleaved mode for complex multi-step analysis)
  • 🔄 Adaptive task routing based on tool responses and context
  • 📊 Context-aware agent selection and coordination
  • 🎯 Dynamic workflow orchestration

Specialized Agents (Agents as Tools Pattern)

1. Inventory Agent

✓ Real-time stock monitoring across catalog
✓ Low inventory alerts (threshold: <10 units)
✓ Restocking recommendations with priority levels
✓ Stock availability forecasting

2. Recommendation Agent

✓ Personalized product suggestions via semantic search
✓ Feature-based matching and similarity analysis
✓ Budget-conscious alternatives with price awareness
✓ Cross-category recommendations

3. Pricing Agent

✓ Price trend analysis and historical patterns
✓ Deal identification (discount threshold: >20% off)
✓ Value-for-money rankings and comparisons
✓ Competitive pricing insights
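
Putting the pattern together, a hedged Strands SDK sketch of Agents as Tools: each specialist agent is wrapped in a @tool function and the orchestrator decides which one to call. System prompts, tool lists, and defaults are illustrative; see blaize-bazaar/backend/agents/ for the workshop versions.

from strands import Agent, tool

inventory_agent = Agent(system_prompt="You answer stock-level and restocking questions.")
pricing_agent = Agent(system_prompt="You analyze prices, deals, and value for money.")

@tool
def inventory_assistant(query: str) -> str:
    """Route inventory and stock-level questions to the Inventory agent."""
    return str(inventory_agent(query))

@tool
def pricing_assistant(query: str) -> str:
    """Route pricing and deal-analysis questions to the Pricing agent."""
    return str(pricing_agent(query))

orchestrator = Agent(
    system_prompt="Route each shopper question to the most relevant specialist tool.",
    tools=[inventory_assistant, pricing_assistant],
)

answer = orchestrator("Which gaming headsets under $200 are running low on stock?")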

💰 Cost Analysis

Workshop Environment Costs

| Service | Usage | Estimated Cost |
| --- | --- | --- |
| Amazon Bedrock | | |
| Titan Text Embeddings v2 | ~10K tokens (initial load) | $0.10 |
| Claude Sonnet 4 | ~50K tokens (agent queries) | $1.50 |
| Aurora PostgreSQL | | |
| Storage (10K vectors) | 100 MB | $0.00* |
| I/O Operations | ~1K reads | $0.00* |

*Included in pre-provisioned workshop environment

Production Estimates (1M queries/month)

| Component | Monthly Cost Range | Notes |
| --- | --- | --- |
| Aurora PostgreSQL | $150-600 | Depends on instance family, size, and I/O configuration |
| Bedrock Embeddings | $100 | 100M tokens @ $0.001/1K tokens |
| Bedrock Claude Sonnet 4 | $300 | 100M tokens @ $0.003/1K tokens |
| Data Transfer | $50 | 500 GB outbound from AWS |
| Total | $600-1,050 | Varies based on Aurora configuration |

Aurora Configuration Best Practices

For Read-Heavy Workloads (Recommended):

  • Aurora I/O-Optimized - Zero I/O charges, predictable monthly costs
  • Optimized Reads (NVMe-SSD) - Faster query performance with local caching
  • Read Replicas - Distribute read load across multiple instances (up to 15)

Cost Optimization Benefits:

  • I/O-Optimized eliminates per-request I/O charges (typical savings: 20-40%)
  • Optimized Reads reduce network I/O by caching frequently accessed data locally
  • Combined approach ideal for vector search workloads with high read volume

Scaling Guidance:

  • Start with smaller instances and scale based on actual metrics
  • Monitor ReadLatency, CPUUtilization, and DatabaseConnections
  • Use Aurora Serverless v2 for variable or unpredictable workloads
  • Consider Aurora Global Database for multi-region deployments

Cost Optimization Strategies

  • Cache embeddings - Reduce Bedrock calls by up to 80% with semantic caching (see the sketch after this list)
  • Aurora Serverless v2 - Auto-scaling for variable workloads (0.5-16 ACU)
  • Query result caching - Redis/ElastiCache for frequently accessed data
  • Batch processing - Generate embeddings during off-peak hours
  • Read replicas - Distribute query load across multiple Aurora instances
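
A minimal sketch of the embedding-cache idea above, using exact-match caching keyed on normalized query text (true semantic caching would also match near-duplicate queries by vector similarity). A production setup would typically back this with Redis/ElastiCache and add TTLs; embed_fn stands in for the Bedrock call from the Part 1 sketch.

import hashlib
from typing import Callable

_embedding_cache: dict[str, list[float]] = {}

def cached_embed(text: str, embed_fn: Callable[[str], list[float]]) -> list[float]:
    """Return a cached embedding for repeated queries, calling Bedrock only on a miss."""
    key = hashlib.sha256(text.strip().lower().encode()).hexdigest()
    if key not in _embedding_cache:
        _embedding_cache[key] = embed_fn(text)
    return _embedding_cache[key]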

🔒 Security Best Practices

Database Security

✓ Enable encryption at rest (AES-256 for all data)
✓ Use IAM database authentication (no password rotation needed)
✓ Restrict security groups to application subnets only
✓ Enable automated backups (7-35 day retention period)
✓ Use AWS Secrets Manager for credential management
✓ Enable VPC endpoints for private connectivity

Application Security

✓ Input validation on all user queries and API endpoints
✓ SQL injection prevention (parameterized queries only)
✓ Rate limiting per user/IP (default: 100 requests/minute)
✓ API authentication (JWT tokens with expiration)
✓ CORS configuration for production domains
✓ Content Security Policy (CSP) headers

AI/ML Security

✓ Bedrock Guardrails for content filtering and safety
✓ PII detection and redaction in user queries
✓ Audit logging for all AI interactions (CloudTrail)
✓ Model access controls via IAM policies
✓ Prompt injection prevention and validation
✓ Token usage monitoring and anomaly detection

📊 Observability & Monitoring

OpenTelemetry Integration

Built-in distributed tracing for multi-agent workflows:

# Automatic trace capture with context propagation
✨ Agent: Orchestrator
   Duration: 245ms
   Tokens: 215 (input: 150, output: 65)
   Status: Success
   
🤖 LLM Call: claude-sonnet-4
   Duration: 180ms
   Model: anthropic.claude-sonnet-4-20250514-v1:0
   Temperature: 0.7
   
🔧 Tool: get_trending_products
   Duration: 45ms
   Result: 10 products
   Query: SELECT * FROM product_catalog...
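
The workshop's Strands-based agents emit traces like the one above automatically; a minimal sketch of the equivalent manual instrumentation with the OpenTelemetry SDK (exporter configuration, e.g. OTLP to a collector, is simplified to console output here):

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("blaize-bazaar.orchestrator")

def handle_query(query: str) -> str:
    with tracer.start_as_current_span("orchestrator.invoke") as span:
        span.set_attribute("agent.name", "Orchestrator")
        span.set_attribute("query.text", query)
        result = "stubbed agent response"  # the real code would invoke the agent here
        span.set_attribute("tokens.total", 215)
        return result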

CloudWatch Metrics

Database Metrics:

  • DatabaseConnections - Active connection count
  • ReadLatency / WriteLatency - Query performance (milliseconds)
  • CPUUtilization - Compute resource usage (%)
  • FreeableMemory - Available RAM for caching (GB)
  • VolumeReadIOPs / VolumeWriteIOPs - Disk operations

Application Metrics:

  • SearchLatency - End-to-end query processing time
  • AgentInvocations - Agent usage patterns and frequency
  • BedrockTokens - Token consumption and costs
  • ErrorRate - Failed requests and exceptions
  • CacheHitRate - Embedding cache effectiveness

Custom Dashboards:

# Key Performance Indicators (KPIs)
- P50/P95/P99 search latency percentiles
- Agent routing accuracy and success rate
- Cache hit rate and memory efficiency
- Cost per query and daily spend tracking

Alerting Strategy

| Alert | Threshold | Action |
| --- | --- | --- |
| High Latency | P95 > 2s | Scale Aurora read replicas |
| Error Rate | > 5% | Page on-call engineer immediately |
| Token Spike | > 2x baseline | Investigate potential abuse or bugs |
| DB Connections | > 80% max | Check for connection leaks |
| Cost Anomaly | > 150% daily budget | Review usage patterns |
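
As an example, the "High Latency" alert above could be created with boto3 roughly as follows; here Aurora's ReadLatency metric stands in for an application-level P95 search-latency metric, and the cluster identifier and SNS topic ARN are placeholders.

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="dat406-search-high-latency",
    Namespace="AWS/RDS",
    MetricName="ReadLatency",
    Dimensions=[{"Name": "DBClusterIdentifier", "Value": "blaize-bazaar-cluster"}],
    ExtendedStatistic="p95",
    Period=60,
    EvaluationPeriods=5,
    Threshold=2.0,  # seconds
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:dat406-alerts"],
)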

Structured Logging

# Context-rich structured logging for debugging
logger.info(
    "search_query_executed",
    query=query,
    user_id=user_id,
    latency_ms=latency,
    results_count=len(results),
    trace_id=trace_id,
    similarity_threshold=min_similarity,
    filters=filters
)

πŸ› οΈ Technology Stack

| Layer | Technologies |
| --- | --- |
| Database | Aurora PostgreSQL 17.5 • pgvector 0.8.0 (HNSW) |
| AI/ML | Amazon Bedrock (Titan Text Embeddings v2, Claude Sonnet 4) |
| Backend | FastAPI • Python 3.13 • psycopg3 • boto3 • Pydantic v2 |
| Frontend | React 18 • TypeScript 5 • Tailwind CSS • Vite • Lucide Icons |
| Search | HNSW vector indexes • Trigram text indexes • Cosine similarity |
| Agent Framework | Strands SDK • Agents as Tools pattern • MCP integration |
| Observability | OpenTelemetry • CloudWatch • Structured logging |

🚀 Production Deployment Guide

Horizontal Scaling Strategy

Database Layer:

  • Aurora read replicas for search queries (up to 15 replicas)
  • Multi-AZ deployment for high availability
  • Cross-region read replicas for global applications

Application Layer:

  • Application Load Balancer (ALB) for FastAPI instances
  • Auto Scaling Groups (ASG) based on CPU/memory metrics
  • CloudFront CDN for React frontend static assets

Vertical Scaling Approach

General Guidance:

  • Start with smaller instance sizes and scale based on actual performance metrics
  • Monitor key metrics: ReadLatency, CPUUtilization, DatabaseConnections, FreeableMemory
  • Scale vertically when consistently hitting >70% CPU or memory utilization
  • Consider Aurora Serverless v2 for workloads with variable or unpredictable patterns

Performance Indicators:

  • ReadLatency consistently >50ms → Consider larger instance or read replicas
  • CPUUtilization sustained >70% → Scale to larger instance size
  • DatabaseConnections approaching max → Review connection pooling or scale up
  • FreeableMemory <20% of total → Increase instance size for better caching

Aurora Serverless v2 Configuration

# Auto-scaling configuration for variable workloads
MinCapacity: 0.5 ACU (1 GB RAM)
MaxCapacity: 16 ACU (32 GB RAM)
AutoPause: true (after 5 minutes of inactivity)
ScaleIncrement: 0.5 ACU per scaling step

Benefits:

  • Pay only for resources used (per-second billing)
  • Automatic scaling based on workload
  • Zero infrastructure management overhead

📚 Resources & References

AWS Documentation

Open Source & Standards

Related AWS re:Invent 2025 Workshops

  • DAT409: Implement hybrid search with Aurora PostgreSQL for MCP retrieval [REPEAT]
  • DAT428: Build a cost-effective RAG-based gen AI application with Amazon Aurora [REPEAT]
  • DAT403: Build a multi-agent AI solution with Amazon Aurora & Bedrock AgentCore

Research Papers & Technical References


⭐ Community & Support

Like This Workshop?

If you find this helpful:

  • ⭐ Star this repository to show support and help others discover it
  • 🔱 Fork it to customize for your specific use cases
  • 🐛 Report issues to help improve the workshop
  • 📢 Share it with your community and colleagues
  • 💬 Contribute - Pull requests welcome for improvements

Getting Help


📄 License

This library is licensed under the MIT-0 License. See the LICENSE file for details.


πŸ™ Acknowledgments

Workshop Developed and Tested By:

  • Shayon Sanyal - Principal Solutions Architect, AWS | Email: [email protected]
  • AWS Database Specialists - Workshop support team

Special Thanks:

  • pgvector community for the amazing open-source extension
  • Anthropic for Claude Sonnet 4 capabilities
  • AWS Workshop Studio team for platform support

© 2025 Amazon Web Services | AWS re:Invent 2025 | DAT406 Workshop


⭐ If you found this workshop helpful, please star this repository! ⭐

Built with ❤️ for the AWS community
