⚠️ Educational Workshop: This repository contains demonstration code for AWS re:Invent 2025. Not intended for production deployment without proper security hardening and testing.
Workshop Duration: 2 hours | Hands-on: Parts 1 & 3 (50 min) | Guided Demo: Part 2 (20 min) | Optional: Part 4 (Self-paced)
Build enterprise-grade agentic AI applications with semantic search, multi-agent orchestration, and Model Context Protocol integration. Leverage Amazon Aurora PostgreSQL 17.5 with pgvector 0.8.0, Amazon Bedrock (Claude Sonnet 4 + Titan Text Embeddings v2), and modern full-stack technologies.
start-backend # Terminal 1: FastAPI backend (port 8000)
start-frontend # Terminal 2: React frontend (port 5173)

Access Points:
- Frontend: `<CloudFront-URL>/ports/5173/`
- API Docs: `<CloudFront-URL>/ports/8000/docs`
- Health: `<CloudFront-URL>/ports/8000/api/health`
├── notebooks/                      # Workshop Notebooks (Parts 1-4)
│   ├── Part_1_Semantic_Search_Foundations_Exercises.ipynb
│   ├── Part_1_Semantic_Search_Foundations_Solutions.ipynb
│   ├── Part_2_Context_Management_Custom_Tools_Exercises.ipynb
│   ├── Part_2_Context_Management_Custom_Tools_Solutions.ipynb
│   ├── Part_3_Multi_Agent_Orchestration_Exercises.ipynb
│   ├── Part_3_Multi_Agent_Orchestration_Solutions.ipynb
│   ├── Part_4_Advanced_Topics_Production_Patterns.ipynb
│   └── requirements.txt
├── blaize-bazaar/                  # Full-Stack Demo Application
│   ├── backend/                    # FastAPI + Multi-Agent System
│   │   ├── agents/                 # Orchestrator, Inventory, Pricing, Recommendation
│   │   ├── services/               # Search, MCP, Bedrock integration
│   │   ├── models/                 # Pydantic data models
│   │   └── app.py                  # FastAPI application
│   ├── frontend/                   # React + TypeScript UI
│   │   └── src/                    # Components, hooks, services
│   ├── config/                     # MCP server configuration
│   ├── start-backend.sh
│   └── start-frontend.sh
├── data/                           # Product catalog datasets
│   └── amazon-products-sample.csv
└── scripts/                        # Setup & bootstrap scripts
    ├── bootstrap-environment.sh
    ├── bootstrap-labs.sh
    └── load-database-fast.sh
Building semantic search with pgvector 0.8.0 and Aurora PostgreSQL
- Vector embeddings with Amazon Titan Text Embeddings v2 (1024 dimensions)
- HNSW indexing for production-scale similarity search
- Enterprise-tuned indexes (M=16, ef_construction=64)
- Iterative index scans for better recall on filtered queries
- Session state management with Aurora PostgreSQL
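
To make the similarity search concrete, here is a minimal pure-Python sketch of what a cosine-distance ranking computes. pgvector's `<=>` operator returns cosine distance (1 minus cosine similarity). The 3-dimensional vectors and product names below are illustrative assumptions, not workshop data, and a real query would run inside Aurora against the HNSW index rather than scanning every row.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance = 1 - cosine similarity (the quantity pgvector's <=> returns)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query: list[float], catalog: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Exact (brute-force) nearest neighbours; an HNSW index approximates this ranking."""
    scored = [(name, cosine_distance(query, vec)) for name, vec in catalog.items()]
    return sorted(scored, key=lambda item: item[1])[:k]


# Hypothetical toy embeddings (real Titan v2 vectors have 1024 dimensions).
CATALOG = {
    "gaming headset": [0.9, 0.1, 0.0],
    "wireless earbuds": [0.8, 0.3, 0.1],
    "garden hose": [0.0, 0.2, 0.9],
}

if __name__ == "__main__":
    for name, dist in top_k([1.0, 0.0, 0.0], CATALOG):
        print(f"{name}: distance={dist:.3f}")
```

A smaller distance means a closer match, so the results are sorted in ascending order, the same direction as `ORDER BY embedding <=> query_vector` in SQL.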
Building custom tools for Aurora PostgreSQL data access with MCP
- Custom tool creation with `@tool` decorator patterns
- Trending products, inventory analytics, pricing insights
- Intelligent token counting and context optimization
- Model Context Protocol integration with Strands SDK
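
As a rough illustration of token counting and context optimization, the sketch below trims a conversation history to a token budget. The 4-characters-per-token heuristic and both function names are assumptions made for this example; accurate counts would come from the model's tokenizer or from the usage metadata returned with a response.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token, minimum 1."""
    return max(1, len(text) // 4)


def trim_to_budget(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages whose estimated total fits the budget."""
    kept: list[str] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(message)
        if used + cost > max_tokens:
            break  # everything older is dropped as well
        kept.append(message)
        used += cost
    return list(reversed(kept))  # restore chronological order
```

Dropping the oldest messages first is only one possible policy; summarizing older turns instead would preserve more context at the cost of an extra model call.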
Agents as Tools pattern with Strands SDK
- Orchestrator + specialist agents (Inventory, Pricing, Recommendation)
- Claude Sonnet 4 for intelligent query routing and agent coordination
- Agent routing, coordination, and tool selection
- OpenTelemetry distributed tracing
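
The Agents as Tools idea can be sketched in plain Python, without any agent framework: each specialist is a callable, and the orchestrator picks one per query. This is not the Strands SDK API. The keyword table below is a hypothetical stand-in for the routing decision that the workshop delegates to Claude Sonnet 4.

```python
from typing import Callable


# Each specialist is exposed to the orchestrator as a plain callable "tool".
def inventory_agent(query: str) -> str:
    return f"[inventory] stock check for: {query}"


def pricing_agent(query: str) -> str:
    return f"[pricing] price analysis for: {query}"


def recommendation_agent(query: str) -> str:
    return f"[recommendation] suggestions for: {query}"


# Hypothetical keyword table; in the workshop an LLM chooses the tool instead.
ROUTES: list[tuple[tuple[str, ...], Callable[[str], str]]] = [
    (("stock", "inventory", "restock"), inventory_agent),
    (("price", "deal", "discount"), pricing_agent),
]


def orchestrate(query: str) -> str:
    """Route to the first specialist whose keywords match; default to recommendations."""
    lowered = query.lower()
    for keywords, agent in ROUTES:
        if any(word in lowered for word in keywords):
            return agent(query)
    return recommendation_agent(query)
```

Because every specialist shares the same `str -> str` signature, adding a fourth agent only means writing one function and one routing entry.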
Production deployment patterns and optimization
- Session management at enterprise scale
- Vector quantization strategies (binary, scalar)
- Resilience patterns and error handling
- Cost optimization and performance tuning
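
The two quantization strategies can be illustrated in a few lines of pure Python. This is a conceptual sketch, not how pgvector implements them: binary quantization keeps one sign bit per dimension and compares codes with Hamming distance, while scalar quantization maps each float to a small integer. The value range of -1.0 to 1.0 is an assumption chosen for the example.

```python
def binary_quantize(vector: list[float]) -> int:
    """Pack one bit per dimension (1 if the component is positive) into an int."""
    bits = 0
    for component in vector:
        bits = (bits << 1) | (1 if component > 0 else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")


def scalar_quantize(vector: list[float], lo: float = -1.0, hi: float = 1.0) -> list[int]:
    """Map each float in [lo, hi] to an integer in 0..255; out-of-range values are clamped."""
    scale = 255 / (hi - lo)
    return [round((min(max(c, lo), hi) - lo) * scale) for c in vector]
```

Binary codes shrink a 1024-dimension float vector to 128 bytes, which is why they are usually paired with a re-ranking step on the full-precision vectors.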
Iterative index scans reduce manual tuning and help filtered queries return enough results:

Before (pgvector 0.7.x):

SET hnsw.ef_search = 40; -- Manual tuning required for each query
-- Risk: May miss relevant results with strict filters
-- Challenge: Different ef_search values needed per use case

After (pgvector 0.8.0):

SET hnsw.iterative_scan = 'relaxed_order';
-- Keeps scanning the index until enough rows pass the filter (bounded by hnsw.max_scan_tuples)
-- Improves recall for filtered queries; HNSW search itself remains approximate
-- Far less per-query tuning needed for production deployment

| Traditional Monolithic Approach | Agents as Tools Pattern |
|---|---|
| Single agent handles all tasks | Orchestrator + specialized agents |
| All capabilities in one codebase | Focused expertise per agent domain |
| Hard to maintain and debug | Independent testing and updates |
| Sequential execution only | Parallel execution possible |
| Difficult to scale | Horizontal scaling per agent type |
Benefits:
- 🎯 Domain expertise - Each agent masters specific capabilities
- Easy maintenance - Update agents independently
- ⚡ Better performance - Optimized per agent type
- Scalable architecture - Add new agents without refactoring
- 🧪 Testability - Unit test agents in isolation
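
The "parallel execution possible" point from the comparison can be sketched with the standard library. The stub agents below are hypothetical: each one sleeps briefly to simulate an LLM or database call, and the fan-out helper runs all three concurrently instead of one after another.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def make_stub_agent(name: str, delay_s: float) -> Callable[[str], str]:
    """Build a hypothetical specialist that sleeps to simulate an LLM or database call."""
    def agent(query: str) -> str:
        time.sleep(delay_s)
        return f"{name} answer for '{query}'"
    return agent


AGENTS: dict[str, Callable[[str], str]] = {
    "inventory": make_stub_agent("inventory", 0.05),
    "pricing": make_stub_agent("pricing", 0.05),
    "recommendation": make_stub_agent("recommendation", 0.05),
}


def fan_out(query: str) -> dict[str, str]:
    """Run every specialist concurrently and collect the results by agent name."""
    with ThreadPoolExecutor(max_workers=len(AGENTS)) as pool:
        futures = {name: pool.submit(agent, query) for name, agent in AGENTS.items()}
        return {name: future.result() for name, future in futures.items()}
```

Threads suit this case because the simulated work is I/O-bound waiting; the same structure also makes each agent easy to unit test in isolation, since it is just a function.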
Full-stack e-commerce platform demonstrating enterprise-grade agentic AI
Step 1: Split terminal into two panes (side-by-side)

Step 2: Navigate to blaize-bazaar directory in both panes

blaize-bazaar

Step 3: Start backend (Left Pane)

start-backend
# FastAPI server starts on port 8000
# Wait for "Application startup complete" message

Step 4: Start frontend (Right Pane)

start-frontend
# React dev server starts on port 5173
# Opens automatically in browser

React Frontend (TypeScript + Tailwind CSS)
               ↓
FastAPI Backend (Python 3.13)
      ↓                  ↓
Orchestrator → Specialist Agents
      ↓            ↓            ↓
  Inventory     Pricing   Recommendation
      └────────────┴────────────┘
                   ↓
Aurora PostgreSQL + pgvector
- ✨ Semantic Search: Vector similarity with pgvector 0.8.0 HNSW indexes for natural language queries
- 💬 Conversational AI: Claude Sonnet 4 for intelligent query understanding and agent routing
- 🔧 MCP Context Manager: Custom tools for Aurora PostgreSQL data access
- 🤖 Multi-Agent System: Orchestrator + 3 specialist agents (Agents as Tools)
- Smart Filters: Category, price, rating with real-time filtering
- ⚡ Real-time: Autocomplete and quick search results
- Agent Traces: OpenTelemetry observability for multi-agent workflows
- 🎯 Enterprise-Ready: Cost analysis, security patterns, and monitoring
Table: `bedrock_integration.product_catalog`

| Column | Type | Index | Description |
|---|---|---|---|
| `productId` | CHAR(10) | PRIMARY KEY | Unique product identifier |
| `product_description` | VARCHAR(500) | GIN | Full product details for text search |
| `imgUrl` | VARCHAR(70) | - | Product image URL |
| `productURL` | VARCHAR(40) | - | Product page URL |
| `stars` | NUMERIC(2,1) | Partial | Rating (1.0-5.0) |
| `reviews` | INTEGER | - | Customer review count |
| `price` | NUMERIC(8,2) | Partial | Price in USD |
| `category_id` | SMALLINT | - | Category identifier |
| `isBestSeller` | BOOLEAN | Partial | Bestseller flag |
| `boughtInLastMonth` | INTEGER | - | Recent purchase count |
| `category_name` | VARCHAR(50) | B-tree | Product category |
| `quantity` | SMALLINT | - | Available stock (0-1000) |
| `embedding` | VECTOR(1024) | HNSW | Titan v2 semantic vector embedding |

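
On the application side, the schema constraints can be mirrored in a typed record. The sketch below uses a standard-library dataclass as a stand-in for the backend's Pydantic models, which are not shown here; the class name and the choice of fields are assumptions made for illustration.

```python
from dataclasses import dataclass, field

EMBEDDING_DIM = 1024  # matches VECTOR(1024) in the table above


@dataclass
class Product:
    """Subset of product_catalog columns with the documented value ranges enforced."""
    productId: str
    product_description: str
    price: float
    stars: float
    quantity: int
    embedding: list[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        if len(self.productId) > 10:
            raise ValueError("productId must fit CHAR(10)")
        if not 1.0 <= self.stars <= 5.0:
            raise ValueError("stars must be between 1.0 and 5.0")
        if not 0 <= self.quantity <= 1000:
            raise ValueError("quantity must be between 0 and 1000")
        # An empty embedding is allowed so rows can be created before vectors are generated.
        if self.embedding and len(self.embedding) != EMBEDDING_DIM:
            raise ValueError(f"embedding must have {EMBEDDING_DIM} dimensions")
```

Validating at the boundary like this catches a wrong-sized embedding before it reaches the database, where the error would be harder to trace.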
-- Vector similarity search (HNSW optimized for 21,704 products)
CREATE INDEX idx_product_embedding_hnsw
ON product_catalog USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 128);
-- Full-text search (GIN for keyword matching)
CREATE INDEX idx_product_fts
ON product_catalog USING GIN (to_tsvector('english', product_description));
-- Category and price filters
CREATE INDEX idx_product_category_name ON product_catalog(category_name);
CREATE INDEX idx_product_price ON product_catalog(price) WHERE price > 0;
-- Partial indexes for common filters
CREATE INDEX idx_product_stars ON product_catalog(stars) WHERE stars >= 4.0;
CREATE INDEX idx_product_bestseller ON product_catalog("isBestSeller") WHERE "isBestSeller" = TRUE;
-- Composite index for category + price queries
CREATE INDEX idx_product_category_price
ON product_catalog(category_name, price) WHERE price > 0 AND quantity > 0;

POST /api/search
Content-Type: application/json
{
"query": "wireless gaming headphones noise cancellation",
"limit": 10,
"min_similarity": 0.3,
"filters": {
"category": "Electronics",
"min_price": 50,
"max_price": 200,
"min_stars": 4.0
}
}

Response:
{
"results": [
{
"productId": "B08XYZ",
"product_description": "Premium wireless gaming headset...",
"price": 149.99,
"stars": 4.5,
"reviews": 1243,
"similarity": 0.87
}
],
"total": 10,
"query_time_ms": 45
}

Custom tools built with Strands SDK for Aurora PostgreSQL agent integration, enabling intelligent database access and business logic execution.
Custom Tools Implemented:
- `get_trending_products` - Top products by popularity metrics
- `check_inventory` - Real-time stock availability queries
- `analyze_pricing` - Price trend analysis and insights
- `get_recommendations` - Semantic similarity-based suggestions
Architecture Benefits:
- Standardized tool interface via MCP specification
- Reusable across multiple agents
- Built-in token counting and context management
- ⚡ Direct database access with connection pooling
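
A tool such as `check_inventory` might look roughly like the sketch below. It is not the workshop's implementation: an in-memory SQLite database stands in for Aurora PostgreSQL so the example runs anywhere, the table is reduced to two columns, and the function is left undecorated rather than guessing at a framework-specific decorator. The part that carries over is the parameterized query.

```python
import sqlite3


def check_inventory(conn: sqlite3.Connection, product_id: str) -> dict[str, object]:
    """Look up stock for one product using a parameterized query (no string formatting)."""
    row = conn.execute(
        "SELECT quantity FROM product_catalog WHERE productId = ?",
        (product_id,),
    ).fetchone()
    if row is None:
        return {"productId": product_id, "found": False}
    quantity = row[0]
    return {
        "productId": product_id,
        "found": True,
        "quantity": quantity,
        "low_stock": quantity < 10,  # alert threshold from the Inventory Agent description
    }


def make_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for Aurora holding a two-column slice of product_catalog."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product_catalog (productId TEXT PRIMARY KEY, quantity INTEGER)")
    conn.executemany(
        "INSERT INTO product_catalog VALUES (?, ?)",
        [("B08XYZ", 4), ("B09ABC", 250)],
    )
    return conn
```

With psycopg against PostgreSQL the placeholder would be `%s` rather than `?`, but the principle is the same: user input travels as a bound parameter, never as part of the SQL text.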
🔧 Framework Agnostic Concepts: While this workshop uses Strands SDK for hands-on implementation, the multi-agent patterns and architectural concepts (Agents as Tools, orchestration, specialist agents) apply equally to other frameworks like LangGraph, LangChain, CrewAI, AutoGen, and more. Focus on understanding the patterns - the implementation details are transferable.
Capabilities:
- 🧠 Intelligent query routing and agent coordination (supports extended thinking with interleaved mode for complex multi-step analysis)
- Adaptive task routing based on tool responses and context
- Context-aware agent selection and coordination
- 🎯 Dynamic workflow orchestration
1. Inventory Agent
✓ Real-time stock monitoring across catalog
✓ Low inventory alerts (threshold: <10 units)
✓ Restocking recommendations with priority levels
✓ Stock availability forecasting

2. Recommendation Agent
✓ Personalized product suggestions via semantic search
✓ Feature-based matching and similarity analysis
✓ Budget-conscious alternatives with price awareness
✓ Cross-category recommendations

3. Pricing Agent
✓ Price trend analysis and historical patterns
✓ Deal identification (discount threshold: >20% off)
✓ Value-for-money rankings and comparisons
✓ Competitive pricing insights

| Service | Usage | Estimated Cost |
|---|---|---|
| Amazon Bedrock | | |
| Titan Text Embeddings v2 | ~10K tokens (initial load) | $0.10 |
| Claude Sonnet 4 | ~50K tokens (agent queries) | $1.50 |
| Aurora PostgreSQL | | |
| Storage (10K vectors) | 100 MB | $0.00* |
| I/O Operations | ~1K reads | $0.00* |
*Included in pre-provisioned workshop environment
| Component | Monthly Cost Range | Notes |
|---|---|---|
| Aurora PostgreSQL | $150-600 | Depends on instance family, size, and I/O configuration |
| Bedrock Embeddings | $100 | 100M tokens @ $0.001/1K tokens |
| Bedrock Claude Sonnet 4 | $300 | 100M tokens @ $0.003/1K tokens |
| Data Transfer | $50 | 500 GB outbound from AWS |
| Total | $600-1,050 | Varies based on Aurora configuration |
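
The arithmetic behind the monthly estimate can be reproduced with a few lines of Python. The rates and volumes are the ones quoted in the table above and are illustrative only, not current AWS pricing; the function names are assumptions made for this sketch. `Decimal` is used to avoid floating-point rounding in currency values.

```python
from decimal import Decimal


def token_cost(tokens: int, price_per_1k: str) -> Decimal:
    """Cost in USD for a token volume at a per-1K-token price."""
    return Decimal(tokens) / Decimal(1000) * Decimal(price_per_1k)


def monthly_total(aurora: Decimal) -> Decimal:
    """Sum the table's line items for a given Aurora cost."""
    embeddings = token_cost(100_000_000, "0.001")  # 100M tokens @ $0.001/1K
    claude = token_cost(100_000_000, "0.003")      # 100M tokens @ $0.003/1K
    data_transfer = Decimal("50")                  # 500 GB outbound
    return aurora + embeddings + claude + data_transfer


if __name__ == "__main__":
    low, high = monthly_total(Decimal("150")), monthly_total(Decimal("600"))
    print(f"Estimated monthly range: ${low:,.0f}-${high:,.0f}")
```

Swapping in your own token volumes and the current published rates gives a quick first estimate before turning to a full pricing calculator.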
For Read-Heavy Workloads (Recommended):
- Aurora I/O-Optimized - Zero I/O charges, predictable monthly costs
- Optimized Reads (NVMe-SSD) - Faster query performance with local caching
- Read Replicas - Distribute read load across multiple instances (up to 15)
Cost Optimization Benefits:
- I/O-Optimized eliminates per-request I/O charges (typical savings: 20-40%)
- Optimized Reads reduce network I/O by caching frequently accessed data locally
- Combined approach ideal for vector search workloads with high read volume
Scaling Guidance:
- Start with smaller instances and scale based on actual metrics
- Monitor `ReadLatency`, `CPUUtilization`, and `DatabaseConnections`
- Use Aurora Serverless v2 for variable or unpredictable workloads
- Consider Aurora Global Database for multi-region deployments
- Cache embeddings - Reduce Bedrock calls by 80% with semantic caching
- Aurora Serverless v2 - Auto-scaling for variable workloads (0.5-16 ACU)
- Query result caching - Redis/ElastiCache for frequently accessed data
- Batch processing - Generate embeddings during off-peak hours
- Read replicas - Distribute query load across multiple Aurora instances
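
The embedding cache idea can be sketched as follows. Note that this is an exact-match cache keyed on normalized text, which is simpler than true semantic caching (where near-duplicate queries also hit the cache). The `embed_fn` parameter is a hypothetical stand-in for whatever function calls Bedrock, and the actual reduction in calls depends entirely on how often queries repeat.

```python
import hashlib
from typing import Callable


class EmbeddingCache:
    """Exact-match cache: identical (normalized) text never reaches the embedder twice."""

    def __init__(self, embed_fn: Callable[[str], list[float]]) -> None:
        self._embed_fn = embed_fn  # hypothetical stand-in for the Bedrock embedding call
        self._store: dict[str, list[float]] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(text: str) -> str:
        # Lowercase and collapse whitespace so trivial variations share one entry.
        normalized = " ".join(text.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get(self, text: str) -> list[float]:
        key = self._key(text)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = self._embed_fn(text)
        return self._store[key]

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

The `hit_rate` property corresponds to the kind of cache effectiveness metric worth tracking; in a multi-instance deployment the dictionary would be replaced by a shared store such as Redis.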
✓ Enable encryption at rest (AES-256 for all data)
✓ Use IAM database authentication (no password rotation needed)
✓ Restrict security groups to application subnets only
✓ Enable automated backups (7-35 day retention period)
✓ Use AWS Secrets Manager for credential management
✓ Enable VPC endpoints for private connectivity

✓ Input validation on all user queries and API endpoints
✓ SQL injection prevention (parameterized queries only)
✓ Rate limiting per user/IP (default: 100 requests/minute)
✓ API authentication (JWT tokens with expiration)
✓ CORS configuration for production domains
✓ Content Security Policy (CSP) headers

✓ Bedrock Guardrails for content filtering and safety
✓ PII detection and redaction in user queries
✓ Audit logging for all AI interactions (CloudTrail)
✓ Model access controls via IAM policies
✓ Prompt injection prevention and validation
✓ Token usage monitoring and anomaly detection

Built-in distributed tracing for multi-agent workflows:
# Automatic trace capture with context propagation
✨ Agent: Orchestrator
   Duration: 245ms
   Tokens: 215 (input: 150, output: 65)
   Status: Success
   🤖 LLM Call: claude-sonnet-4
      Duration: 180ms
      Model: anthropic.claude-sonnet-4-20250514-v1:0
      Temperature: 0.7
   🔧 Tool: get_trending_products
      Duration: 45ms
      Result: 10 products
      Query: SELECT * FROM product_catalog...

Database Metrics:
- `DatabaseConnections` - Active connection count
- `ReadLatency` / `WriteLatency` - Query performance (milliseconds)
- `CPUUtilization` - Compute resource usage (%)
- `FreeableMemory` - Available RAM for caching (GB)
- `VolumeReadIOPs` / `VolumeWriteIOPs` - Disk operations
Application Metrics:
- `SearchLatency` - End-to-end query processing time
- `AgentInvocations` - Agent usage patterns and frequency
- `BedrockTokens` - Token consumption and costs
- `ErrorRate` - Failed requests and exceptions
- `CacheHitRate` - Embedding cache effectiveness
Custom Dashboards:
# Key Performance Indicators (KPIs)
- P50/P95/P99 search latency percentiles
- Agent routing accuracy and success rate
- Cache hit rate and memory efficiency
- Cost per query and daily spend tracking

| Alert | Threshold | Action |
|---|---|---|
| High Latency | P95 > 2s | Scale Aurora read replicas |
| Error Rate | > 5% | Page on-call engineer immediately |
| Token Spike | > 2x baseline | Investigate potential abuse or bugs |
| DB Connections | > 80% max | Check for connection leaks |
| Cost Anomaly | > 150% daily budget | Review usage patterns |
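
The first three alert rules can be expressed as a small evaluation function. This is a sketch under assumptions: the function names and input shapes are invented for illustration, and the percentile uses the nearest-rank definition, whereas CloudWatch and other monitoring tools may interpolate differently.

```python
import math


def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile (one common definition; monitoring tools may interpolate)."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def evaluate_alerts(
    latencies_s: list[float],
    errors: int,
    requests: int,
    tokens: int,
    baseline_tokens: int,
) -> list[str]:
    """Return the names of the alerts whose thresholds from the table are exceeded."""
    fired: list[str] = []
    if percentile(latencies_s, 95) > 2.0:          # High Latency: P95 > 2s
        fired.append("High Latency")
    if requests and errors / requests > 0.05:      # Error Rate: > 5%
        fired.append("Error Rate")
    if tokens > 2 * baseline_tokens:               # Token Spike: > 2x baseline
        fired.append("Token Spike")
    return fired
```

In practice these checks would live in CloudWatch alarms rather than application code, but writing them out makes the thresholds unambiguous and easy to test.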
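# Context-rich structured logging for debugging
# Assumes a structlog-style logger, which accepts key-value pairs as keyword arguments
import structlog

logger = structlog.get_logger()

logger.info(
    "search_query_executed",
    query=query,
    user_id=user_id,
    latency_ms=latency,
    results_count=len(results),
    trace_id=trace_id,
    similarity_threshold=min_similarity,
    filters=filters,
)

| Layer | Technologies |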
|---|---|
| Database | Aurora PostgreSQL 17.5 • pgvector 0.8.0 (HNSW) |
| AI/ML | Amazon Bedrock (Titan Text Embeddings v2, Claude Sonnet 4) |
| Backend | FastAPI • Python 3.13 • psycopg3 • boto3 • Pydantic v2 |
| Frontend | React 18 • TypeScript 5 • Tailwind CSS • Vite • Lucide Icons |
| Search | HNSW vector indexes • Trigram text indexes • Cosine similarity |
| Agent Framework | Strands SDK • Agents as Tools pattern • MCP integration |
| Observability | OpenTelemetry • CloudWatch • Structured logging |
Database Layer:
- Aurora read replicas for search queries (up to 15 replicas)
- Multi-AZ deployment for high availability
- Cross-region read replicas for global applications
Application Layer:
- Application Load Balancer (ALB) for FastAPI instances
- Auto Scaling Groups (ASG) based on CPU/memory
- CloudFront CDN for React frontend static assets
General Guidance:
- Start with smaller instance sizes and scale based on actual performance metrics
- Monitor key metrics: `ReadLatency`, `CPUUtilization`, `DatabaseConnections`, `FreeableMemory`
- Scale vertically when consistently hitting >70% CPU or memory utilization
- Consider Aurora Serverless v2 for workloads with variable or unpredictable patterns
Performance Indicators:
- ReadLatency consistently >50ms → Consider larger instance or read replicas
- CPUUtilization sustained >70% → Scale to larger instance size
- DatabaseConnections approaching max → Review connection pooling or scale up
- FreeableMemory <20% of total → Increase instance size for better caching
# Auto-scaling configuration for variable workloads
MinCapacity: 0.5 ACU (1 GB RAM)
MaxCapacity: 16 ACU (32 GB RAM)
AutoPause: true (after 5 minutes of inactivity)
ScaleIncrement: 0.5 ACU per scaling step

Benefits:
- Pay only for resources used (per-second billing)
- Automatic scaling based on workload
- Zero infrastructure management overhead
- Aurora PostgreSQL User Guide - Complete reference for Aurora configuration
- Amazon Bedrock Documentation - Foundation models and API reference
- pgvector 0.8.0 Performance Blog - Deep dive into 0.8.0 features
- pgvector GitHub - Open-source vector similarity search extension
- Model Context Protocol (MCP) - Protocol specification and documentation
- AWS Labs MCP Servers - AWS-maintained MCP server implementations
- Strands SDK Documentation - Agent framework and patterns
- DAT409: Implement hybrid search with Aurora PostgreSQL for MCP retrieval [REPEAT]
- DAT428: Build a cost-effective RAG-based gen AI application with Amazon Aurora [REPEAT]
- DAT403: Build a multi-agent AI solution with Amazon Aurora & Bedrock AgentCore
- HNSW Algorithm Paper - Efficient and robust approximate nearest neighbor search
- Agents as Tools Pattern - Multi-agent architecture best practices
If you find this helpful:
- ⭐ Star this repository to show support and help others discover it
- Fork it to customize for your specific use cases
- Report issues to help improve the workshop
- 📢 Share it with your community and colleagues
- 💬 Contribute - Pull requests welcome for improvements
- Workshop Issues: GitHub Issues
- AWS Support: AWS Support Center
- Community: AWS Database Blog
This library is licensed under the MIT-0 License. See the LICENSE file for details.
Workshop Developed and Tested By:
- Shayon Sanyal - Principal Solutions Architect, AWS
- AWS Database Specialists - Workshop support team
Special Thanks:
- pgvector community for the amazing open-source extension
- Anthropic for Claude Sonnet 4 capabilities
- AWS Workshop Studio team for platform support