FastAPI LangGraph Agent Template

A production-ready FastAPI template for building AI agent applications with LangGraph integration. This template provides a robust foundation for building scalable, secure, and maintainable AI agent services.

🌟 Features

  • Production-Ready Architecture

    • FastAPI for high-performance async API endpoints with uvloop optimization
    • LangGraph integration for AI agent workflows with state persistence
    • LangSmith for LLM observability and monitoring
    • Sentry for error tracking and performance monitoring
    • Structured logging with environment-specific formatting and request context
    • Rate limiting with configurable rules per endpoint
    • MongoDB Atlas for LangGraph checkpointing and mem0ai memory storage
    • Docker and Docker Compose support
    • Prometheus metrics and Grafana dashboards for monitoring
  • AI & LLM Features

    • Long-term memory with mem0ai and MongoDB for semantic memory storage
    • LLM Service with automatic retry logic using tenacity
    • Multiple LLM model support (GPT-4o, GPT-4o-mini, GPT-5, GPT-5-mini, GPT-5-nano)
    • Streaming responses for real-time chat interactions
    • Tool calling and function execution capabilities
  • Security

    • JWK (JSON Web Key) authentication via an external auth service
    • Client-managed conversation sessions
    • Input sanitization
    • CORS configuration
    • Rate limiting protection
  • Developer Experience

    • Environment-specific configuration with automatic .env file loading
    • Comprehensive logging system with context binding
    • Clear project structure following best practices
    • Type hints throughout for better IDE support
    • Easy local development setup with Makefile commands
    • Automatic retry logic with exponential backoff for resilience

πŸš€ Quick Start

Prerequisites

  • Python 3.13+
  • MongoDB Atlas account (for LangGraph checkpointing and mem0ai)
  • External authentication service with JWKS endpoint
  • Docker and Docker Compose (optional)

Environment Setup

  1. Clone the repository:
git clone <repository-url>
cd <project-directory>
  2. Create a virtual environment and install dependencies:
uv sync
  3. Copy the example environment file for your target environment:
cp .env.example .env.[development|staging|production] # e.g. .env.development
  4. Update the .env file with your configuration (see .env.example for reference)

MongoDB Atlas Setup

  1. Create a MongoDB Atlas cluster at https://cloud.mongodb.com
  2. Get your connection string
  3. Update the MongoDB connection in your .env file:
# Note: Do not include the database name in the URI path - specify it separately
MONGODB_URI=mongodb+srv://<username>:<password>@<cluster>.mongodb.net/?retryWrites=true&w=majority
MONGODB_DB_NAME=langgraph_db
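
As a quick sanity check, the connection can be verified with a few lines of pymongo (a sketch; the variable names mirror the .env entries above):

import os
from pymongo import MongoClient

client = MongoClient(os.environ["MONGODB_URI"])  # URI without a database name in the path
db = client[os.environ.get("MONGODB_DB_NAME", "langgraph_db")]
client.admin.command("ping")  # raises if the cluster is unreachable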

Authentication Setup

  1. Configure your external authentication service JWKS endpoint
  2. Update the authentication settings in your .env file:
AUTH_URL="https://your-auth-service.com"
JWT_ISSUER="https://your-auth-service.com"
JWT_AUDIENCE="your-audience"
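
Token verification against a JWKS endpoint typically looks like the sketch below, using PyJWT; the /.well-known/jwks.json path and the RS256 algorithm are assumptions, so check your provider's discovery document:

import os
import jwt  # PyJWT

# JWKS path is an assumption -- consult your auth provider's metadata
jwks_client = jwt.PyJWKClient(f"{os.environ['AUTH_URL']}/.well-known/jwks.json")

def verify_token(token: str) -> dict:
    key = jwks_client.get_signing_key_from_jwt(token).key
    return jwt.decode(
        token,
        key,
        algorithms=["RS256"],  # assumption: RSA-signed tokens
        audience=os.environ["JWT_AUDIENCE"],
        issuer=os.environ["JWT_ISSUER"],
    )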

Running the Application

Local Development

  1. Install dependencies:
uv sync
  2. Run the application:
make [dev|staging|prod] # e.g. make dev
  3. Open the Swagger UI:
http://localhost:8000/docs

Using Docker

  1. Build and run with Docker Compose:
make docker-build-env ENV=[development|staging|production] # e.g. make docker-build-env ENV=development
make docker-run-env ENV=[development|staging|production] # e.g. make docker-run-env ENV=development
  2. Access the monitoring stack:
# Prometheus metrics
http://localhost:9090

# Grafana dashboards
http://localhost:3000
Default credentials:
- Username: admin
- Password: admin

The Docker setup includes:

  • FastAPI application
  • Prometheus for metrics collection
  • Grafana for metrics visualization
  • Pre-configured dashboards for:
    • API performance metrics
    • Rate limiting statistics
    • LLM inference metrics
    • System resource usage

πŸ”§ Configuration

The application uses a flexible configuration system with environment-specific settings:

  • .env.development - Local development settings
  • .env.staging - Staging environment settings
  • .env.production - Production environment settings

Environment Variables

Key configuration variables include:

# Application
APP_ENV=development
PROJECT_NAME="FastAPI LangGraph Agent"
DEBUG=true

# MongoDB (for LangGraph checkpointing and mem0ai)
# Note: Do not include the database name in the URI path - specify it separately
MONGODB_URI=mongodb+srv://<username>:<password>@<cluster>.mongodb.net/?retryWrites=true&w=majority
MONGODB_DB_NAME=langgraph_db

# JWK Authentication
AUTH_URL="https://your-auth-service.com"
JWT_ISSUER="https://your-auth-service.com"
JWT_AUDIENCE="your-audience"

# LLM Configuration
OPENAI_API_KEY=your_openai_api_key
DEFAULT_LLM_MODEL=gpt-4o
DEFAULT_LLM_TEMPERATURE=0.7
MAX_TOKENS=4096

# Long-Term Memory
LONG_TERM_MEMORY_COLLECTION_NAME=agent_memories
LONG_TERM_MEMORY_MODEL=gpt-4o-mini
LONG_TERM_MEMORY_EMBEDDER_MODEL=text-embedding-3-small

# Observability (Optional - LangSmith)
LANGCHAIN_TRACING_V2=false  # Set to true to enable LangSmith tracing
LANGCHAIN_API_KEY=your_langsmith_api_key
LANGCHAIN_PROJECT=langgraph-fastapi-template
LANGCHAIN_ENDPOINT=https://api.smith.langchain.com

# Rate Limiting
RATE_LIMIT_ENABLED=true
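
One common way to wire up this kind of environment-specific loading is with pydantic-settings; the sketch below is illustrative rather than a copy of the template's config.py:

import os
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(
        env_file=f".env.{os.getenv('APP_ENV', 'development')}",  # pick the file per environment
        extra="ignore",
    )
    PROJECT_NAME: str = "FastAPI LangGraph Agent"
    DEBUG: bool = False
    MONGODB_URI: str
    MONGODB_DB_NAME: str = "langgraph_db"

settings = Settings()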

🧠 Long-Term Memory

The application includes a sophisticated long-term memory system powered by mem0ai and MongoDB:

Features

  • Semantic Memory Storage: Stores and retrieves memories based on semantic similarity
  • User-Specific Memories: Each user has their own isolated memory space
  • Automatic Memory Management: Memories are automatically extracted, stored, and retrieved
  • Vector Search: Uses MongoDB Atlas for efficient similarity search
  • Configurable Models: Separate models for memory processing and embeddings

How It Works

  1. Memory Addition: During conversations, important information is automatically extracted and stored
  2. Memory Retrieval: Relevant memories are retrieved based on conversation context
  3. Memory Search: Semantic search finds related memories across conversations
  4. Memory Updates: Existing memories can be updated as new information becomes available
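
In code, the flow roughly follows this pattern (a sketch built on mem0's public API; the model names come from the configuration section, and any MongoDB-specific vector-store keys should be checked against mem0's documentation):

from mem0 import Memory

memory = Memory.from_config({
    "llm": {"provider": "openai", "config": {"model": "gpt-4o-mini"}},
    "embedder": {"provider": "openai", "config": {"model": "text-embedding-3-small"}},
})
# Addition: extract and store facts from a conversation turn
memory.add("I prefer vegetarian recipes.", user_id="user-123")
# Retrieval/search: find semantically related memories for the next turn
hits = memory.search("what should I cook tonight?", user_id="user-123")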

πŸ€– LLM Service

The LLM service provides robust, production-ready language model interactions with automatic retry logic and multiple model support.

Features

  • Multiple Model Support: Pre-configured support for GPT-4o, GPT-4o-mini, GPT-5, and GPT-5 variants
  • Automatic Retries: Uses tenacity for exponential backoff retry logic
  • Reasoning Configuration: GPT-5 models support configurable reasoning effort levels
  • Environment-Specific Tuning: Different parameters for development vs production
  • Fallback Mechanisms: Graceful degradation when primary models fail

Supported Models

Model         Use Case                  Reasoning Effort
gpt-5         Complex reasoning tasks   Medium
gpt-5-mini    Balanced performance      Low
gpt-5-nano    Fast responses            Minimal
gpt-4o        Production workloads      N/A
gpt-4o-mini   Cost-effective tasks      N/A

Retry Configuration

  • Automatically retries on API timeouts, rate limits, and temporary errors
  • Max Attempts: 3
  • Wait Strategy: Exponential backoff (1s, 2s, 4s)
  • Logging: All retry attempts are logged with context
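
A minimal sketch of this policy with tenacity (the OpenAI exception types are illustrative of "timeouts, rate limits, and temporary errors"):

import openai
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

@retry(
    retry=retry_if_exception_type(
        (openai.APITimeoutError, openai.RateLimitError, openai.APIConnectionError)
    ),
    stop=stop_after_attempt(3),                         # Max Attempts: 3
    wait=wait_exponential(multiplier=1, min=1, max=4),  # backoff: 1s, 2s, 4s
    reraise=True,
)
def complete(client: openai.OpenAI, prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content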

πŸ“ Advanced Logging

The application uses structlog for structured, contextual logging with automatic request tracking.

Features

  • Structured Logging: All logs are structured with consistent fields
  • Request Context: Automatic binding of request_id, session_id, and user_id
  • Environment-Specific Formatting: JSON in production, colored console in development
  • Performance Tracking: Automatic logging of request duration and status
  • Exception Tracking: Full stack traces with context preservation

Logging Context Middleware

Every request automatically gets:

  • Unique request ID
  • User ID (from JWK token)
  • Conversation ID (from client)
  • Request path and method
  • Response status and duration
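
A sketch of how such a middleware can bind this context with structlog's contextvars helpers (the template's middleware.py may differ in detail):

import uuid
import structlog
from starlette.middleware.base import BaseHTTPMiddleware

class LoggingContextMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        structlog.contextvars.clear_contextvars()
        structlog.contextvars.bind_contextvars(
            request_id=str(uuid.uuid4()),
            path=request.url.path,
            method=request.method,
        )
        response = await call_next(request)
        structlog.contextvars.bind_contextvars(status=response.status_code)
        return response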

Log Format Standards

  • Event Names: lowercase_with_underscores
  • No F-Strings: Pass variables as kwargs for proper filtering
  • Context Binding: Always include relevant IDs and context
  • Appropriate Levels: debug, info, warning, error, exception
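
For example:

import structlog

logger = structlog.get_logger()
user_id = "user-123"  # illustrative value
logger.info("user_message_received", user_id=user_id)  # good: structured kwargs
logger.info(f"received message from {user_id}")        # avoid: value baked into the event name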

⚑ Performance Optimizations

uvloop Integration

The application uses uvloop for enhanced async performance; it is enabled automatically via the Makefile run targets.

Performance Improvements:

  • 2-4x faster asyncio operations
  • Lower latency for I/O-bound tasks
  • Better connection pool management
  • Reduced CPU usage for concurrent requests
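
In practice, uvloop is usually selected through uvicorn's loop option; the snippet below is a sketch of the idea rather than the exact Makefile invocation:

import uvicorn

if __name__ == "__main__":
    # "app.main:app" matches the project structure shown later in this README
    uvicorn.run("app.main:app", host="0.0.0.0", port=8000, loop="uvloop")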

Connection Pooling

  • MongoDB: Connection pooling for LangGraph checkpointing and mem0ai
  • Redis (optional): Connection pool for caching

Caching Strategy

  • Only successful responses are cached
  • Configurable TTL based on data volatility
  • Cache invalidation on updates
  • Supports Redis or in-memory caching
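
As an illustration of this policy (not the template's implementation), a minimal in-memory TTL cache might look like the following; a Redis-backed variant would replace the dict with redis-py calls:

import time
from typing import Any

class TTLCache:
    """In-memory cache with per-entry expiry; store successful responses only."""

    def __init__(self, ttl_seconds: float = 60.0):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, Any]] = {}

    def get(self, key: str) -> Any | None:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() > expires_at:
            del self._store[key]  # expired: drop and report a miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (time.monotonic() + self.ttl, value)

    def invalidate(self, key: str) -> None:
        self._store.pop(key, None)  # cache invalidation on updates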

πŸ”Œ API Reference

Chat Endpoints

All chat endpoints require:

  • Authorization: Bearer token (JWK from external auth service)
  • conversation_id: Client-provided conversation identifier in request body

Endpoints:

  • POST /api/v1/chatbot/chat - Send message and receive response
  • POST /api/v1/chatbot/chat/stream - Send message with streaming response
  • GET /api/v1/chatbot/messages?conversation_id={id} - Get conversation history
  • DELETE /api/v1/chatbot/messages?conversation_id={id} - Clear chat history
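
For example, calling the chat endpoint with httpx (the message field name in the body is an assumption; verify the schema at /docs):

import httpx

response = httpx.post(
    "http://localhost:8000/api/v1/chatbot/chat",
    headers={"Authorization": "Bearer <your-jwt>"},
    json={"conversation_id": "conv-123", "message": "Hello!"},  # "message" field is assumed
    timeout=30.0,
)
response.raise_for_status()
print(response.json())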

Health & Monitoring

  • GET /health - Health check with service status
  • GET /metrics - Prometheus metrics endpoint

For detailed API documentation, visit /docs (Swagger UI) or /redoc (ReDoc) when running the application.

πŸ“š Project Structure

langgraph-fastapi-template/
β”œβ”€β”€ app/
β”‚   β”œβ”€β”€ api/
β”‚   β”‚   └── v1/
β”‚   β”‚       β”œβ”€β”€ chatbot.py           # Chat endpoints
β”‚   β”‚       └── api.py               # API router aggregation
β”‚   β”œβ”€β”€ core/
β”‚   β”‚   β”œβ”€β”€ config.py                # Configuration management
β”‚   β”‚   β”œβ”€β”€ logging.py               # Logging setup
β”‚   β”‚   β”œβ”€β”€ metrics.py               # Prometheus metrics
β”‚   β”‚   β”œβ”€β”€ middleware.py            # Custom middleware
β”‚   β”‚   β”œβ”€β”€ limiter.py               # Rate limiting
β”‚   β”‚   β”œβ”€β”€ langgraph/
β”‚   β”‚   β”‚   β”œβ”€β”€ graph.py             # LangGraph agent
β”‚   β”‚   β”‚   └── tools.py             # Agent tools
β”‚   β”‚   └── prompts/
β”‚   β”‚       β”œβ”€β”€ __init__.py          # Prompt loader
β”‚   β”‚       └── system.md            # System prompts
β”‚   β”œβ”€β”€ schemas/
β”‚   β”‚   β”œβ”€β”€ chat.py                  # Chat schemas
β”‚   β”‚   └── graph.py                 # Graph state schemas
β”‚   β”œβ”€β”€ services/
β”‚   β”‚   └── llm.py                   # LLM service with retries
β”‚   β”œβ”€β”€ utils/
β”‚   β”‚   β”œβ”€β”€ __init__.py
β”‚   β”‚   β”œβ”€β”€ jwk_auth.py              # JWK authentication
β”‚   β”‚   └── graph.py                 # Graph utility functions
β”‚   └── main.py                      # Application entry point
β”œβ”€β”€ evals/
β”‚   β”œβ”€β”€ evaluator.py                 # Evaluation logic
β”‚   β”œβ”€β”€ main.py                      # Evaluation CLI
β”‚   β”œβ”€β”€ metrics/
β”‚   β”‚   └── prompts/                 # Evaluation metric definitions
β”‚   └── reports/                     # Generated evaluation reports
β”œβ”€β”€ grafana/                         # Grafana dashboards
β”œβ”€β”€ prometheus/                      # Prometheus configuration
β”œβ”€β”€ scripts/                         # Utility scripts
β”œβ”€β”€ docker-compose.yml               # Docker Compose configuration
β”œβ”€β”€ Dockerfile                       # Application Docker image
β”œβ”€β”€ Makefile                         # Development commands
β”œβ”€β”€ pyproject.toml                   # Python dependencies
β”œβ”€β”€ SECURITY.md                      # Security policy
└── README.md                        # This file

πŸ›‘οΈ Security

For security concerns, please review our Security Policy.

πŸ“„ License

This project is licensed under the terms specified in the LICENSE file.

🀝 Contributing

Contributions are welcome! Please ensure:

  1. Code follows the project's coding standards
  2. All tests pass
  3. New features include appropriate tests
  4. Documentation is updated
  5. Commit messages follow conventional commits format

πŸ“ž Support

For issues, questions, or contributions, please open an issue on the project repository.
