A production-ready FastAPI template for building AI agent applications with LangGraph integration. This template provides a robust foundation for building scalable, secure, and maintainable AI agent services.
Production-Ready Architecture
- FastAPI for high-performance async API endpoints with uvloop optimization
- LangGraph integration for AI agent workflows with state persistence
- LangSmith for LLM observability and monitoring
- Sentry for error tracking and performance monitoring
- Structured logging with environment-specific formatting and request context
- Rate limiting with configurable rules per endpoint
- MongoDB Atlas for LangGraph checkpointing and mem0ai memory storage
- Docker and Docker Compose support
- Prometheus metrics and Grafana dashboards for monitoring
AI & LLM Features
- Long-term memory with mem0ai and MongoDB for semantic memory storage
- LLM Service with automatic retry logic using tenacity
- Multiple LLM model support (GPT-4o, GPT-4o-mini, GPT-5, GPT-5-mini, GPT-5-nano)
- Streaming responses for real-time chat interactions
- Tool calling and function execution capabilities
Security
- JWK (JSON Web Key) authentication with external auth service
- Client-managed conversation sessions
- Input sanitization
- CORS configuration
- Rate limiting protection
Developer Experience
- Environment-specific configuration with automatic .env file loading
- Comprehensive logging system with context binding
- Clear project structure following best practices
- Type hints throughout for better IDE support
- Easy local development setup with Makefile commands
- Automatic retry logic with exponential backoff for resilience
Prerequisites

- Python 3.13+
- MongoDB Atlas account (for LangGraph checkpointing and mem0ai)
- External authentication service with JWKS endpoint
- Docker and Docker Compose (optional)
Installation

- Clone the repository:

```bash
git clone <repository-url>
cd <project-directory>
```

- Create and activate a virtual environment:

```bash
uv sync
```

- Copy the example environment file:

```bash
cp .env.example .env.[development|staging|production]  # e.g. .env.development
```

- Update the `.env` file with your configuration (see `.env.example` for reference)
- Create a MongoDB Atlas cluster at https://cloud.mongodb.com
- Get your connection string
- Update the MongoDB connection in your `.env` file:

```bash
# Note: Do not include the database name in the URI path - specify it separately
MONGODB_URI=mongodb+srv://<username>:<password>@<cluster>.mongodb.net/?retryWrites=true&w=majority
MONGODB_DB_NAME=langgraph_db
```

- Configure your external authentication service JWKS endpoint
- Update the authentication settings in your `.env` file:

```bash
AUTH_URL="https://your-auth-service.com"
JWT_ISSUER="https://your-auth-service.com"
JWT_AUDIENCE="your-audience"
```

- Install dependencies:

```bash
uv sync
```

- Run the application:

```bash
make [dev|staging|prod]  # e.g. make dev
```

- Go to Swagger UI:

```
http://localhost:8000/docs
```

Docker Setup

- Build and run with Docker Compose:
```bash
make docker-build-env ENV=[development|staging|production]  # e.g. make docker-build-env ENV=development
make docker-run-env ENV=[development|staging|production]    # e.g. make docker-run-env ENV=development
```

- Access the monitoring stack:

```bash
# Prometheus metrics
http://localhost:9090

# Grafana dashboards
http://localhost:3000
```

Default credentials:

- Username: admin
- Password: admin

The Docker setup includes:
- FastAPI application
- Prometheus for metrics collection
- Grafana for metrics visualization
- Pre-configured dashboards for:
  - API performance metrics
  - Rate limiting statistics
  - LLM inference metrics
  - System resource usage
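The dashboards are driven by metrics exported from the app (see `app/core/metrics.py`). As a rough sketch of how such metrics can be declared with `prometheus_client` (the metric names here are illustrative, not necessarily the template's actual ones):

```python
from prometheus_client import Counter, Histogram

# Illustrative metric definitions; the template's real ones live in app/core/metrics.py.
http_requests_total = Counter(
    "http_requests_total",
    "Total HTTP requests processed",
    labelnames=["method", "path", "status"],
)
llm_inference_seconds = Histogram(
    "llm_inference_duration_seconds",
    "Time spent waiting on LLM completions",
    labelnames=["model"],
)

# Typical usage inside middleware or a request handler:
http_requests_total.labels(method="POST", path="/api/v1/chatbot/chat", status="200").inc()
with llm_inference_seconds.labels(model="gpt-4o").time():
    ...  # call the LLM here
```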
The application uses a flexible configuration system with environment-specific settings:
- `.env.development` - Local development settings
- `.env.staging` - Staging environment settings
- `.env.production` - Production environment settings
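The per-environment file selection can be implemented with pydantic-settings; a minimal sketch, assuming `APP_ENV` drives the choice (the template's actual logic lives in `app/core/config.py`):

```python
import os

from pydantic_settings import BaseSettings, SettingsConfigDict

# Pick the env file from APP_ENV, defaulting to development.
APP_ENV = os.getenv("APP_ENV", "development")

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=f".env.{APP_ENV}", extra="ignore")

    PROJECT_NAME: str = "FastAPI LangGraph Agent"
    DEBUG: bool = False
    MONGODB_URI: str = ""
    MONGODB_DB_NAME: str = "langgraph_db"

settings = Settings()  # Values come from the env file, overridden by real env vars
```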
Key configuration variables include:

```bash
# Application
APP_ENV=development
PROJECT_NAME="FastAPI LangGraph Agent"
DEBUG=true
# MongoDB (for LangGraph checkpointing and mem0ai)
# Note: Do not include the database name in the URI path - specify it separately
MONGODB_URI=mongodb+srv://<username>:<password>@<cluster>.mongodb.net/?retryWrites=true&w=majority
MONGODB_DB_NAME=langgraph_db
# JWK Authentication
AUTH_URL="https://your-auth-service.com"
JWT_ISSUER="https://your-auth-service.com"
JWT_AUDIENCE="your-audience"
# LLM Configuration
OPENAI_API_KEY=your_openai_api_key
DEFAULT_LLM_MODEL=gpt-4o
DEFAULT_LLM_TEMPERATURE=0.7
MAX_TOKENS=4096
# Long-Term Memory
LONG_TERM_MEMORY_COLLECTION_NAME=agent_memories
LONG_TERM_MEMORY_MODEL=gpt-4o-mini
LONG_TERM_MEMORY_EMBEDDER_MODEL=text-embedding-3-small
# Observability (Optional - LangSmith)
LANGCHAIN_TRACING_V2=false # Set to true to enable LangSmith tracing
LANGCHAIN_API_KEY=your_langsmith_api_key
LANGCHAIN_PROJECT=langgraph-fastapi-template
LANGCHAIN_ENDPOINT=https://api.smith.langchain.com
# Rate Limiting
RATE_LIMIT_ENABLED=true
```

The application includes a sophisticated long-term memory system powered by mem0ai and MongoDB:
- Semantic Memory Storage: Stores and retrieves memories based on semantic similarity
- User-Specific Memories: Each user has their own isolated memory space
- Automatic Memory Management: Memories are automatically extracted, stored, and retrieved
- Vector Search: Uses MongoDB Atlas for efficient similarity search
- Configurable Models: Separate models for memory processing and embeddings
How it works:

- Memory Addition: During conversations, important information is automatically extracted and stored
- Memory Retrieval: Relevant memories are retrieved based on conversation context
- Memory Search: Semantic search finds related memories across conversations
- Memory Updates: Existing memories can be updated as new information becomes available
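In code, this lifecycle is roughly the following (a sketch assuming mem0's MongoDB vector store provider; the exact configuration keys may differ, so check the mem0ai docs):

```python
from mem0 import Memory

# Illustrative configuration mirroring the env vars above.
config = {
    "vector_store": {
        "provider": "mongodb",
        "config": {
            "connection_string": "<MONGODB_URI>",
            "db_name": "langgraph_db",
            "collection_name": "agent_memories",
        },
    },
    "llm": {"provider": "openai", "config": {"model": "gpt-4o-mini"}},
    "embedder": {"provider": "openai", "config": {"model": "text-embedding-3-small"}},
}
memory = Memory.from_config(config)

# Addition: facts are extracted from the conversation and stored per user.
memory.add("I prefer vegetarian restaurants", user_id="user-123")

# Retrieval: semantic search over that user's isolated memory space.
related = memory.search("What food does this user like?", user_id="user-123")
```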
The LLM service provides robust, production-ready language model interactions with automatic retry logic and multiple model support.
- Multiple Model Support: Pre-configured support for GPT-4o, GPT-4o-mini, GPT-5, and GPT-5 variants
- Automatic Retries: Uses tenacity for exponential backoff retry logic
- Reasoning Configuration: GPT-5 models support configurable reasoning effort levels
- Environment-Specific Tuning: Different parameters for development vs production
- Fallback Mechanisms: Graceful degradation when primary models fail
| Model | Use Case | Reasoning Effort |
|---|---|---|
| gpt-5 | Complex reasoning tasks | Medium |
| gpt-5-mini | Balanced performance | Low |
| gpt-5-nano | Fast responses | Minimal |
| gpt-4o | Production workloads | N/A |
| gpt-4o-mini | Cost-effective tasks | N/A |
Retry behavior:

- Automatically retries on API timeouts, rate limits, and temporary errors
- Max Attempts: 3
- Wait Strategy: Exponential backoff (1s, 2s, 4s)
- Logging: All retry attempts are logged with context
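That policy maps directly onto tenacity decorators; a minimal sketch (the wrapped call and exception types are illustrative, see `app/services/llm.py` for the real implementation):

```python
import logging

from tenacity import (
    before_sleep_log,
    retry,
    retry_if_exception_type,
    stop_after_attempt,
    wait_exponential,
)

logger = logging.getLogger(__name__)

# Up to 3 attempts, exponential backoff (1s, 2s, 4s), each retry logged with context.
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=1, max=4),
    retry=retry_if_exception_type((TimeoutError, ConnectionError)),
    before_sleep=before_sleep_log(logger, logging.WARNING),
)
def complete(prompt: str) -> str:
    ...  # Invoke the model; a transient failure here triggers a retry
```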
The application uses structlog for structured, contextual logging with automatic request tracking.
- Structured Logging: All logs are structured with consistent fields
- Request Context: Automatic binding of request_id, session_id, and user_id
- Environment-Specific Formatting: JSON in production, colored console in development
- Performance Tracking: Automatic logging of request duration and status
- Exception Tracking: Full stack traces with context preservation
Every request automatically gets:
- Unique request ID
- User ID (from JWK token)
- Conversation ID (from client)
- Request path and method
- Response status and duration
Logging conventions:

- Event Names: lowercase_with_underscores
- No F-Strings: Pass variables as kwargs for proper filtering
- Context Binding: Always include relevant IDs and context
- Appropriate Levels: debug, info, warning, error, exception
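Applied to the conventions above, a typical call site looks like this (a sketch; the actual setup lives in `app/core/logging.py`):

```python
import structlog

logger = structlog.get_logger()

# Bind request-scoped context once (e.g. in middleware); every subsequent log
# line in this request automatically carries these fields.
structlog.contextvars.bind_contextvars(
    request_id="req-abc123",
    user_id="user-123",
    conversation_id="conv-456",
)

# Event name in lowercase_with_underscores; variables passed as kwargs, not f-strings.
logger.info("chat_message_received", message_length=42, model="gpt-4o")
```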
The application uses uvloop for enhanced async performance (automatically enabled via Makefile):
Performance Improvements:
- 2-4x faster asyncio operations
- Lower latency for I/O-bound tasks
- Better connection pool management
- Reduced CPU usage for concurrent requests
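Enabling uvloop amounts to a one-liner when starting the server; for example (a sketch, assuming the ASGI app is exposed as `app.main:app`):

```python
import uvicorn

# Run uvicorn on uvloop instead of the default asyncio event loop.
uvicorn.run("app.main:app", host="0.0.0.0", port=8000, loop="uvloop")
```

The same effect is available from the CLI with `uvicorn app.main:app --loop uvloop`, which is the kind of invocation the Makefile targets can wrap.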
Connection pooling:

- MongoDB: Connection pooling for LangGraph checkpointing and mem0ai
- Redis (optional): Connection pool for caching
Response caching:

- Only successful responses are cached
- Configurable TTL based on data volatility
- Cache invalidation on updates
- Supports Redis or in-memory caching (a minimal sketch follows this list)
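An in-memory variant of that behavior might look like this (an illustrative sketch, not the template's implementation; a Redis-backed version would expose the same interface):

```python
import time
from typing import Any

class TTLCache:
    """Tiny in-memory cache with per-entry TTL and explicit invalidation."""

    def __init__(self, default_ttl: float = 60.0) -> None:
        self._store: dict[str, tuple[Any, float]] = {}
        self._default_ttl = default_ttl

    def get(self, key: str) -> Any | None:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:  # Expired entries are dropped lazily on read
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: Any, ttl: float | None = None) -> None:
        self._store[key] = (value, time.monotonic() + (ttl or self._default_ttl))

    def invalidate(self, key: str) -> None:  # Called on updates so stale data is never served
        self._store.pop(key, None)

cache = TTLCache(default_ttl=30.0)

def maybe_cache(key: str, body: Any, status_code: int) -> None:
    if 200 <= status_code < 300:  # Only successful responses are cached
        cache.set(key, body)
```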
API Endpoints

All chat endpoints require:
- `Authorization`: Bearer token (a JWT from the external auth service, validated against its JWKS; see the sketch below)
- `conversation_id`: Client-provided conversation identifier in the request body
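One common way to implement that validation is PyJWT's JWKS client; a hedged sketch (the template's actual implementation lives in `app/utils/jwk_auth.py`, and the JWKS path below is an assumption):

```python
import jwt  # PyJWT
from fastapi import HTTPException, status
from jwt import PyJWKClient

# Assumed JWKS location; in the template these values come from AUTH_URL,
# JWT_ISSUER, and JWT_AUDIENCE in the environment file.
jwks_client = PyJWKClient("https://your-auth-service.com/.well-known/jwks.json")

def verify_token(token: str) -> dict:
    """Validate a bearer token against the auth service's JWKS and return its claims."""
    try:
        signing_key = jwks_client.get_signing_key_from_jwt(token)
        return jwt.decode(
            token,
            signing_key.key,
            algorithms=["RS256"],
            audience="your-audience",
            issuer="https://your-auth-service.com",
        )
    except jwt.PyJWTError as exc:
        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail=str(exc)) from exc
```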
Endpoints:
- `POST /api/v1/chatbot/chat` - Send message and receive response
- `POST /api/v1/chatbot/chat/stream` - Send message with streaming response
- `GET /api/v1/chatbot/messages?conversation_id={id}` - Get conversation history
- `DELETE /api/v1/chatbot/messages?conversation_id={id}` - Clear chat history
- `GET /health` - Health check with service status
- `GET /metrics` - Prometheus metrics endpoint
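For example, a chat request from Python might look like this (the request body shape is an assumption; the authoritative schema is in `app/schemas/chat.py` and the Swagger UI):

```python
import httpx

headers = {"Authorization": "Bearer <your-jwt>"}
payload = {
    "conversation_id": "conv-123",  # Client-managed conversation session
    "messages": [{"role": "user", "content": "Hello!"}],
}

response = httpx.post(
    "http://localhost:8000/api/v1/chatbot/chat",
    json=payload,
    headers=headers,
    timeout=30.0,
)
response.raise_for_status()
print(response.json())
```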
For detailed API documentation, visit /docs (Swagger UI) or /redoc (ReDoc) when running the application.
Project Structure

```
langgraph-fastapi-template/
├── app/
│   ├── api/
│   │   └── v1/
│   │       ├── chatbot.py           # Chat endpoints
│   │       └── api.py               # API router aggregation
│   ├── core/
│   │   ├── config.py                # Configuration management
│   │   ├── logging.py               # Logging setup
│   │   ├── metrics.py               # Prometheus metrics
│   │   ├── middleware.py            # Custom middleware
│   │   ├── limiter.py               # Rate limiting
│   │   ├── langgraph/
│   │   │   ├── graph.py             # LangGraph agent
│   │   │   └── tools.py             # Agent tools
│   │   └── prompts/
│   │       ├── __init__.py          # Prompt loader
│   │       └── system.md            # System prompts
│   ├── schemas/
│   │   ├── chat.py                  # Chat schemas
│   │   └── graph.py                 # Graph state schemas
│   ├── services/
│   │   └── llm.py                   # LLM service with retries
│   ├── utils/
│   │   ├── __init__.py
│   │   ├── jwk_auth.py              # JWK authentication
│   │   └── graph.py                 # Graph utility functions
│   └── main.py                      # Application entry point
├── evals/
│   ├── evaluator.py                 # Evaluation logic
│   ├── main.py                      # Evaluation CLI
│   ├── metrics/
│   │   └── prompts/                 # Evaluation metric definitions
│   └── reports/                     # Generated evaluation reports
├── grafana/                         # Grafana dashboards
├── prometheus/                      # Prometheus configuration
├── scripts/                         # Utility scripts
├── docker-compose.yml               # Docker Compose configuration
├── Dockerfile                       # Application Docker image
├── Makefile                         # Development commands
├── pyproject.toml                   # Python dependencies
├── SECURITY.md                      # Security policy
└── README.md                        # This file
```
For security concerns, please review our Security Policy.
This project is licensed under the terms specified in the LICENSE file.
Contributions are welcome! Please ensure:
- Code follows the project's coding standards
- All tests pass
- New features include appropriate tests
- Documentation is updated
- Commit messages follow conventional commits format
For issues, questions, or contributions, please open an issue on the project repository.