This service includes a comprehensive Out of Memory (OOM) protection system that monitors memory usage and automatically triggers a graceful restart when memory thresholds are exceeded, preventing system crashes.
- Background thread monitors system memory every 30 seconds
- Tracks both system RAM and GPU memory usage
- Detects when memory usage exceeds the configured threshold
- When OOM is detected, the service initiates a graceful shutdown:
  - Stops accepting new requests
  - Waits for current processing to complete (60s timeout)
  - Clears GPU memory
  - Forces garbage collection
  - Restarts the service automatically
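The shutdown sequence above can be sketched as follows. This is a minimal illustration, not the service's actual code; the `server` object and its methods (`stop_accepting`, `active_requests`, `restart`) are hypothetical stand-ins:

```python
import gc
import time

def graceful_restart(server, timeout_s=60, poll_s=1.0):
    """Sketch of the graceful shutdown sequence (server API is assumed)."""
    server.stop_accepting()                 # 1. stop accepting new requests
    deadline = time.time() + timeout_s
    while server.active_requests() and time.time() < deadline:
        time.sleep(poll_s)                  # 2. wait for in-flight work (60s cap)
    # 3. clear GPU memory here (torch.cuda.empty_cache() on CUDA systems)
    gc.collect()                            # 4. force garbage collection
    server.restart()                        # 5. hand off to a fresh process
```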
The /health endpoint now includes detailed memory information:
```bash
curl http://localhost:8000/health
```

Response:

```json
{
  "status": "healthy",
  "model_loaded": true,
  "memory": {
    "system_memory_percent": 65.2,
    "system_memory_available_gb": 14.5,
    "system_memory_total_gb": 32.0,
    "process_memory_gb": 3.2,
    "process_memory_percent": 10.1,
    "gpu_memory_allocated_gb": 2.1,
    "gpu_memory_reserved_gb": 4.0
  },
  "oom_protection_enabled": true,
  "memory_threshold_percent": 90.0
}
```

Use the provided script to monitor memory usage in real-time:
```bash
./monitor_memory.sh
```

Output:

```text
==========================================
OCR Service Memory Monitor
==========================================
📡 Service PID: 12345
[2026-03-19 10:30:00] Process: 3.20GB | System: 65.2% | GPU: 2.1 GB / 24.0 GB
[2026-03-19 10:30:05] Process: 3.25GB | System: 66.1% | GPU: 2.1 GB / 24.0 GB
```
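The health payload can also be consumed programmatically. A minimal stdlib sketch, assuming the endpoint and field names shown in the response above; the helper names and the 80% warning level are illustrative, not part of the service:

```python
import json
import urllib.request

def get_health(base_url="http://localhost:8000"):
    """Fetch and decode the /health payload (endpoint shown above)."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
        return json.load(resp)

def memory_pressure(health, warn_at=80.0):
    """Return (is_high, percent) from a decoded health payload."""
    pct = health["memory"]["system_memory_percent"]
    return pct >= warn_at, pct

# Example usage:
# high, pct = memory_pressure(get_health())
# if high:
#     print(f"memory pressure: {pct}%")
```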
| Variable | Default | Description |
|---|---|---|
| `OOM_RESTART_ENABLED` | `true` | Enable/disable OOM protection |
| `OOM_MEMORY_THRESHOLD` | `90` | Memory threshold percentage (0-100) |
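A sketch of how these variables might be parsed at startup, using the documented defaults (`true` and `90`); the function name and validation are illustrative, not the service's actual code:

```python
import os

def load_oom_config(env=os.environ):
    """Parse OOM settings with the documented defaults (true / 90)."""
    enabled = env.get("OOM_RESTART_ENABLED", "true").strip().lower() in ("1", "true", "yes")
    threshold = float(env.get("OOM_MEMORY_THRESHOLD", "90"))
    if not 0 <= threshold <= 100:
        raise ValueError("OOM_MEMORY_THRESHOLD must be in 0-100")
    return enabled, threshold
```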
```bash
# Enable OOM protection (default: true)
OOM_RESTART_ENABLED=true

# Set memory threshold to 85% for more aggressive protection
OOM_MEMORY_THRESHOLD=85
```

A dedicated background thread checks memory every 30 seconds:
```python
# In serve_pdf.py
def monitor_memory_loop():
    # Checks:
    # - System memory usage
    # - Process memory usage
    # - GPU memory usage
    # - Triggers restart if threshold exceeded
    ...
```

Before processing each PDF:
- Checks if system is already in OOM condition
- Rejects processing if memory is critically low
- Triggers restart if needed
After processing each PDF:
- Verifies memory hasn't exceeded threshold
- Triggers restart if memory is high
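Both the pre- and post-processing gates reduce to the same threshold comparison. A pure-logic sketch, with field names mirroring the `/health` response; the helper names are hypothetical:

```python
def oom_condition(system_memory_percent, threshold=90.0):
    """True when memory usage is at or above the restart threshold."""
    return system_memory_percent >= threshold

def guard_request(system_memory_percent, threshold=90.0):
    """Pre-processing gate: reject new work instead of risking a hard OOM."""
    if oom_condition(system_memory_percent, threshold):
        raise MemoryError(
            f"System OOM condition detected before processing: "
            f"{system_memory_percent:.1f}% memory usage"
        )
```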
Catches specific OOM exceptions:
- `torch.cuda.OutOfMemoryError`: GPU OOM
- `MemoryError`: system RAM OOM
- Automatically triggers graceful restart
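The exception handling above might look like the following sketch. The `on_oom` callback stands in for the restart trigger; the real service also catches `torch.cuda.OutOfMemoryError`, which is omitted here so the sketch has no GPU dependency:

```python
import gc

def run_with_oom_guard(process_fn, *args, on_oom=lambda: None):
    """Wrap a processing call; on system OOM, clean up and signal a restart."""
    try:
        return process_fn(*args)
    except MemoryError:
        gc.collect()   # force garbage collection before handing off
        on_oom()       # e.g. initiate the graceful restart described above
        raise
```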
```bash
# Watch memory usage
./monitor_memory.sh

# In another terminal, process a large PDF
curl -X POST http://localhost:8000/process_pdf \
  -H "Authorization: Bearer your-token" \
  -F "file=@large.pdf"
```

You can temporarily lower the threshold to test the restart mechanism:
```bash
# In .env
OOM_MEMORY_THRESHOLD=30  # Will trigger restart at 30%

# Restart service
docker-compose restart
```

Watch the health endpoint while testing:

```bash
watch -n 5 'curl -s http://localhost:8000/health | jq'
```

Problem: Service enters restart loop
Solutions:
- Increase `OOM_MEMORY_THRESHOLD`
- Reduce `BATCH_SIZE` in serve_pdf.py
- Reduce `MAX_CONCURRENCY` in config.py
- Process smaller PDFs
Problem: Even with protections, memory usage is too high
Solutions:
- Reduce the DPI in `pdf_to_images_high_quality()` (default: 144)
- Reduce `BATCH_SIZE` in `process_pdf_internal()` (default: 10)
- Reduce `NUM_WORKERS` in config.py
- Limit concurrent requests
Problem: Memory monitor not starting
Check:
```bash
# Check logs
docker logs <container> | grep -i "memory monitor"

# Verify psutil is installed
python -c "import psutil; print(psutil.__version__)"
```

Expected log messages:

```text
✅ Memory monitor started (threshold: 90%, interval: 30s)
⚠️ OOM CONDITION DETECTED:
   System Memory: 91.2%
   Process Memory: 18.50 GB
   Available Memory: 2.80 GB
🔄 INITIATING GRACEFUL RESTART DUE TO OOM CONDITION
❌ System OOM condition detected before processing: 92.5% memory usage
❌ GPU OOM: CUDA out of memory
```
- Monitor Regularly: Use `monitor_memory.sh` during operation
- Set Appropriate Threshold: 90% is the default; adjust based on your system
- Process Smaller Batches: If you have memory issues, reduce `BATCH_SIZE`
- Check Health Endpoint: Use `/health` to monitor memory trends
- Review Logs: Check for OOM warnings to identify problematic files
- Memory overhead: ~5-10 MB for monitoring thread
- CPU overhead: Negligible (<0.1% CPU)
- Restart time: ~5-10 seconds for graceful shutdown
For issues or questions about OOM protection:
- Check logs for OOM warnings
- Use the `/health` endpoint to monitor memory
- Review the configuration in the `.env` file
- Adjust the threshold based on your system capacity