A production-grade, modular AI pipeline for real-time face detection, face recognition, and face swapping — powered by YOLOv8, InsightFace, and inswapper_128.
- Overview
- Features
- Architecture
- Tech Stack
- Project Structure
- Quick Start
- Installation
- Configuration
- Usage
- API Reference
- Development
- Docker
- Roadmap
- Ethics & Safety
- License
## Overview

This project provides a complete, end-to-end AI pipeline that can:
- Detect faces in images and videos using YOLOv8 (real-time, GPU-accelerated)
- Recognize and identify faces using InsightFace ArcFace embeddings
- Swap faces between images/videos using the inswapper_128 ONNX model
- Enhance swapped faces using GFPGAN or CodeFormer for photorealistic results
- Expose everything via a FastAPI REST backend and a Streamlit web UI
## Features

| Feature | Status |
|---|---|
| ⚡ YOLOv8 real-time face detection | ✅ Ready |
| 🧠 InsightFace ArcFace recognition + embeddings | ✅ Ready |
| 🔄 inswapper_128 face swap engine | ✅ Ready |
| ✨ GFPGAN face enhancement / restoration | ✅ Ready |
| 🎬 Image & Video processing | ✅ Ready |
| 📷 Webcam / live stream support | ✅ Ready |
| 🌐 FastAPI REST backend | ✅ Ready |
| 🖥️ Streamlit web UI | ✅ Ready |
| 🐳 Docker + docker-compose | ✅ Ready |
| 🖥️ GPU (CUDA) / CPU auto-selection | ✅ Ready |
| 🧪 Unit + Integration tests | ✅ Ready |
| 🔒 Ethics gate + output watermarking | ✅ Ready |
| 📦 Model auto-downloader | ✅ Ready |
## Architecture

```
Input Image / Video / Webcam
             │
             ▼
┌─────────────────────────┐
│  [1] YOLOv8 Detector    │ → Bounding boxes + confidence scores
│      yolov8n-face.pt    │
└────────────┬────────────┘
             │
             ▼
┌─────────────────────────┐
│  [2] InsightFace        │ → 512-dim ArcFace embeddings
│      Analyser           │ → 5-point facial landmarks
│      buffalo_l          │
└────────────┬────────────┘
             │
             ▼
┌─────────────────────────┐
│  [3] Face Recognizer    │ → Identity match (cosine similarity)
│      FaceDatabase       │ → identity name + score
└────────────┬────────────┘
             │
             ▼
┌─────────────────────────┐
│  [4] inswapper_128      │ → Source face identity injected
│      Face Swap Engine   │   into target frame (ONNX)
└────────────┬────────────┘
             │
             ▼
┌─────────────────────────┐
│  [5] GFPGAN/CodeFormer  │ → Artifact removal + upscaling
│      Face Enhancer      │ → Photorealistic output
└────────────┬────────────┘
             │
             ▼
     Output Image / Video
```
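Stripped of the model specifics, the flow above is a straight five-stage composition over each detected face. A self-contained stand-in sketch (every function here is an illustrative placeholder, not the project's real API):

```python
import numpy as np

def detect_faces(frame):
    """[1] Stand-in detector: bounding boxes as (x1, y1, x2, y2, confidence)."""
    return [(50, 40, 150, 160, 0.97)]

def embed_face(frame, box):
    """[2] Stand-in analyser: a unit-norm 512-dim embedding for the face crop."""
    vec = np.ones(512, dtype=np.float32)
    return vec / np.linalg.norm(vec)

def identify(embedding, database):
    """[3] Cosine-similarity match; unit-norm vectors make dot == cosine."""
    return max(database, key=lambda item: float(embedding @ item[1]))[0]

def swap_face(frame, box, source_embedding):
    """[4] Placeholder for inswapper_128 ONNX inference."""
    return frame

def enhance(frame):
    """[5] Placeholder for GFPGAN / CodeFormer restoration."""
    return frame

def run_pipeline(frame, source_embedding, database):
    for box in detect_faces(frame):
        embedding = embed_face(frame, box)
        name = identify(embedding, database)   # identity of the target face
        frame = enhance(swap_face(frame, box, source_embedding))
    return frame

frame = np.zeros((480, 640, 3), dtype=np.uint8)
database = [("Alice", np.ones(512, dtype=np.float32) / np.sqrt(512))]
output = run_pipeline(frame, database[0][1], database)
```

The real pipeline adds error handling, telemetry, and batching, but the data flow is the same linear chain.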
| Component | File | Responsibility |
|---|---|---|
| Face Detector | `core/detector/yolo_detector.py` | YOLOv8 bounding box detection |
| Face Analyser | `core/recognizer/insightface_recognizer.py` | Embedding extraction + landmarks |
| Face Database | `core/recognizer/face_database.py` | Identity store, cosine search |
| Face Swapper | `core/swapper/inswapper.py` | ONNX face swap inference |
| Face Enhancer | `core/enhancer/gfpgan_enhancer.py` | GFPGAN restoration |
| Pipeline | `core/pipeline/face_pipeline.py` | Orchestrates all steps |
| Video Pipeline | `core/pipeline/video_pipeline.py` | Frame-by-frame video processing |
| REST API | `api/main.py` | FastAPI endpoints |
| Web UI | `ui/app.py` | Streamlit interface |
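The `FaceDatabase` row above is the easiest to make concrete: identity search is a cosine-similarity argmax over stored embeddings. A self-contained NumPy sketch (the real class adds persistence and multiple embeddings per identity; class and method names below are illustrative):

```python
import numpy as np

class TinyFaceDatabase:
    """Minimal identity store with cosine-similarity search."""

    def __init__(self):
        self.names = []
        self.embeddings = []

    def register(self, name, embedding):
        # Store unit-norm vectors so a dot product equals cosine similarity.
        self.names.append(name)
        self.embeddings.append(embedding / np.linalg.norm(embedding))

    def search(self, embedding, threshold=0.45):
        query = embedding / np.linalg.norm(embedding)
        sims = np.stack(self.embeddings) @ query   # cosine similarity per identity
        best = int(np.argmax(sims))
        if sims[best] < threshold:
            return None                            # below threshold: unknown face
        return self.names[best], float(sims[best])

rng = np.random.default_rng(0)
alice, bob = rng.normal(size=512), rng.normal(size=512)

db = TinyFaceDatabase()
db.register("Alice", alice)
db.register("Bob", bob)

# A slightly perturbed "Alice" embedding should still match Alice.
match = db.search(alice + 0.1 * rng.normal(size=512))
```

The `0.45` default mirrors `RECOGNIZER_SIMILARITY_THRESHOLD` from the configuration section below.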
## Tech Stack

| Layer | Technology | Version |
|---|---|---|
| Face Detection | YOLOv8 | ≥ 8.2 |
| Face Analysis | InsightFace | ≥ 0.7.3 |
| Face Swap | inswapper_128.onnx | — |
| Face Enhancement | GFPGAN | ≥ 1.3.8 |
| ONNX Runtime | onnxruntime-gpu | ≥ 1.18 |
| Deep Learning | PyTorch | ≥ 2.2 |
| Backend API | FastAPI | ≥ 0.111 |
| Frontend UI | Streamlit | ≥ 1.35 |
| Image Processing | OpenCV | ≥ 4.9 |
| Configuration | Pydantic | ≥ 2.7 |
| Logging | Loguru | ≥ 0.7 |
| Packaging | Docker + docker-compose | — |
## Project Structure

```
ai-face-recognition/
│
├── core/                              # Core AI engine (framework-agnostic)
│   ├── detector/                      # YOLOv8 face detection
│   │   ├── base_detector.py           # Abstract base class
│   │   └── yolo_detector.py           # YOLOv8 implementation
│   │
│   ├── recognizer/                    # Face recognition
│   │   ├── base_recognizer.py         # Abstract base class
│   │   ├── insightface_recognizer.py  # ArcFace embedding extraction
│   │   └── face_database.py           # Face identity store
│   │
│   ├── swapper/                       # Face swap engine
│   │   ├── base_swapper.py            # Abstract base class
│   │   └── inswapper.py               # inswapper_128.onnx wrapper
│   │
│   ├── enhancer/                      # Post-swap face enhancement
│   │   ├── base_enhancer.py           # Abstract base class
│   │   ├── gfpgan_enhancer.py         # GFPGAN restorer
│   │   └── codeformer_enhancer.py     # CodeFormer alternative
│   │
│   └── pipeline/                      # Orchestration layer
│       ├── face_pipeline.py           # Image pipeline
│       └── video_pipeline.py          # Video pipeline
│
├── api/                               # FastAPI REST backend
│   ├── main.py                        # App entry point
│   ├── routers/
│   │   ├── health.py                  # GET /api/v1/health
│   │   ├── recognition.py             # POST /api/v1/recognize
│   │   └── swap.py                    # POST /api/v1/swap
│   ├── schemas/
│   │   ├── requests.py                # Pydantic request models
│   │   └── responses.py               # Pydantic response models
│   └── middleware/
│       └── cors.py                    # CORS + rate limiting
│
├── ui/                                # Streamlit web frontend
│   ├── app.py                         # Main app entry
│   └── pages/
│       ├── face_recognition.py        # Recognition page
│       └── face_swap.py               # Swap page
│
├── models/                            # AI model weights (git-ignored)
│   ├── yolov8n-face.pt
│   ├── buffalo_l/                     # InsightFace model pack
│   ├── inswapper_128.onnx
│   └── GFPGANv1.4.pth
│
├── utils/
│   ├── image_utils.py                 # Image I/O, transforms, blending
│   ├── video_utils.py                 # Video I/O, frame extraction
│   ├── mask_utils.py                  # Face mask generation + blending
│   ├── download_models.py             # Auto model downloader
│   └── logger.py                      # Loguru-based logger
│
├── config/
│   ├── settings.py                    # Pydantic BaseSettings
│   └── config.yaml                    # Default config values
│
├── tests/
│   ├── unit/                          # Unit tests per module
│   └── integration/                   # Full pipeline tests
│
├── docker/
│   ├── Dockerfile                     # CPU/GPU multi-stage image
│   └── docker-compose.yml             # API + UI services
│
├── scripts/
│   ├── setup.sh                       # Linux/macOS setup
│   └── setup.bat                      # Windows setup
│
├── requirements.txt
├── requirements-dev.txt
├── .env.example
└── README.md
```
## Quick Start

### Requirements

| Requirement | Minimum | Recommended |
|---|---|---|
| Python | 3.10 | 3.11 |
| RAM | 8 GB | 16 GB |
| GPU VRAM | — (CPU ok) | 6 GB+ (CUDA 12.x) |
| Disk Space | 5 GB | 10 GB |
| OS | Windows 10 / Ubuntu 20.04 / macOS 12 | — |
### Automated Setup (Linux / macOS)

```bash
git clone https://github.com/your-org/ai-face-recognition.git
cd ai-face-recognition
bash scripts/setup.sh        # CPU
bash scripts/setup.sh --gpu  # CUDA GPU
```

### Automated Setup (Windows)

```bat
git clone https://github.com/your-org/ai-face-recognition.git
cd ai-face-recognition
scripts\setup.bat
```

## Installation

Manual installation, step by step:

```bash
# 1. Clone the repository
git clone https://github.com/your-org/ai-face-recognition.git
cd ai-face-recognition

# 2. Create & activate a virtual environment
python -m venv .venv
source .venv/bin/activate   # Linux/macOS
# .venv\Scripts\activate    # Windows

# 3. Upgrade pip
pip install --upgrade pip setuptools wheel

# 4. Install PyTorch (choose ONE)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu    # CPU only
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121  # CUDA 12.1 (GPU)

# 5. Install ONNX Runtime (choose ONE)
pip install onnxruntime      # CPU
pip install onnxruntime-gpu  # GPU

# 6. Install all other dependencies
pip install -r requirements.txt

# 7. Copy the environment config
cp .env.example .env
# Edit .env with your settings

# 8. Download model weights
python utils/download_models.py --minimum   # Required models only
python utils/download_models.py --all       # All models (including enhancers)
```

On Windows, InsightFace requires C++ build tools:
- Install Visual Studio Build Tools
- Select: Desktop development with C++
- Install CMake
- Add to PATH during installation
## Configuration

All settings are controlled via environment variables or the `.env` file.
Copy `.env.example` to `.env` and edit as needed:

```bash
cp .env.example .env
```

```ini
# Hardware
EXECUTION_PROVIDER=cuda          # cuda | cpu | mps (Apple Silicon)
DEVICE_ID=0                      # GPU index

# YOLOv8 Detection
DETECTOR_CONFIDENCE_THRESHOLD=0.5
DETECTOR_MAX_FACES=10

# Face Recognition
RECOGNIZER_SIMILARITY_THRESHOLD=0.45

# Face Enhancement
ENHANCER_BACKEND=gfpgan          # gfpgan | codeformer | none
ENHANCER_FIDELITY_WEIGHT=0.5

# API Server
API_PORT=8000
API_WORKERS=1

# Ethics
ETHICS_REQUIRE_CONSENT=true
ETHICS_WATERMARK_OUTPUT=true
```

See `.env.example` for the full list of options.
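Internally these variables are parsed into a typed settings object; the project uses Pydantic `BaseSettings` (see `config/settings.py`). A stdlib-only stand-in illustrates the pattern (field set trimmed, defaults illustrative):

```python
import os
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Settings:
    """Typed view over the environment; field set trimmed for illustration."""
    execution_provider: str = field(
        default_factory=lambda: os.getenv("EXECUTION_PROVIDER", "cpu"))
    device_id: int = field(
        default_factory=lambda: int(os.getenv("DEVICE_ID", "0")))
    detector_confidence_threshold: float = field(
        default_factory=lambda: float(os.getenv("DETECTOR_CONFIDENCE_THRESHOLD", "0.5")))
    recognizer_similarity_threshold: float = field(
        default_factory=lambda: float(os.getenv("RECOGNIZER_SIMILARITY_THRESHOLD", "0.45")))
    ethics_require_consent: bool = field(
        default_factory=lambda: os.getenv("ETHICS_REQUIRE_CONSENT", "true").lower() == "true")

os.environ["EXECUTION_PROVIDER"] = "cuda"   # e.g. populated by .env loading
settings = Settings()
```

Pydantic's `BaseSettings` does the same env-to-typed-field mapping declaratively, with validation.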
## Usage

### Start the API Server

```bash
# Activate the virtual environment first
source .venv/bin/activate

# Development (with hot-reload)
uvicorn api.main:app --reload --host 0.0.0.0 --port 8000

# Production
uvicorn api.main:app --host 0.0.0.0 --port 8000 --workers 1
```

### Start the Web UI

```bash
streamlit run ui/app.py --server.port 8501
```

Open your browser at: http://localhost:8501
### Download Models

```bash
# Minimum required models (detect + recognize + swap)
python utils/download_models.py --minimum

# All models (includes GFPGAN + CodeFormer enhancers)
python utils/download_models.py --all

# Single model
python utils/download_models.py --model inswapper_128

# Check what's installed
python utils/download_models.py --check

# Force re-download
python utils/download_models.py --all --force
```

### Python API: Face Swap

```python
from core.pipeline import FacePipeline
from utils import load_image, save_image

# Initialize the pipeline
pipeline = FacePipeline(
    detector_device="cuda",     # or "cpu"
    enhance=True,               # Enable GFPGAN post-processing
    enhancer_backend="gfpgan",
)

# Load images
source_image = load_image("path/to/source_face.jpg")  # Face to copy FROM
target_image = load_image("path/to/target.jpg")       # Image to swap INTO

# Run the face swap
result = pipeline.swap(
    source=source_image,
    target=target_image,
    consent=True,  # Required: explicit consent flag
)

# Save the result
save_image(result.output_image, "output/swapped.jpg")
print(f"Detected {result.num_faces} faces")
print(f"Processing time: {result.processing_time_ms:.0f}ms")
```

### Python API: Face Recognition

```python
from core.recognizer import InsightFaceRecognizer, FaceDatabase
from utils import load_image

# Initialize the recognizer
recognizer = InsightFaceRecognizer(model_pack="buffalo_l")

# Build a face database
db = FaceDatabase()
db.register("Alice", recognizer.get_embedding(load_image("alice.jpg")))
db.register("Bob", recognizer.get_embedding(load_image("bob.jpg")))
db.save("cache/face_db.pkl")

# Recognize a new face
query_image = load_image("unknown_person.jpg")
embedding = recognizer.get_embedding(query_image)
match = db.search(embedding, threshold=0.45)

if match:
    print(f"Recognized: {match.identity} (similarity={match.similarity:.3f})")
else:
    print("Unknown person")
```

## API Reference

Interactive docs are available at http://localhost:8000/docs (Swagger UI).

### GET /api/v1/health
Check API + model readiness.
Response:

```json
{
  "status": "ok",
  "version": "1.0.0",
  "environment": "development",
  "uptime_seconds": 42.3,
  "components": {
    "detector": {"status": "ok", "loaded": true, "detail": null},
    "recognizer": {"status": "ok", "loaded": true, "detail": null},
    "swapper": {"status": "ok", "loaded": true, "detail": null},
    "enhancer": {"status": "ok", "loaded": false, "detail": "disabled"}
  }
}
```

### POST /api/v1/recognize

Detect and identify faces in an uploaded image.

Request: `multipart/form-data`

- `image` (file) — image file (JPEG/PNG/WebP/BMP)
- `consent` (bool, required) — must be `true`
- `top_k` (int, optional, default: `1`) — candidates per face
- `similarity_threshold` (float, optional) — override the server default
- `return_attributes` (bool, optional) — include age/gender
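A request might look like this with `curl` (host, port, and file name are assumptions; field names are as documented above):

```bash
curl -X POST http://localhost:8000/api/v1/recognize \
  -F "image=@alice.jpg" \
  -F "consent=true" \
  -F "top_k=3" \
  -F "return_attributes=true"
```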
Response:

```json
{
  "num_faces_detected": 1,
  "num_faces_recognized": 1,
  "faces": [
    {
      "face_index": 0,
      "bbox": {"x1": 100, "y1": 80, "x2": 300, "y2": 320, "confidence": 0.97},
      "landmarks": null,
      "attributes": {"age": 28.5, "gender": "F", "gender_score": 0.92},
      "match": {
        "identity_name": "Alice",
        "identity_id": "uuid-1234",
        "similarity": 0.87,
        "is_known": true,
        "threshold_used": 0.45
      },
      "embedding_norm": 1.0
    }
  ],
  "inference_time_ms": 34.2,
  "image_width": 640,
  "image_height": 480
}
```

### POST /api/v1/swap

Swap faces between a source and target image.

Request: `multipart/form-data`

- `source_file` (file) — image containing the source face (donor identity)
- `target_file` (file) — image to swap the face into
- `consent` (bool, required) — must be `true`
- `blend_mode` (str, optional, default: `"poisson"`) — `poisson` | `alpha` | `masked_alpha`
- `enhance` (bool, optional, default: `false`) — apply GFPGAN/CodeFormer enhancement
- `source_face_index` (int, optional, default: `0`) — which face in the source to use
- `target_face_index` (int, optional, default: `0`) — which face in the target to replace
- `return_base64` (bool, optional, default: `false`) — return JSON with a base64 image
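For intuition on the `blend_mode` options: `poisson` performs gradient-domain (seamless) cloning, while `alpha` and `masked_alpha` linearly mix swapped and original pixels under a soft face mask. A NumPy sketch of masked alpha blending (illustrative, not the server's exact code path):

```python
import numpy as np

def masked_alpha_blend(target, swapped, mask, alpha=1.0):
    """Blend a swapped face into the target frame.

    target, swapped : HxWx3 uint8 images
    mask            : HxW float array in [0, 1] (soft face mask)
    alpha           : global blend strength
    """
    weight = (alpha * mask)[..., None]  # HxWx1, broadcasts over the channels
    out = weight * swapped.astype(np.float32) + (1.0 - weight) * target.astype(np.float32)
    return np.clip(out, 0, 255).astype(np.uint8)

target = np.full((4, 4, 3), 100, dtype=np.uint8)
swapped = np.full((4, 4, 3), 200, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=np.float32)
mask[1:3, 1:3] = 1.0  # face region

result = masked_alpha_blend(target, swapped, mask)
# Inside the mask the swapped pixels win; outside, the target is untouched.
```

Feathering the mask edge (values between 0 and 1) is what hides the seam; Poisson cloning goes further by matching gradients across the boundary.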
Response (default): `image/png` — the swapped result image.

Response (`return_base64=true`):

```json
{
  "output_url": "/api/v1/results/swap_abc123.png",
  "output_base64": "<base64 string>",
  "num_faces_swapped": 1,
  "num_faces_failed": 0,
  "faces": [
    {
      "face_index": 0,
      "bbox": {"x1": 100, "y1": 80, "x2": 300, "y2": 320, "confidence": 0.96},
      "success": true,
      "status": "success",
      "timing": {"align_ms": 2.1, "inference_ms": 18.4, "blend_ms": 3.7, "total_ms": 24.2},
      "error": null
    }
  ],
  "total_inference_ms": 24.2,
  "blend_mode": "poisson",
  "enhanced": false,
  "watermarked": true
}
```

### POST /api/v1/register

Register a new face identity in the face database.

Request: `multipart/form-data`

- `image` (file) — image containing the face to register
- `name` (str, required) — identity label for this face
- `consent` (bool, required) — must be `true`
- `identity_id` (str, optional) — existing UUID to append embeddings to
- `overwrite` (bool, optional, default: `false`) — replace existing embeddings
Response:

```json
{
  "identity_id": "a1b2c3d4-1234-5678-abcd-ef0123456789",
  "identity_name": "Alice",
  "embeddings_added": 1,
  "total_embeddings": 3,
  "faces_detected": 1,
  "message": "Identity 'Alice' updated with 1 new embedding."
}
```

## Development

### Dev Setup

```bash
pip install -r requirements-dev.txt
pre-commit install
```

### Running Tests

```bash
# All tests
pytest tests/ -v

# Unit tests only
pytest tests/unit/ -v

# Integration tests
pytest tests/integration/ -v

# With coverage report
pytest tests/ --cov=. --cov-report=html
open htmlcov/index.html
```

### Code Quality

```bash
# Format code
black .
isort .

# Lint
flake8 .

# Type checking
mypy .
```

### Adding a Custom Detector

The project uses an abstract `BaseDetector` class for easy extensibility:
```python
# core/detector/my_detector.py
import numpy as np

from core.detector.base_detector import BaseDetector, DetectionResult


class MyCustomDetector(BaseDetector):
    def load_model(self) -> None:
        # Load your model here
        ...

    def detect(self, image: np.ndarray) -> DetectionResult:
        # Return a DetectionResult with face boxes
        ...
```

## Docker

### CPU

```bash
cd docker

# Build
docker-compose build

# Start all services (API + UI)
docker-compose up

# Download models (first run)
docker-compose --profile setup up model-downloader
```

### GPU (CUDA)

```bash
# Requires: NVIDIA Container Toolkit
# Install: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
docker-compose -f docker-compose.yml -f docker-compose.gpu.yml up --build
```

### Services

| Service | URL |
|---|---|
| FastAPI Backend | http://localhost:8000 |
| API Docs (Swagger) | http://localhost:8000/docs |
| Streamlit UI | http://localhost:8501 |
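The `docker-compose.gpu.yml` override referenced above is not shown in the project tree; a plausible sketch (assuming the API service is named `api` and Compose v2 device reservations):

```yaml
# docker/docker-compose.gpu.yml: hypothetical GPU override
services:
  api:
    environment:
      - EXECUTION_PROVIDER=cuda
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```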
## Roadmap

- Project structure, requirements, config, logger, utilities
- `YOLOFaceDetector` with `BaseDetector` abstraction
- Image + video + webcam input support
- `InsightFaceRecognizer` (ArcFace embeddings)
- `FaceDatabase` with cosine similarity search + persistence
- `InSwapper` wrapping `inswapper_128.onnx`
- Face alignment using 5-point landmarks
- Poisson blending for seamless compositing
- `GFPGANEnhancer` for post-swap quality restoration
- `CodeFormerEnhancer` as an alternative backend
- `FacePipeline` (image) + `VideoPipeline` (video)
- Progress tracking, error handling, telemetry
- REST endpoints: `/swap`, `/recognize`, `/register`, `/health`
- File upload validation, rate limiting, CORS
- Side-by-side preview
- Face recognition results display
- Video processing with progress bar
- Unit tests for every module
- Integration tests for the full pipeline
- FPS benchmark: CPU vs GPU
- Dockerfile (CPU + GPU), docker-compose, GPU override
## Ethics & Safety

This project is built with responsible AI principles:

- ✅ Consent gate — all swap API requests require a `consent=true` flag
- ✅ Output watermarking — all swapped outputs are stamped "AI GENERATED"
- ✅ Request logging — all swap operations are logged (metadata only)
- ✅ No cloud storage — all processing is local; no data is sent to third parties

⚠️ Use responsibly — do NOT use this technology to create non-consensual deepfakes.

⚠️ Legal compliance — laws on deepfakes vary by jurisdiction; know your local laws.
The authors are not responsible for misuse of this software. This project is intended for education, research, and legitimate creative applications only.
## License

This project is licensed under the MIT License — see the LICENSE file for details.

Third-party components:
| Component | License |
|---|---|
| YOLOv8 (Ultralytics) | AGPL-3.0 |
| InsightFace | MIT |
| GFPGAN | Apache 2.0 |
| CodeFormer | S-Lab License 1.0 |
| inswapper_128 | Research / Non-commercial |
| PyTorch | BSD-3-Clause |
| FastAPI | MIT |
| Streamlit | Apache 2.0 |
Important licensing notes:
- AGPL-3.0 (YOLOv8): If you deploy this application as a network service, AGPL-3.0 requires you to release your source code to users of that service.
- inswapper_128.onnx is subject to a non-commercial research license.
- CodeFormer is licensed under S-Lab License 1.0 (non-commercial).
Review all component licenses before any commercial deployment.
## Acknowledgements

- Ultralytics YOLOv8
- InsightFace by deepinsight
- GFPGAN by TencentARC
- CodeFormer by S-Lab
- hacksider/Deep-Live-Cam for inspiration
- akanametov/yolo-face for YOLOv8-face weights