A real-time facial expression recognition system powered by deep learning. This project features a custom VGG-style CNN with a spatial attention mechanism, trained on the FER2013 dataset to classify five emotions (anger, happiness, neutral, sadness, surprise) from a live webcam feed.
git clone https://github.com/ReynardL/EmotionClassifier.git
cd EmotionClassifier
This project uses UV for fast, reliable dependency management.
# Install UV if you haven't already
pip install uv
# Sync dependencies (installs all required packages)
uv sync
# Run with UV
uv run live_app.py
What you'll see:
- Live video feed from your webcam (mirrored for natural interaction)
- Face detection with color-coded bounding boxes
- Real-time emotion predictions with confidence percentages
- Top 3 emotion predictions displayed next to each detected face
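The top-3 readout described above could be derived from the model's softmax output along these lines. This is a minimal sketch rather than the code in live_app.py: the function name `top_k_predictions` and the sample probabilities are hypothetical, and the label order is assumed to match the output list given later in this README.

```python
import numpy as np

# Label order assumed to match the model output listed in this README.
CLASS_LABELS = ["anger", "happiness", "neutral", "sadness", "surprise"]


def top_k_predictions(probabilities, labels=CLASS_LABELS, k=3):
    """Return the k most likely (label, confidence %) pairs, best first."""
    probs = np.asarray(probabilities, dtype=np.float64)
    # argsort is ascending, so reverse it and keep the first k indices.
    best = np.argsort(probs)[::-1][:k]
    return [(labels[i], round(float(probs[i]) * 100, 1)) for i in best]


# Hypothetical softmax output for one detected face.
sample = [0.05, 0.70, 0.15, 0.02, 0.08]
for label, confidence in top_k_predictions(sample):
    print(f"{label}: {confidence:.1f}%")
```

Each pair can then be drawn as one text line next to the face's bounding box.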
Keyboard Controls:
- q: Quit the application
✨ 5 Emotion Classes: Anger, Happiness, Neutral, Sadness, Surprise
🚀 Real-time Performance: Custom CNN optimized for fast inference on live webcam feeds
🎯 Spatial Attention: Focuses on important facial regions (eyes, mouth, eyebrows)
📊 Visual Feedback: Live emotion predictions with confidence scores
🎨 Color-coded Results: Each emotion has a distinct color for easy identification
The project uses a subset of the FER2013+ dataset organized as follows:
fer2013/
├── train/
│ ├── anger/
│ ├── happiness/
│ ├── neutral/
│ ├── sadness/
│ └── surprise/
└── test/
  ├── anger/
  ├── happiness/
  ├── neutral/
  ├── sadness/
  └── surprise/
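With one folder per emotion, per-class image counts can be read straight from the directory layout, and those counts are what class weighting is typically based on. The sketch below is an assumption about how that might look, not the notebook's code: the helper names are hypothetical, and the weighting formula is the common "balanced" heuristic, total / (n_classes × class_count).

```python
from pathlib import Path

EMOTIONS = ["anger", "happiness", "neutral", "sadness", "surprise"]


def count_images(split_dir, classes=EMOTIONS):
    """Count files in each class folder of a split such as fer2013/train."""
    split = Path(split_dir)
    return {
        name: sum(1 for p in (split / name).glob("*") if p.is_file())
        for name in classes
    }


def balanced_class_weights(counts):
    """Weight each class by total / (n_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {name: total / (n_classes * n) for name, n in counts.items() if n > 0}


if __name__ == "__main__":
    train_dir = Path("fer2013/train")
    if train_dir.is_dir():
        counts = count_images(train_dir)
        print(counts)
        print(balanced_class_weights(counts))
```

Under this heuristic, rarer classes get weights above 1 and common classes get weights below 1, so each class contributes roughly equally to the loss.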
Training Configuration:
- Epochs: 75 (with early stopping after 15 epochs without improvement)
- Batch Size: 64
- Optimizer: Adam (learning_rate=0.001)
- Loss: Categorical Crossentropy with Label Smoothing (0.1)
- Architecture: VGG-style CNN with spatial attention mechanism
- Data Augmentation:
  - Rotation (±20°)
  - Width/Height Shift (±15%)
  - Shear & Zoom (±15%)
  - Brightness (80%-120%)
  - Horizontal Flip
- Callbacks:
  - ModelCheckpoint: Saves best model based on validation accuracy
  - EarlyStopping: Patience of 15 epochs
  - ReduceLROnPlateau: Reduces LR by 50% after 5 epochs of plateau
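To make the label smoothing setting concrete, here is a small NumPy sketch of what a smoothing factor of 0.1 does to a 5-class one-hot target. It follows the usual formulation (blend the one-hot vector with a uniform distribution) and is not taken from the training notebook; the function names are hypothetical.

```python
import numpy as np


def smooth_labels(one_hot, smoothing=0.1):
    """Blend one-hot targets with a uniform distribution over the classes."""
    one_hot = np.asarray(one_hot, dtype=np.float64)
    n_classes = one_hot.shape[-1]
    return one_hot * (1.0 - smoothing) + smoothing / n_classes


def cross_entropy(targets, probabilities, eps=1e-12):
    """Categorical cross-entropy for a single example."""
    probabilities = np.clip(np.asarray(probabilities, dtype=np.float64), eps, 1.0)
    return float(-np.sum(np.asarray(targets) * np.log(probabilities)))


# One-hot target for "happiness" in the 5-class ordering used here.
target = [0, 1, 0, 0, 0]
smoothed = smooth_labels(target, smoothing=0.1)
print(smoothed)  # true class gets 0.92, every other class 0.02
```

Because the target is no longer exactly 1 for the true class, the loss is lowest when the model keeps a little probability on the other classes, which discourages extreme confidence.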
Training Output:
- best_model.keras - Best model based on validation accuracy
- class_labels.json - Emotion class label mappings
- training_history.png - Training/validation accuracy and loss plots
The model uses a VGG-style CNN with Spatial Attention (custom architecture):
Input (48×48×1 grayscale)
↓
Block 1: Conv2D(64)×2 + BatchNorm + MaxPool(2×2) + Dropout(0.25)
↓ [Output: 24×24×64]
Block 2: Conv2D(128)×2 + BatchNorm + MaxPool(2×2) + Dropout(0.25)
↓ [Output: 12×12×128]
Block 3: Conv2D(256)×3 + BatchNorm + Spatial Attention + MaxPool(2×2) + Dropout(0.3)
↓ [Output: 6×6×256]
Block 4: Conv2D(512)×2 + BatchNorm + MaxPool(2×2) + Dropout(0.3)
↓ [Output: 3×3×512]
Global Average Pooling
↓ [Output: 512]
Dense(512) + BatchNorm + Dropout(0.5)
↓
Dense(256) + BatchNorm + Dropout(0.5)
↓
Dense(5) + Softmax
↓
Output: [anger, happiness, neutral, sadness, surprise]
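The spatial sizes in the diagram follow from halving the feature map at each 2×2 max-pool. A quick check, assuming the convolutions use "same" padding so that only pooling changes the size (the padding mode is not stated in this README):

```python
def pooled_sizes(input_size=48, n_blocks=4, pool=2):
    """Spatial size after each block, assuming only the 2x2 pooling shrinks it."""
    sizes = []
    size = input_size
    for _ in range(n_blocks):
        size //= pool
        sizes.append(size)
    return sizes


filters = [64, 128, 256, 512]
for block, (size, channels) in enumerate(zip(pooled_sizes(), filters), start=1):
    print(f"Block {block}: {size}x{size}x{channels}")
```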
Architecture Highlights:
- VGG-Inspired Design: Progressive channel doubling (64→128→256→512)
- Spatial Attention Mechanism:
  - Implemented in Block 3
  - Uses 7×7 convolution on avg+max pooled features
  - Focuses on critical facial landmarks (eyes, eyebrows, mouth)
  - Multiplies attention map with feature maps
- Batch Normalization: After every convolution for stable training
- Heavy Regularization: Multiple dropout layers (0.25 → 0.5)
- Global Average Pooling: Reduces parameters and provides translation invariance
- Label Smoothing: Prevents overconfidence in predictions
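The spatial attention step listed above can be illustrated in plain NumPy. This is a sketch under assumptions, not the project's Keras layer: it pools across the channel axis and applies a sigmoid to the 7×7 convolution output (the usual CBAM-style formulation), and the random kernel merely stands in for learned weights.

```python
import numpy as np


def spatial_attention(features, kernel, bias=0.0):
    """Apply a CBAM-style spatial attention map to an (H, W, C) feature map.

    kernel has shape (k, k, 2): one slice for the channel-average map and
    one for the channel-max map.
    """
    h, w, _ = features.shape
    k = kernel.shape[0]
    pad = k // 2

    # Pool across channels, giving two (H, W) descriptors.
    pooled = np.stack([features.mean(axis=-1), features.max(axis=-1)], axis=-1)

    # Zero-pad so the 7x7 convolution keeps the spatial size.
    padded = np.pad(pooled, ((pad, pad), (pad, pad), (0, 0)))

    logits = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            logits[i, j] = np.sum(padded[i:i + k, j:j + k, :] * kernel) + bias

    # Sigmoid squashes the map into (0, 1) before it rescales the features.
    attention = 1.0 / (1.0 + np.exp(-logits))
    return features * attention[..., np.newaxis], attention


rng = np.random.default_rng(0)
features = rng.standard_normal((12, 12, 256))   # Block 3 feature map before pooling
kernel = rng.standard_normal((7, 7, 2)) * 0.05  # stands in for learned weights
attended, attention = spatial_attention(features, kernel)
print(attended.shape, attention.shape)  # (12, 12, 256) (12, 12)
```

The attention map has one value per spatial location, so every channel at that location is scaled by the same factor.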
Total Parameters: ~5.6M
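The parameter total can be sanity-checked from the diagram. The sketch below assumes 3×3 kernels with biases in every convolution (the usual VGG choice, not stated in this README) and counts only the convolution and dense layers; batch normalization and the attention convolution add a few thousand more.

```python
def conv_params(in_ch, out_ch, kernel=3):
    """Weights plus biases of a kernel x kernel convolution."""
    return kernel * kernel * in_ch * out_ch + out_ch


def dense_params(in_units, out_units):
    """Weights plus biases of a fully connected layer."""
    return in_units * out_units + out_units


# (filters, number of conv layers) for each block, as in the diagram.
blocks = [(64, 2), (128, 2), (256, 3), (512, 2)]

total = 0
in_ch = 1  # grayscale input
for filters, n_convs in blocks:
    for _ in range(n_convs):
        total += conv_params(in_ch, filters)
        in_ch = filters

# Global average pooling leaves one value per channel (512).
for in_units, out_units in [(512, 512), (512, 256), (256, 5)]:
    total += dense_params(in_units, out_units)

print(f"{total:,} parameters in conv and dense layers")
```

Under these assumptions the count lands near 5.67M, in line with the ~5.6M figure quoted above.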
- Ensure good lighting conditions
- Position your face clearly in front of the camera
- Keep a moderate distance from the camera (not too close or too far)
- Avoid extreme head angles
- Input: 48×48 grayscale images
- Normalization: Pixel values scaled to [0, 1]
- Augmentation (training): Rotation, shifts, zoom, brightness, flips
- Optimizer: Adam (adaptive learning rate)
- Initial Learning Rate: 0.001
- Batch Size: 64
- Loss Function: Categorical Cross-Entropy with Label Smoothing (0.1)
- Metrics: Accuracy
- Class Weights: Automatically computed to handle class imbalance
- Validation Split: 20% of training data
- Method: Haar Cascade Classifier (OpenCV)
- Cascade: haarcascade_frontalface_default.xml (frontal face detection)
- Parameters:
  - scaleFactor: 1.1 (image pyramid scale step)
  - minNeighbors: 5 (overlapping detections a candidate needs to be kept; higher values give fewer false positives)
  - minSize: (48, 48) pixels (matches model input size)
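Once a face region has been detected, it has to be brought into the 48×48, [0, 1]-scaled form the model expects. The sketch below shows one way to do that with NumPy only; it is an assumption, not the code in live_app.py, which may well use OpenCV's resize instead of the nearest-neighbour indexing used here. The function name `preprocess_face` and the random crop are hypothetical.

```python
import numpy as np


def preprocess_face(gray_face, size=48):
    """Turn a 2-D uint8 grayscale crop into a (1, size, size, 1) float32 batch."""
    gray_face = np.asarray(gray_face)
    h, w = gray_face.shape
    # Nearest-neighbour resize: pick a source row/column for each target pixel.
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = gray_face[rows[:, None], cols[None, :]]
    # Scale pixel values from [0, 255] to [0, 1].
    scaled = resized.astype(np.float32) / 255.0
    # Add the batch and channel axes expected by a Keras-style model.
    return scaled[np.newaxis, ..., np.newaxis]


# Hypothetical 120x96 face crop, e.g. a region returned by the face detector.
crop = np.random.default_rng(0).integers(0, 256, size=(120, 96), dtype=np.uint8)
batch = preprocess_face(crop)
print(batch.shape, batch.dtype)
```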
EmotionClassifier/
├── fer2013/ # Dataset directory
│ ├── train/ # Training images (5 emotion folders)
│ │ ├── anger/
│ │ ├── happiness/
│ │ ├── neutral/
│ │ ├── sadness/
│ │ └── surprise/
│ └── test/ # Test images (5 emotion folders)
│ ├── anger/
│ ├── happiness/
│ ├── neutral/
│ ├── sadness/
│ └── surprise/
├── live_app.py # Real-time webcam application (fully documented)
├── train.ipynb # Training notebook (comprehensive documentation)
├── best_model.keras # Best trained model (pre-trained, included)
├── class_labels.json # Emotion class label mappings
├── pyproject.toml # Project dependencies and metadata
├── README.md # This file
└── __pycache__/ # Python cache directory
System Requirements:
- Python 3.10 or higher
- Webcam/Camera (for live detection)
Key Dependencies:
- TensorFlow 2.15+
- OpenCV (cv2) 4.8+
- NumPy
- Matplotlib (for training visualizations)
- Jupyter (optional, for notebook training)
This project uses UV for modern, fast dependency management:
- Primary: pyproject.toml - Defines all project metadata and dependencies
- UV Lock: Automatic lock file generation for reproducible builds
- Pip Compatible: Can also be installed with pip install -e .
- FER2013 Dataset: Facial Expression Recognition 2013 challenge dataset
- TensorFlow/Keras: Deep learning framework and high-level API
- OpenCV: Computer vision library (Haar Cascade face detection)
- UV (Astral): Modern Python package and project manager