GPU-Accelerated Video Deduplicator

A high-performance pipeline architecture desktop application for finding and removing duplicate video files. It leverages PyTorch, TorchVision, and CUDA to quickly analyze video content through a pre-trained EfficientNet-B0 model.

It is highly robust for detecting videos that have been resized, compressed, or slightly visually altered, backed by an optimized hardware-aware pipeline designed to saturate RTX NVIDIA hardware using FP16 mixed precision and asynchronous double buffering.

Features

Blazing Fast GPU Acceleration: Uses CUDA Tensor Cores (FP16 mixed precision) for neural network-based video frame fingerprinting.
Asynchronous Pipelining: Features a double-buffered architecture, aggressively extracting the next video batch via FFmpeg threading while the GPU simultaneously crunches the current batch.
Hardware-Aware Auto Tuning: Deep probes the host for CPU logical cores, active NVDEC decoder engines, and PyTorch free VRAM mapping. It strictly generates a tuned hw_profile.json (leaving safe hardware headroom for user OS tasks) to skip constant polling overheads. OOM recoveries are guaranteed natively.
O(1) CPU Fast Filtering: Prunes impossible matches strictly through zero-decode ffprobe metadata queries before ANY frames touch the deep learning layer.
Content-Based Similarity: Employs Deep Feature embeddings (EfficientNet-B0) rather than basic file hashing, effectively detecting compressed, slightly altered, or differing format videos as exact matches.
Interactive UI: Built using a Python backend (Flask) with an elegant, responsive frontend inside a native desktop window (using pywebview).
One-Click Bulk Delete: Auto-selects the "Best" video based on resolution and size, letting you batch delete duplicates securely to the Recycle Bin.
Smart Pause & Resume: Robust SQLite session caching allows for pause and resume mid-scan without losing structural progress.

Architecture & Infrastructure

Check out the deep-dive on data flow, pipeline architecture, and structural optimizations inside INFRASTRUCTURE.md.

Requirements

Python 3.9+ (Tested working seamlessly in embedded portable environments).
FFmpeg & ffprobe: Must be installed and available on your system PATH.
NVIDIA GPU (CUDA): Designed for systems with Compute Capability. Maximum performance requires the torch module built with CUDA support. (Will gracefully fallback to CPU if not found).

System Path Check

The application validates the ffmpeg presence at runtime. If not found, you will get a clear error inside the application header allowing you to correct the configuration.

Installation

Clone this repository:

git clone git@github.com:appsjuragan/py-video-deduplication.git
cd py-video-deduplication

Install Python dependencies:
```
pip install -r requirements.txt
```
Run the application:
```
python app.py
```

Creating a Standalone Executable

You can bundle the entire application (including dependencies and deep learning models) into a standalone .exe using PyInstaller.

pyinstaller app.spec --clean

The resulting executable will be generated inside the dist/ directory.

Usage Guide

Launch the app and add paths to the Folders to Scan list.
Tweak Settings: Adjust the similarity threshold to catch partial clips, or increase frames. Batch sizes automatically scale to fit VRAM dynamically.
Scan: Click Start Scan and watch the fast-filter drop disparate files, while the remainder hit the GPU batch queue.
Compare: Evaluate duplicate groups. The app automatically flags the lower-quality or smaller videos for deletion.
Clean: Click "Delete Checked" to send duplicate files safely to the trash.

License

MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
static		static
templates		templates
.gitignore		.gitignore
INFRASTRUCTURE.md		INFRASTRUCTURE.md
README.md		README.md
app.py		app.py
comparator.py		comparator.py
cuda_pipeline.py		cuda_pipeline.py
dummy.mp4		dummy.mp4
extractor.py		extractor.py
hasher.py		hasher.py
hw_profile.py		hw_profile.py
requirements.txt		requirements.txt
scanner.py		scanner.py
test_hasher.py		test_hasher.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GPU-Accelerated Video Deduplicator

Features

Architecture & Infrastructure

Requirements

System Path Check

Installation

Creating a Standalone Executable

Usage Guide

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

GPU-Accelerated Video Deduplicator

Features

Architecture & Infrastructure

Requirements

System Path Check

Installation

Creating a Standalone Executable

Usage Guide

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages