This project implements an object tracking system that uses the YOLO (You Only Look Once) deep learning model for object detection, combined with a custom tracking algorithm. The tracker combines Intersection over Union (IoU), visual similarity (embeddings from EfficientNet), and Kalman filters for robust object tracking. The system handles track initialization, updating, and termination based on factors such as detection confidence, longevity, and appearance consistency.
- YOLO Object Detection: Utilizes YOLOv5 for real-time object detection.
- Matcher Algorithm: Custom algorithm for associating detections with tracks across frames using IoU, visual similarity, and Kalman filters.
- Kalman Filter: Predicts the state of a moving object from its previous state (see the sketch after this list).
- Embedding Similarity: Uses EfficientNet to extract feature embeddings for visual similarity calculation.
- Longevity Management: Tracks the persistence of objects across frames and manages the termination of old tracks.
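To make the Kalman filter component concrete, below is a minimal constant-velocity sketch that predicts a track's bounding-box center from its previous state. The state layout, noise values, and class name (`SimpleKalman`) are illustrative assumptions, not the project's actual filter.

```python
import numpy as np

# Minimal constant-velocity Kalman filter sketch (hypothetical state layout):
# state x = [cx, cy, vx, vy] -- bounding-box center and its velocity.
class SimpleKalman:
    def __init__(self, cx, cy, dt=1.0):
        self.x = np.array([cx, cy, 0.0, 0.0], dtype=float)   # initial state
        self.P = np.eye(4) * 10.0                             # state covariance
        self.F = np.array([[1, 0, dt, 0],                     # constant-velocity motion model
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],                      # only the center is observed
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * 0.01                             # process noise (assumed)
        self.R = np.eye(2) * 1.0                              # measurement noise (assumed)

    def predict(self):
        """Project the state forward one frame; estimates where the track should be."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]  # predicted center

    def update(self, cx, cy):
        """Correct the prediction with a matched detection's center."""
        z = np.array([cx, cy], dtype=float)
        y = z - self.H @ self.x                               # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)              # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```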
The project requires the following libraries:
- OpenCV (`cv2`): For image and video processing.
- PyTorch (`torch`): As the backbone for YOLOv5 and tensor operations.
- Torchvision: For image transformations and EfficientNet (see the embedding sketch after this list).
- NumPy: For numerical computations.
- SciPy: Specifically for `linear_sum_assignment`, used in the Matcher.
- Memory Profiler: For memory profiling (optional).
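For reference, the kind of EfficientNet embedding used for visual similarity can be extracted with Torchvision roughly as follows. The `efficientnet_b0` variant, the `Identity` classifier trick, and the `embed` helper are assumptions for illustration; the project may use a different variant or layer.

```python
import torch
from torchvision.models import efficientnet_b0, EfficientNet_B0_Weights

# Load a pretrained EfficientNet and drop the classification head so the
# model outputs a feature embedding instead of class scores.
# (efficientnet_b0 is an assumption; the project may use another variant.)
weights = EfficientNet_B0_Weights.DEFAULT
model = efficientnet_b0(weights=weights)
model.classifier = torch.nn.Identity()
model.eval()

preprocess = weights.transforms()  # resize/normalize exactly as the weights expect

def embed(crop_rgb):
    """Return an L2-normalized embedding for an RGB crop (PIL Image or tensor)."""
    with torch.no_grad():
        x = preprocess(crop_rgb).unsqueeze(0)      # [1, 3, H, W]
        feat = model(x).squeeze(0)                 # [1280] for the B0 variant
        return feat / feat.norm()                  # normalize for cosine similarity
```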
Ensure you have Python 3.x installed. You can install the required libraries using:
```bash
pip install opencv-python-headless torch torchvision numpy scipy memory-profiler
```
To run the YOLO model for object detection on a set of video frames and output the detections:

```bash
python yolo_predictions.py --video_input path/to/input/frames --output_file path/to/output/detections.txt
```
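Under the hood, the detection step presumably amounts to something like the sketch below, which loads a pretrained YOLOv5 model via `torch.hub` and writes one detection per line. The model variant, frame glob pattern, and output line format are assumptions, not the actual contents of `yolo_predictions.py`.

```python
import glob
import torch

# Load a pretrained YOLOv5 model from the Ultralytics hub
# (the exact variant used by yolo_predictions.py is an assumption).
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

with open('detections.txt', 'w') as out:
    for frame_idx, path in enumerate(sorted(glob.glob('path/to/input/frames/*.jpg'))):
        results = model(path)                      # run detection on one frame
        for x1, y1, x2, y2, conf, cls in results.xyxy[0].tolist():
            # One detection per line; this format is illustrative only.
            out.write(f"{frame_idx},{int(cls)},{conf:.3f},{x1:.1f},{y1:.1f},{x2:.1f},{y2:.1f}\n")
```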
After generating the detection file with YOLO, you can run the tracker as follows:

```bash
python tracker.py --det_file path/to/detections.txt --video_input path/to/input/frames --output_video path/to/output/video.avi --output_file path/to/output/solution.txt
```

- Make sure the input video frames are sequentially named and placed in the specified directory.
- The output from the tracker (`sol.txt`) will contain the tracked object information for each frame.
- Adjust the `iou_weight` and `similarity_weight` arguments in `tracker.py` to fine-tune the tracking for your specific requirements (a sketch of how these weights combine follows below).
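To illustrate how `iou_weight` and `similarity_weight` might enter the association step, the sketch below combines IoU and embedding cosine similarity into a single cost matrix and solves it with SciPy's `linear_sum_assignment`. The `match` helper and the exact cost formula are assumptions about the Matcher's internals.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def match(track_boxes, track_embs, det_boxes, det_embs,
          iou_weight=0.5, similarity_weight=0.5):
    """Assign detections to tracks by minimizing a weighted combined cost."""
    cost = np.zeros((len(track_boxes), len(det_boxes)))
    for i, (t_box, t_emb) in enumerate(zip(track_boxes, track_embs)):
        for j, (d_box, d_emb) in enumerate(zip(det_boxes, det_embs)):
            sim = float(np.dot(t_emb, d_emb))      # cosine similarity (embeddings pre-normalized)
            cost[i, j] = -(iou_weight * iou(t_box, d_box) + similarity_weight * sim)
    rows, cols = linear_sum_assignment(cost)       # Hungarian assignment
    return list(zip(rows.tolist(), cols.tolist()))
```

Raising `iou_weight` favors spatial overlap with the predicted box, while raising `similarity_weight` favors appearance consistency; the defaults shown here are placeholders, not the project's values.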