A high-performance multi-object tracking system utilizing a quantized YOLOv11 model deployed on the Triton Inference Server, integrated with a CUDA-accelerated particle filter for robust tracking of multiple objects.

jagennath-hari/Edge-Optimized-Tracking-System

Edge-Optimized-Tracking-System

The Edge Optimized Tracking System is a high-performance object tracking and inference pipeline designed for real-time applications. Leveraging NVIDIA's Triton Inference Server and GPU acceleration, this project combines state-of-the-art object detection, tracking, and particle filtering to achieve robust and efficient object tracking. It includes support for deep learning models, modular tracking algorithms, and advanced filtering mechanisms for precise localization and orientation estimation.
Final Result

Multi-instance tracking with localization and orientation estimation using particle filter.

🏁 Dependencies

  1. Docker
  2. NVIDIA Driver
  3. CUDA Toolkit
  4. NVIDIA Container Toolkit
  5. Docker Compose plugin

Tested on Ubuntu 22.04 with CUDA 12.1 on an RTX 4090 GPU.

βš™οΈ Setup

Clone the Repository

git clone https://github.com/jagennath-hari/Edge-Optimized-Tracking-System.git

Pretrained weights for the SportsMOT dataset

A YOLOv11s model is used for demonstration.

Pretrained Weights

Unzip the compressed file and place best.pt and best.onnx in the weights folder.

Dataset Download

The pretrained weights were trained on a single example sequence from the SportsMOT dataset, specifically the soccer sequence v_gQNyhv8y0QY_c013.

Sample Dataset on OneDrive from Authors

A folder called SportsMOT_example gets created after extracting the file.

πŸ—οΈ Building the 🐳 Docker file

Start building the docker image.

bash build.sh

Compiling the code inside the container.

bash compile.sh

These need to be done only once and does not have to be repeated.

βŒ›οΈ Running on sample data

To run the composed container with Triton and the executable.

DATASET_PATH=/path/to/your/SportsMOT_example bash run_and_exit.sh

The output video gets saved in the /tracker_system/result folder.

📖 Algorithm Overview

Perception Algorithm
Algorithm 1

Ensembled Model Algorithm

Algorithm 2

Perception Algorithm

CUDA Particle Filter Algorithm
Main Sys Design

Particle Filter Algorithm.

πŸ“ System Design

End-to-end Pipeline
Main Sys Design

Overall System Design.

The overall system is divided into three sub-systems: Perception, ByteTrack, and Particle Filter. Each sub-system is explained below.

Perception Design

Perception is divided into two sub-components: a one-time quantization step, followed by setting up the ensembled network for Triton Inference Server.

Quantization Framework

Quantization Sys Design

Quantization framework.

Inference for Triton Inference Server using ensembled model

Perception Inference Sys Design

Inference framework.

ByteTrack Design

The tracker follows the original authors' paper; the Official Repository gives a detailed explanation of the implementation.
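As a rough illustration of the idea behind ByteTrack (associating every detection box, with high-score detections matched first and low-score detections used to recover the remaining tracks), here is a minimal Python sketch. It is not the code used in this repository: it swaps in greedy IoU matching where the paper uses the Hungarian algorithm with a Kalman-predicted box, it omits track creation and deletion, and the function names and thresholds (`high_thresh`, `min_iou`) are assumptions chosen for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks, boxes, min_iou):
    """Greedily pair tracks (id -> box) with boxes, best IoU first."""
    pairs = sorted(
        ((iou(box, det), tid, j)
         for tid, box in tracks.items()
         for j, det in enumerate(boxes)),
        reverse=True,
    )
    matches, used = {}, set()
    for score, tid, j in pairs:
        if score < min_iou:
            break  # remaining pairs overlap even less
        if tid in matches or j in used:
            continue
        matches[tid] = j
        used.add(j)
    unmatched = {tid: box for tid, box in tracks.items() if tid not in matches}
    return matches, unmatched


def byte_associate(tracks, detections, high_thresh=0.5, min_iou=0.3):
    """Two-stage association; detections is a list of (box, score)."""
    high = [d for d in detections if d[1] >= high_thresh]
    low = [d for d in detections if d[1] < high_thresh]
    # Stage 1: match all tracks against high-score detections.
    first, remaining = greedy_match(tracks, [d[0] for d in high], min_iou)
    # Stage 2: try to recover unmatched tracks with low-score detections.
    second, lost = greedy_match(remaining, [d[0] for d in low], min_iou)
    assigned = {tid: high[j] for tid, j in first.items()}
    assigned.update({tid: low[j] for tid, j in second.items()})
    return assigned, sorted(lost)


tracks = {1: (0, 0, 10, 10), 2: (20, 20, 30, 30)}
detections = [((1, 1, 11, 11), 0.9), ((21, 21, 31, 31), 0.2)]
assigned, lost = byte_associate(tracks, detections)
```

In this toy example track 2 is only kept alive because the low-score detection is considered in the second stage; a tracker that discards low-score boxes would lose it.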

CUDA Particle Filter Design

Implementation uses a GPU accelerated Particle Filter with an additional Unscented Transform for the prediction step.
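To make the predict, update, and resample cycle of a particle filter concrete, the following is a minimal CPU-side Python sketch for a single 1D position with a constant-velocity motion model. It is an illustration under stated assumptions, not the CUDA implementation in this repository: the function names, the Gaussian measurement model, the noise values, and the choice of systematic resampling are all assumptions for the example, and the Unscented Transform is left out here.

```python
import math
import random


def predict(xs, vs, dt, process_std, rng):
    """Move each particle with constant velocity plus Gaussian process noise."""
    for i in range(len(xs)):
        xs[i] += vs[i] * dt + rng.gauss(0.0, process_std)


def update(xs, weights, z, meas_std):
    """Reweight particles by a Gaussian likelihood of measurement z, then normalize."""
    for i, x in enumerate(xs):
        weights[i] *= math.exp(-0.5 * ((z - x) / meas_std) ** 2)
    total = sum(weights)
    if total == 0.0:
        # Every particle is implausible: fall back to uniform weights.
        weights[:] = [1.0 / len(weights)] * len(weights)
    else:
        for i in range(len(weights)):
            weights[i] /= total


def estimate(xs, weights):
    """Weighted mean of the particle positions."""
    return sum(w * x for w, x in zip(weights, xs))


def systematic_resample(xs, vs, weights, rng):
    """Draw n particles in proportion to their weights, then reset the weights."""
    n = len(xs)
    start = rng.random() / n
    new_xs, new_vs = [], []
    cumulative, j = weights[0], 0
    for i in range(n):
        u = start + i / n
        while u > cumulative and j < n - 1:
            j += 1
            cumulative += weights[j]
        new_xs.append(xs[j])
        new_vs.append(vs[j])
    xs[:] = new_xs
    vs[:] = new_vs
    weights[:] = [1.0 / n] * n


rng = random.Random(0)  # fixed seed so the example is repeatable
n = 500
xs = [rng.uniform(0.0, 10.0) for _ in range(n)]  # positions
vs = [0.0] * n                                   # velocities
weights = [1.0 / n] * n

predict(xs, vs, dt=1.0, process_std=0.1, rng=rng)
update(xs, weights, z=5.0, meas_std=0.5)
x_hat = estimate(xs, weights)
systematic_resample(xs, vs, weights, rng)
```

On a GPU, the per-particle loops in `predict` and `update` map naturally to one thread per particle, which is the usual motivation for a CUDA implementation.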

Structure of Arrays (SoA) for the states

There are a total of 10 states.

Particle States Design

Particle States Structure of Arrays.
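The Structure of Arrays layout can be sketched in Python as one contiguous array per state rather than one record per particle. This is only an illustration of the layout: the class name `ParticlesSoA`, the use of float32 storage, and the generic state indices 0 to 9 are assumptions, since the actual meaning of each of the 10 states is defined in the repository's design diagram and source.

```python
from array import array

NUM_STATES = 10  # the README states there are 10 states in total


class ParticlesSoA:
    """One float32 array per state, plus one array of particle weights."""

    def __init__(self, num_particles):
        self.num_particles = num_particles
        # states[k][i] is the value of state k for particle i.
        self.states = [
            array("f", [0.0] * num_particles) for _ in range(NUM_STATES)
        ]
        self.weights = array("f", [1.0 / num_particles] * num_particles)

    def get_particle(self, i):
        """Gather all states of particle i (the array-of-structures view)."""
        return [column[i] for column in self.states]


particles = ParticlesSoA(4)
particles.states[0][2] = 1.5  # set state 0 of particle 2
row = particles.get_particle(2)
```

With this layout, neighbouring GPU threads that each handle one particle read neighbouring elements of the same array, which generally gives coalesced memory access.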

CUDA Particle Filter with Unscented Transform

Particle States Design

Particle Filter process on the device (GPU) with the Unscented Transform, propagating Sigma Points.
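As a reference for what propagating sigma points means, here is a minimal Python sketch of the Unscented Transform for a scalar state. It uses the standard scaled sigma-point weights; the function name and the default `alpha`, `beta`, and `kappa` values are assumptions for the example, and how the repository combines the transform with the particle prediction step is not reproduced here.

```python
import math


def unscented_transform_1d(mean, var, f, alpha=1.0, beta=0.0, kappa=2.0):
    """Propagate a scalar Gaussian (mean, var) through f using 3 sigma points."""
    n = 1  # state dimension
    lam = alpha ** 2 * (n + kappa) - n
    spread = math.sqrt((n + lam) * var)
    sigma_points = [mean, mean + spread, mean - spread]

    # Weights for the mean and for the covariance.
    w_mean = [lam / (n + lam), 0.5 / (n + lam), 0.5 / (n + lam)]
    w_cov = list(w_mean)
    w_cov[0] += 1.0 - alpha ** 2 + beta

    ys = [f(x) for x in sigma_points]  # push each sigma point through f
    new_mean = sum(w * y for w, y in zip(w_mean, ys))
    new_var = sum(w * (y - new_mean) ** 2 for w, y in zip(w_cov, ys))
    return new_mean, new_var


# For a linear function the transform is exact: mean 2*1+1, variance 2^2*4.
linear_mean, linear_var = unscented_transform_1d(1.0, 4.0, lambda x: 2.0 * x + 1.0)

# A non-linear motion model is handled the same way, without linearization.
curved_mean, curved_var = unscented_transform_1d(1.0, 0.25, math.sin)
```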

💾 Running on custom data

Training on custom dataset using YOLOv11

Training script here.

Follow the Official Documentation. Accuracy may suffer depending on the complexity of the objects; in that case, follow Tuning or use advanced frameworks such as Ray Tune, WandB, etc.

ONNX Conversion for YOLOv11

Conversion script here. Follow the Official Documentation for more configurations. Manual conversion is also possible; follow the Official PyTorch Tutorial.

Quantize the network

A bash file that runs the TensorRT executor is here. It may need to be changed to match the inputs and outputs of the network architecture. Choosing the right precision (e.g. fp16, fp32, int32) is required for faster inference.
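To give a feel for what a lower precision costs numerically, the sketch below rounds a few example values to fp16 and to int8 and compares the resulting errors. It is a generic illustration, not what TensorRT does internally: the symmetric per-tensor int8 scheme, the helper names, and the sample values are assumptions, and real int8 deployment relies on calibration data.

```python
import struct


def to_fp16(value):
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def quantize_int8(values):
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Map the integers back to approximate real values."""
    return [q * scale for q in quantized]


weights = [0.1, -0.25, 0.5, 1.0]  # made-up example values

fp16_weights = [to_fp16(w) for w in weights]
quantized, scale = quantize_int8(weights)
int8_weights = dequantize_int8(quantized, scale)

fp16_error = max(abs(a - b) for a, b in zip(weights, fp16_weights))
int8_error = max(abs(a - b) for a, b in zip(weights, int8_weights))
print(f"max fp16 error: {fp16_error:.6f}, max int8 error: {int8_error:.6f}")
```

For these values the int8 error is roughly two orders of magnitude larger than the fp16 error, which is why accuracy should be re-checked after changing the precision.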

Changing the Triton Ensemble Model

The models folder contains the entire pipeline. Based on the network architecture, the pre-processing and post-processing files need to be changed. Typically, the config.pbtxt for each step might require changes based on the overall perception logic.

It is recommended to check that Triton is able to register your ensembled model by running bash run_container.sh and then, inside the container, running /opt/tritonserver/bin/tritonserver --model-repository=/models.
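Once the server is up, registration can also be checked programmatically through Triton's HTTP readiness endpoint for a model (`/v2/models/<name>/ready`). The Python sketch below uses only the standard library. The model name `ensemble_model` is a placeholder, and the host and port assume Triton's default HTTP port 8000 is reachable from where the script runs; adjust both to match your model repository and container setup.

```python
import urllib.error
import urllib.request
from urllib.parse import quote


def model_ready_url(model_name, host="localhost", port=8000):
    """Build the readiness URL for one model on a Triton HTTP endpoint."""
    return f"http://{host}:{port}/v2/models/{quote(model_name, safe='')}/ready"


def is_model_ready(model_name, host="localhost", port=8000, timeout=2.0):
    """Return True if Triton answers 200 for the model, False otherwise."""
    url = model_ready_url(model_name, host, port)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Server not reachable, model not loaded, or request timed out.
        return False


if __name__ == "__main__":
    name = "ensemble_model"  # placeholder: use the name from your models folder
    print(name, "ready:", is_model_ready(name))
```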

Running the Docker compose

Follow the file and modify the paths as needed. This should keep the end-to-end pipeline unchanged.

Using the API for any new Perception, Tracking, and Filter

The entire API is defined in the *_interface.hpp files, so by overriding the functions you can plug and play any custom solution.
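The plug-and-play idea can be illustrated with a small Python analogue of the interface pattern. The real interfaces are C++ headers, and their class and method names are not reproduced here: every name below (`PerceptionInterface`, `detect`, `TrackerInterface`, `update`, `FilterInterface`, `step`, and the toy implementations) is hypothetical and only shows how overriding a small set of functions lets each stage be swapped independently.

```python
from abc import ABC, abstractmethod


class PerceptionInterface(ABC):
    @abstractmethod
    def detect(self, frame):
        """Return a list of boxes (x1, y1, x2, y2) for one frame."""


class TrackerInterface(ABC):
    @abstractmethod
    def update(self, detections):
        """Return a dict mapping track id to box."""


class FilterInterface(ABC):
    @abstractmethod
    def step(self, tracks):
        """Return a dict mapping track id to an estimated state."""


class FixedBoxDetector(PerceptionInterface):
    """Toy detector that always reports the same box."""

    def detect(self, frame):
        return [(10.0, 20.0, 50.0, 80.0)]


class IndexTracker(TrackerInterface):
    """Toy tracker that uses the detection index as the track id."""

    def update(self, detections):
        return {track_id: box for track_id, box in enumerate(detections)}


class CenterFilter(FilterInterface):
    """Toy filter that reports the box center as the state."""

    def step(self, tracks):
        return {
            tid: ((x1 + x2) / 2.0, (y1 + y2) / 2.0)
            for tid, (x1, y1, x2, y2) in tracks.items()
        }


def run_pipeline(frame, perception, tracker, state_filter):
    """Chain the three stages; any stage can be replaced by another subclass."""
    return state_filter.step(tracker.update(perception.detect(frame)))


result = run_pipeline(None, FixedBoxDetector(), IndexTracker(), CenterFilter())
```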

Rapid prototype

Follow the file to experiment with Python.

🛠️ Final Result

Tracking System Result

Edge-Optimized Tracking System for the SportsMOT Dataset as an example.

⚠️ Note

  1. The particle filter can be extended to other applications such as 3D tracking, but this requires changes to the state space model.
  2. When running on NVIDIA Jetson, CUDA Shared Memory is not supported for Triton; the ensembled model needs to be changed because ARM uses unified memory.
  3. ByteTrack may not be the best solution; more SOTA learning-based trackers can yield better correspondences.
  4. The system dynamics for the particle filter use simple equations of motion; it is best to use more complex dynamics when object motions are highly non-linear.
  5. The noise values inside the particle filter may need tuning.
  6. Quantizing to int8 or fp16 can yield faster inference but at the cost of accuracy. It is a good idea to balance both and select the precision that best matches the application's requirements.

πŸ“– Citation

If you found this code/work to be useful in your own research, please consider citing the following:

@software{Jocher_Ultralytics_YOLO_2023,
author = {Jocher, Glenn and Qiu, Jing and Chaurasia, Ayush},
license = {AGPL-3.0},
month = jan,
title = {{Ultralytics YOLO}},
url = {https://github.com/ultralytics/ultralytics},
version = {8.0.0},
year = {2023}
}
@inproceedings{zhang2022bytetrack,
  title={ByteTrack: Multi-Object Tracking by Associating Every Detection Box},
  author={Zhang, Yifu and Sun, Peize and Jiang, Yi and Yu, Dongdong and Weng, Fucheng and Yuan, Zehuan and Luo, Ping and Liu, Wenyu and Wang, Xinggang},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2022}
}
@article{cui2023sportsmot,
  title={SportsMOT: A Large Multi-Object Tracking Dataset in Multiple Sports Scenes},
  author={Cui, Yutao and Zeng, Chenkai and Zhao, Xiaoyu and Yang, Yichun and Wu, Gangshan and Wang, Limin},
  journal={arXiv preprint arXiv:2304.05170},
  year={2023}
}

🪪 License

This software is released under the BSD-3-Clause license. You can view a license summary here. Ultralytics and ByteTrack have their own respective licenses.