Tested on Ubuntu 22.04 with CUDA 12.1 on an RTX 4090 GPU.
```bash
git clone https://github.com/jagennath-hari/Edge-Optimized-Tracking-System.git
```
A YOLOv11s model is used for demonstration.
Unzip the compressed file and place `best.pt` and `best.onnx` in the `weights` folder.
The pretrained weights were trained on a single example sequence from the SportsMOT dataset, specifically the soccer instance `v_gQNyhv8y0QY_c013`.
Sample Dataset on OneDrive from Authors
A folder called `SportsMOT_example` gets created after extracting the file.
Build the Docker image:
```bash
bash build.sh
```
Compile the code inside the container:
```bash
bash compile.sh
```
These steps only need to be done once and do not have to be repeated.
To run the composed container with Triton and the executable:
```bash
DATASET_PATH=/path/to/your/SportsMOT_example bash run_and_exit.sh
```
The output video gets saved in the `/tracker_system/result` folder.
End-to-end Pipeline
The overall system is divided into three sub-systems: Perception, ByteTrack, and the Particle Filter. Each sub-system is explained below.
Perception Design
Divided into two sub-components: the one-time quantization, followed by setting up the ensemble network for the Triton Inference Server.
ByteTrack Design
The original authors' paper was used; the Official Repository gives a detailed explanation of the implementation.
CUDA Particle Filter Design
The implementation uses a GPU-accelerated Particle Filter with an additional Unscented Transform for the prediction step. There are a total of 10 states. A rough sketch of the prediction step is shown below.
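For intuition, here is a minimal NumPy sketch of an Unscented Transform prediction step. The 10-state layout and the constant-acceleration motion model below are illustrative assumptions, not the repository's actual state ordering, and the real implementation runs in CUDA on the GPU:

```python
# Minimal Unscented Transform prediction sketch (NumPy, CPU).
# ASSUMPTION: a hypothetical 10-D state [x, y, w, h, vx, vy, vw, vh, ax, ay];
# the actual state layout in this repository may differ.
import numpy as np

N = 10                                   # state dimension
ALPHA, BETA, KAPPA = 1e-3, 2.0, 0.0      # common scaled-UT defaults
LAMBDA = ALPHA**2 * (N + KAPPA) - N

def sigma_points(mean, cov):
    """Generate the 2N+1 scaled sigma points from the state mean/covariance."""
    S = np.linalg.cholesky((N + LAMBDA) * cov)
    pts = [mean] + [mean + S[:, i] for i in range(N)] + [mean - S[:, i] for i in range(N)]
    return np.array(pts)

def f(state, dt):
    """Simple constant-acceleration equations of motion."""
    out = state.copy()
    out[0:4] += state[4:8] * dt          # position/size += velocity * dt
    out[4:6] += state[8:10] * dt         # velocity += acceleration * dt
    return out

def ut_predict(mean, cov, dt, Q):
    """Propagate sigma points through f and recover the predicted mean/covariance."""
    pts = np.array([f(p, dt) for p in sigma_points(mean, cov)])
    wm = np.full(2 * N + 1, 1.0 / (2 * (N + LAMBDA)))
    wc = wm.copy()
    wm[0] = LAMBDA / (N + LAMBDA)
    wc[0] = wm[0] + (1 - ALPHA**2 + BETA)
    new_mean = wm @ pts
    d = pts - new_mean
    new_cov = (wc[:, None] * d).T @ d + Q   # Q is the process noise
    return new_mean, new_cov
```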
Training on a custom dataset using YOLOv11
Training script here.
Follow the Official Documentation. Accuracy may sometimes fall short depending on the complexity of the objects; follow Tuning or use advanced frameworks like Ray Tune, WandB, etc. A minimal training call is sketched below.
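For reference, a minimal training sketch using the `ultralytics` Python API; the dataset YAML and the hyperparameters are placeholders, not the exact configuration used for the pretrained weights:

```python
from ultralytics import YOLO

# Start from the pretrained YOLOv11s checkpoint.
model = YOLO("yolo11s.pt")

# data.yaml is a placeholder pointing at your custom dataset definition.
model.train(data="data.yaml", epochs=100, imgsz=640, batch=16)
```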
ONNX Conversion for YOLOv11
Conversion script here. Follow the Official Documentation for more configurations. Manual conversions are also possible; follow the Official PyTorch Tutorial. A minimal export call is sketched below.
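A minimal export sketch with the same `ultralytics` API; the paths and flags shown are illustrative defaults rather than the repository's exact export settings:

```python
from ultralytics import YOLO

# Load the trained checkpoint and export it to ONNX.
model = YOLO("weights/best.pt")
model.export(format="onnx")  # writes best.onnx next to the checkpoint
```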
Quantize the network
A bash file which runs the TensorRT executor is provided here; it may need to be changed based on the inputs and outputs of the network architecture. The right precision values are required for faster inference, e.g. `fp16`, `fp32`, `int8`, etc. A hedged build sketch is shown below.
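As an alternative to the `trtexec` script, the engine can also be built programmatically. This is a hedged sketch assuming the TensorRT 8.x Python API; the paths and the choice of `FP16` are placeholders:

```python
# Hedged sketch: build a TensorRT engine from the exported ONNX model.
# ASSUMPTION: TensorRT 8.x Python API; paths are placeholders.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("weights/best.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # drop this line for full fp32

engine = builder.build_serialized_network(network, config)
assert engine is not None, "engine build failed"
with open("model.plan", "wb") as f:
    f.write(engine)
```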
Changing the Triton Ensemble Model
The `models` folder contains the entire pipeline. Based on the network architecture, the pre-processing and post-processing files need to be changed. Typically the `config.pbtxt` for each step might require changes based on the overall perception logic.
It is recommended to check whether Triton is able to register your ensemble model by running `bash run_container.sh` and then, inside the container, running `/opt/tritonserver/bin/tritonserver --model-repository=/models`. A scripted readiness check is sketched below.
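Once the server is up, the registration can also be verified from Python with the `tritonclient` package; the model name `ensemble_model` below is a placeholder for whatever your ensemble's `config.pbtxt` declares:

```python
import tritonclient.http as httpclient

# Connect to the default Triton HTTP endpoint.
client = httpclient.InferenceServerClient(url="localhost:8000")

print("server live: ", client.is_server_live())
print("server ready:", client.is_server_ready())
# "ensemble_model" is a placeholder for your ensemble's name.
print("model ready: ", client.is_model_ready("ensemble_model"))
```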
Running the Docker compose
Follow the file and modify the paths accordingly. This should keep the entire end-to-end pipeline the same.
Using the API for any new Perception, Tracking, and Filter
The entire API is defined in the `*_interface.hpp` files, so by overriding the functions you can plug and play any custom solution.
Rapid prototyping
Follow the file to experiment with Python. A minimal client sketch is shown below.
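As a starting point, here is a hedged sketch that pushes one frame through the ensemble with `tritonclient`; the model, input, and output names and shapes are assumptions, so check them against your `config.pbtxt`:

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# ASSUMPTION: the ensemble takes one uint8 HWC image named "INPUT_IMAGE"
# and returns detections named "DETECTIONS"; adjust to your config.pbtxt.
frame = np.zeros((720, 1280, 3), dtype=np.uint8)  # stand-in for a real frame

inp = httpclient.InferInput("INPUT_IMAGE", list(frame.shape), "UINT8")
inp.set_data_from_numpy(frame)
out = httpclient.InferRequestedOutput("DETECTIONS")

result = client.infer("ensemble_model", inputs=[inp], outputs=[out])
print(result.as_numpy("DETECTIONS"))
```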
- The particle filter can be extended to other applications such as 3D tracking, but it requires changes to the state space model.
- If running on NVIDIA Jetson, CUDA Shared Memory is not supported for Triton; the ensemble model needs to be changed, as ARM uses unified memory.
- ByteTrack may not be the best solution; more SOTA learning-based trackers can yield better correspondences.
- The system dynamics for the particle filter use simple equations of motion; it is best to use more complex dynamics when object motions are highly non-linear.
- The noise values may need tuning inside the particle filter.
- Quantizing to int8 or fp16 can yield faster inference, but at the cost of accuracy; it is a good idea to balance both and match the application's requirements for the ideal selection.
If you found this code/work to be useful in your own research, please consider citing the following:
```bibtex
@software{Jocher_Ultralytics_YOLO_2023,
  author = {Jocher, Glenn and Qiu, Jing and Chaurasia, Ayush},
  license = {AGPL-3.0},
  month = jan,
  title = {{Ultralytics YOLO}},
  url = {https://github.com/ultralytics/ultralytics},
  version = {8.0.0},
  year = {2023}
}

@inproceedings{zhang2022bytetrack,
  title={ByteTrack: Multi-Object Tracking by Associating Every Detection Box},
  author={Zhang, Yifu and Sun, Peize and Jiang, Yi and Yu, Dongdong and Weng, Fucheng and Yuan, Zehuan and Luo, Ping and Liu, Wenyu and Wang, Xinggang},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2022}
}

@article{cui2023sportsmot,
  title={SportsMOT: A Large Multi-Object Tracking Dataset in Multiple Sports Scenes},
  author={Cui, Yutao and Zeng, Chenkai and Zhao, Xiaoyu and Yang, Yichun and Wu, Gangshan and Wang, Limin},
  journal={arXiv preprint arXiv:2304.05170},
  year={2023}
}
```
This software is released under the BSD-3-Clause license. You can view a license summary here. Ultralytics and ByteTrack have their own respective licenses.