Clarification on How runs.zip is Generated and Discrepancies with Different YOLO Weights #1812
Hi @lalayants.
Somewhere in the output it should say something like "Overwriting detections and embeddings for {mot_folder_path}...". If you look in the path "current_dir/dets_n_embs/yolov8n/dets", there should be a .txt file with the detections, and the same for the embeddings. Then you know that the tracker is using the correct detections and embeddings.
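If you want to double-check this programmatically, here is a minimal sketch (paths assumed from the layout described above, with osnet_x1_0_dukemtmcreid as the ReID folder name) that verifies each detection file has a matching embedding file with the same number of rows, on the assumption that there is one embedding line per detection line:

```python
from pathlib import Path

# Assumed layout from the comment above; adjust model/ReID folder names as needed
dets_dir = Path("runs/dets_n_embs/yolov8n/dets")
embs_dir = Path("runs/dets_n_embs/yolov8n/embs/osnet_x1_0_dukemtmcreid")

for dets_file in sorted(dets_dir.glob("*.txt")):
    embs_file = embs_dir / dets_file.name
    if not embs_file.exists():
        print(f"{dets_file.name}: missing embeddings file")
        continue
    n_dets = len(dets_file.read_text().splitlines())
    n_embs = len(embs_file.read_text().splitlines())
    status = "OK" if n_dets == n_embs else "MISMATCH"
    print(f"{dets_file.name}: {n_dets} dets / {n_embs} embs ({status})")
```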
Hi @rolson24, Yes, that's exactly how I launch my benchmarks. However, I’ve encountered a significant discrepancy between the results obtained using the pre-generated embeddings from runs.zip and those generated when I recreate them locally. For example, with yolov8x.pt:
The same weight (yolov8x.pt) is used in both cases, yet the scores drop significantly when I overwrite the embeddings. Could you please explain how the embeddings in runs.zip are generated so that I can replicate that process and make my own embeddings for yolov8n/s/m/l? Any guidance on ensuring that my locally generated embeddings match those in runs.zip would be greatly appreciated. I've seen @mikel-brostrom's comment on #1663 that the embeddings are made using BoT, but I still can't figure out how to make my own :(
The data under runs.zip comes directly from the official StrongSORT repo (BoT itself is not implemented in this repo).
I however parsed the `.npy` files from the StrongSORT repo into the expected format with the following script:

```python
import os
import cv2
import numpy as np
from pathlib import Path
import argparse

# Frame-number offsets for each MOT17 training sequence
base_dirs_with_thresholds = [
    ("/Users/mikel.brostrom/boxmot/tracking/val_utils/data/MOT17/train/MOT17-02", 302),
    ("/Users/mikel.brostrom/boxmot/tracking/val_utils/data/MOT17/train/MOT17-04", 527),
    ("/Users/mikel.brostrom/boxmot/tracking/val_utils/data/MOT17/train/MOT17-05", 420),
    ("/Users/mikel.brostrom/boxmot/tracking/val_utils/data/MOT17/train/MOT17-09", 264),
    ("/Users/mikel.brostrom/boxmot/tracking/val_utils/data/MOT17/train/MOT17-10", 329),
    ("/Users/mikel.brostrom/boxmot/tracking/val_utils/data/MOT17/train/MOT17-11", 452),
    ("/Users/mikel.brostrom/boxmot/tracking/val_utils/data/MOT17/train/MOT17-13", 377),
]

def get_seq_paths():
    # Define the exact image directories for each sequence
    seq_paths = {
        "MOT17-13": "/Users/mikel.brostrom/boxmot/tracking/val_utils/data/MOT17/train/MOT17-13/img1",
        "MOT17-11": "/Users/mikel.brostrom/boxmot/tracking/val_utils/data/MOT17/train/MOT17-11/img1",
        "MOT17-10": "/Users/mikel.brostrom/boxmot/tracking/val_utils/data/MOT17/train/MOT17-10/img1",
        "MOT17-09": "/Users/mikel.brostrom/boxmot/tracking/val_utils/data/MOT17/train/MOT17-09/img1",
        "MOT17-05": "/Users/mikel.brostrom/boxmot/tracking/val_utils/data/MOT17/train/MOT17-05/img1",
        "MOT17-04": "/Users/mikel.brostrom/boxmot/tracking/val_utils/data/MOT17/train/MOT17-04/img1",
        "MOT17-02": "/Users/mikel.brostrom/boxmot/tracking/val_utils/data/MOT17/train/MOT17-02/img1",
    }
    imgs = {}
    for seq, path in seq_paths.items():
        img_files = [os.path.join(path, file) for file in os.listdir(path) if file.endswith(".jpg")]
        img_files.sort(key=lambda x: int(x.split("/")[-1].split(".")[0]))  # Sort numerically by file name
        imgs[seq] = img_files
    return imgs, list(seq_paths.keys())

def get_threshold_for_sequence(seq):
    # Return the frame-number offset for the given sequence
    for base_dir, threshold in base_dirs_with_thresholds:
        if seq in base_dir:
            return threshold
    return 0  # Default offset if not found (should not happen)

def parse_options():
    parser = argparse.ArgumentParser()
    parser.add_argument('--reid', type=str, default="osnet_x1_0_dukemtmcreid.pt", help='model.pt path')
    parser.add_argument('--dataset', type=str, default='train', help='dataset type')
    parser.add_argument('--device', type=str, default='cpu', help="device 'cpu' or '0', '1', ... for gpu")
    parser.add_argument('--ot', type=float, default=0.2)
    return parser.parse_args()

def process_detection(row, frame_no):
    # Convert a (top-left x, top-left y, width, height) box into
    # (frame, x1, y1, x2, y2, conf, cls), with the frame number first
    tlwh = row[2:6]
    return np.array([frame_no, tlwh[0], tlwh[1], tlwh[0] + tlwh[2], tlwh[1] + tlwh[3], row[6], 0])

def save_to_txt(file_path, data):
    # Append one space-separated line per detection/embedding
    with open(file_path, 'a') as f:
        for item in data:
            f.write(' '.join(map(str, item)) + '\n')

if __name__ == "__main__":
    args = parse_options()
    # Get image paths for each sequence
    imgs, seq_names = get_seq_paths()
    total_dets = 0
    total_frames = 0
    # Define paths for output
    output_dir_dets = Path("runs/dets_n_embs/yolov8x/dets")
    output_dir_embs = Path("runs/dets_n_embs/yolov8x/embs/osnet_x1_0_dukemtmcreid")
    output_dir_dets.mkdir(parents=True, exist_ok=True)
    output_dir_embs.mkdir(parents=True, exist_ok=True)
    for seq in seq_names:
        print(f"Processing sequence: {seq}")
        seq_imgs = imgs[seq]
        # Get the frame-number offset for the current sequence
        offset = get_threshold_for_sequence(seq)
        # Define text file paths for detections and embeddings for this sequence
        dets_txt_path = output_dir_dets / f"{seq}.txt"
        embs_txt_path = output_dir_embs / f"{seq}.txt"
        # Detections for each sequence are stored in .npy files
        det_file = Path(f"./npy_files/{seq}-FRCNN.npy")
        seq_det = np.load(det_file, allow_pickle=True)
        print(f"Loaded .npy file for {seq}, shape: {seq_det.shape}")
        for frame_no, img_path in enumerate(seq_imgs):
            frame = cv2.imread(img_path)  # Loaded for completeness; not used below
            frame_dets = seq_det[seq_det[:, 0] == frame_no + offset]
            print(f"Detections for frame {frame_no + 1}: {len(frame_dets)}")
            if len(frame_dets) < 1:
                continue
            total_dets += len(frame_dets)
            total_frames += 1
            processed_dets = np.array([process_detection(row, frame_no + 1) for row in frame_dets])
            features = np.array([row[10:] for row in frame_dets])  # Embeddings occupy the trailing columns
            # Save detections and embeddings to their respective text files
            save_to_txt(dets_txt_path, processed_dets)
            save_to_txt(embs_txt_path, features)
        print(f"Processed {len(seq_imgs)} frames for sequence {seq}")
    print(f"Total frames processed: {total_frames}")
    print(f"Total detections: {total_dets}")
```
Reach out to me if you need more details 😄
But I can see the confusion: the YoloX-X detections are moved into a yolov8x folder and the BoT embeddings into an osnet_x1_0_dukemtmcreid folder, so that I can then run: `python3 tracking/val.py --imgsz 320 --classes 0 --yolo-model yolov8x.pt --reid-model osnet_x1_0_dukemtmcreid.pt --tracking-method ${{ matrix.tracker }} --verbose --source ./tracking/val_utils/data/MOT17-50/train`. As this embedding model is not supported, it would give errors if specified directly...
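For a concrete local run, `${{ matrix.tracker }}` is a CI matrix variable and would be replaced with an actual tracker name, e.g. ocsort (used purely as an example here):

```
python3 tracking/val.py --imgsz 320 --classes 0 --yolo-model yolov8x.pt \
  --reid-model osnet_x1_0_dukemtmcreid.pt --tracking-method ocsort \
  --verbose --source ./tracking/val_utils/data/MOT17-50/train
```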
Hi @mikel-brostrom, Thank you for the detailed explanation regarding runs.zip. You mentioned that BoT is not implemented in the repo, and that the data in runs.zip comes directly from the official StrongSORT repo. Could you please confirm whether all the other components of the system ("Table I summarizes the path from DeepSORT to StrongSORT") are fully implemented as described in the paper? In other words, aside from the BoT part, can I expect the rest of the pipeline to work as intended? Do you have a plan to implement BoT in the near future? Thanks again for your time and for maintaining this project!
Also, as far as I understand from the paper, ByteTrack does not use REID feature extraction at all. Yet when I generate the embeddings myself, ByteTrack achieves a HOTA of 33.549, which is lower than the result reported in the paper. For reference:
Could this be because the default YOLO weights are not pre-trained on MOT17?
Yes, they are fully implemented
The difference in ablation HOTA compared to the original papers is minimal. This implementation:
Definitely. The YoloX-X model used to generate detections was trained "on the CrowdHuman dataset [41] and MOT17 half training set for ablation".
Search before asking
Question
Hi,
I'm encountering an unexpected discrepancy in the benchmark results when using different YOLO weight files with the provided embeddings from runs.zip.
Issue Details:
Using yolov8x.pt:
When I run the evaluation with ocsort, osnet_x1_0_dukemtmcreid.pt for REID, and yolov8x.pt for detection, I obtain the following good results:
| Tracker | REID Model | YOLO Model | Status | HOTA | MOTA | IDF1 | FPS | Elapsed_time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ocsort | osnet_x1_0_dukemtmcreid.pt | yolov8x.pt | ✅ | 65.187 | 74.819 | 75.957 | 49.78 | 53.695395487 |
Using Other YOLO Variants:
For all other YOLO weight versions, the results are roughly 2x worse. For example:
| Tracker | REID Model | YOLO Model | Status | HOTA | MOTA | IDF1 | FPS | Elapsed_time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ocsort | osnet_x1_0_dukemtmcreid.pt | yolov8m.pt | ✅ | 29.209 | 18.321 | 29.022 | 40.26 | 66.385059078 |
| ocsort | osnet_x1_0_dukemtmcreid.pt | yolov8l.pt | ✅ | 29.676 | 18.799 | 29.655 | 38.32 | 69.743151023 |
| ocsort | osnet_x1_0_dukemtmcreid.pt | yolov8s.pt | ✅ | 25.841 | 15.17 | 24.889 | 44.90 | 59.528584394 |
| ocsort | osnet_x1_0_dukemtmcreid.pt | yolov8n.pt | ✅ | 20.709 | 11.475 | 18.945 | 48.96 | 54.584922228 |
Based on these observations, it seems that the embeddings in runs.zip are optimal for detections from yolov8x.pt but do not generalize well to the other YOLO variants.
Questions:
Pipeline Details:
Could you please explain how you generated runs.zip? Specifically, what detection and embedding extraction pipeline did you use?
Recommendations:
Do you have any recommendations or guidelines on how to adapt the pipeline if one wishes to use different YOLO variants for detection?
I appreciate your help in clarifying these points. Thanks in advance for your time and assistance!
Best regards,
Kirill