
Clarification on How runs.zip is Generated and Discrepancies with Different YOLO Weights #1812

lalayants opened this issue Feb 9, 2025 · 9 comments
Labels: question (Further information is requested)

lalayants commented Feb 9, 2025

Search before asking

  • I have searched the Yolo Tracking issues and found no similar bug report.

Question

Hi,

I'm encountering an unexpected discrepancy in the benchmark results when using different YOLO weight files with the provided embeddings from runs.zip.

Issue Details:

Using yolov8x.pt:
When I run the evaluation with ocsort, osnet_x1_0_dukemtmcreid.pt for REID, and yolov8x.pt for detection, I obtain the following good results:
Tracker  REID Model                  YOLO Model  Status  HOTA    MOTA    IDF1    FPS    Elapsed_time
ocsort   osnet_x1_0_dukemtmcreid.pt  yolov8x.pt  ✅      65.187  74.819  75.957  49.78  53.695395487

Using Other YOLO Variants:
For all other YOLO weight versions, the results are roughly 2x worse. For example:

Tracker  REID Model                  YOLO Model  Status  HOTA    MOTA    IDF1    FPS    Elapsed_time
ocsort   osnet_x1_0_dukemtmcreid.pt  yolov8m.pt  ✅      29.209  18.321  29.022  40.26  66.385059078
ocsort   osnet_x1_0_dukemtmcreid.pt  yolov8l.pt  ✅      29.676  18.799  29.655  38.32  69.743151023
ocsort   osnet_x1_0_dukemtmcreid.pt  yolov8s.pt  ✅      25.841  15.17   24.889  44.90  59.528584394
ocsort   osnet_x1_0_dukemtmcreid.pt  yolov8n.pt  ✅      20.709  11.475  18.945  48.96  54.584922228

Based on these observations, it seems that the embeddings in runs.zip are optimal for detections from yolov8x.pt but do not generalize well to the other YOLO variants.

Questions:

Pipeline Details:
Could you please explain how you generated runs.zip? Specifically, what detection and embedding extraction pipeline did you use?

Recommendations:
Do you have any recommendations or guidelines on how to adapt the pipeline if one wishes to use different YOLO variants for detection?

I appreciate your help in clarifying these points. Thanks in advance for your time and assistance!

Best regards,
Kirill

lalayants added the question label on Feb 9, 2025
rolson24 (Contributor) commented Feb 9, 2025

Hi @lalayants.
How are you running these benchmarks? I believe runs.zip is only for reproducing the scores in the README: it already contains the detections and embeddings, so if you run the eval on them, the YOLO model you specify is never actually run. I think the proper way to test what you want is to run the following command with different YOLO and ReID models:

python3 tracking/val.py --yolo-model yolov8n.pt --reid-model osnet_x0_25_msmt17.pt --tracking-method deepocsort --verbose --source ./assets/MOT17-mini/train

Somewhere in the output it should say something like "Overwriting detections and embeddings for {mot_folder_path}...", and if you look in "current_dir/dets_n_embs/yolov8n/dets" there should be a .txt file with the detections, and likewise for the embeddings. Then you know the tracker is using the correct detections and embeddings.
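
If it helps, here is a minimal sanity-check sketch (not part of the repo; the yolov8n and osnet_x0_25_msmt17 folder names just mirror the command above, and the root path may differ on your machine). Each sequence's detections file should have an embeddings file with the same number of rows, since row i of the embeddings belongs to row i of the detections:

from pathlib import Path

# Adjust to wherever dets_n_embs lives on your machine
dets_dir = Path("runs/dets_n_embs/yolov8n/dets")
embs_dir = Path("runs/dets_n_embs/yolov8n/embs/osnet_x0_25_msmt17")

for dets_file in sorted(dets_dir.glob("*.txt")):
    embs_file = embs_dir / dets_file.name
    if not embs_file.exists():
        print(f"{dets_file.name}: missing embeddings file")
        continue
    # One detection per line, one embedding per line: counts must match
    n_dets = len(dets_file.read_text().splitlines())
    n_embs = len(embs_file.read_text().splitlines())
    status = "OK" if n_dets == n_embs else "MISMATCH"
    print(f"{dets_file.name}: {n_dets} dets / {n_embs} embs -> {status}")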

lalayants (Author) commented Feb 9, 2025

Hi @rolson24,

Yes, that's exactly how I launch my benchmarks. However, I've encountered a significant discrepancy between the results obtained with the pre-generated embeddings from runs.zip and the results I get when I recreate them locally.

For example, with yolov8x.pt:

  1. Using embeddings from runs.zip:
  • HOTA: 65.187
  • MOTA: 74.819
  • IDF1: 75.957
  2. Recreating the embeddings (after receiving the prompt "Detections and Embeddings … already exists. Overwrite? [y/N]:" and typing "y"):
  • HOTA: 30.822
  • MOTA: 19.839
  • IDF1: 31.201

The same weight file (yolov8x.pt) is used in both cases, yet the scores drop significantly when I overwrite the embeddings.

Could you please explain how the embeddings in runs.zip are generated so that I can replicate that process and make my own embeddings for yolov8n/s/m/l? Any guidance on ensuring that my locally generated embeddings match those in runs.zip would be greatly appreciated.

I've seen @mikel-brostrom's comment on #1663 that the embeddings are made using BoT, but I still can't figure out how to make my own :(

mikel-brostrom (Owner) commented Feb 9, 2025

The data under runs.zip comes from the official StrongSORT repo. They used the same YoloX-X as in ByteTrack, and for the embedding generation they used BoT. BoT is not implemented in this repo, and neither is the script for generating the data under runs.zip. I use it to avoid generating detections and embeddings every single time I benchmark the algorithms.
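
One quick way to check which backbone produced a given embeddings file is to look at the feature dimensionality: OSNet models typically emit 512-d vectors, while a ResNet50-based BoT model typically emits 2048-d. A minimal sketch (the paths are illustrative):

import numpy as np

# Illustrative paths; point these at the files you want to inspect
paths = [
    "runs/dets_n_embs/yolov8x/embs/osnet_x1_0_dukemtmcreid/MOT17-02.txt",  # from runs.zip
    "runs/dets_n_embs/yolov8n/embs/osnet_x1_0_dukemtmcreid/MOT17-02.txt",  # locally generated
]

for path in paths:
    embs = np.loadtxt(path, ndmin=2)  # one embedding per line
    print(f"{path}: {embs.shape[0]} rows x {embs.shape[1]} dims")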

mikel-brostrom (Owner) commented

I did, however, parse the npy data to make it compatible with the format expected by this repo, using the following script:

import os
import numpy as np
from pathlib import Path
import argparse


# Per-sequence frame-number offsets: frame indices in the StrongSORT .npy
# dumps are shifted relative to each sequence's own 0-based frame numbering
base_dirs_with_thresholds = [
    ("/Users/mikel.brostrom/boxmot/tracking/val_utils/data/MOT17/train/MOT17-02", 302),
    ("/Users/mikel.brostrom/boxmot/tracking/val_utils/data/MOT17/train/MOT17-04", 527),
    ("/Users/mikel.brostrom/boxmot/tracking/val_utils/data/MOT17/train/MOT17-05", 420),
    ("/Users/mikel.brostrom/boxmot/tracking/val_utils/data/MOT17/train/MOT17-09", 264),
    ("/Users/mikel.brostrom/boxmot/tracking/val_utils/data/MOT17/train/MOT17-10", 329),
    ("/Users/mikel.brostrom/boxmot/tracking/val_utils/data/MOT17/train/MOT17-11", 452),
    ("/Users/mikel.brostrom/boxmot/tracking/val_utils/data/MOT17/train/MOT17-13", 377)
]


def get_seq_paths():
    # Define the exact image paths for each sequence
    seq_paths = {
        "MOT17-13": "/Users/mikel.brostrom/boxmot/tracking/val_utils/data/MOT17/train/MOT17-13/img1",
        "MOT17-11": "/Users/mikel.brostrom/boxmot/tracking/val_utils/data/MOT17/train/MOT17-11/img1",
        "MOT17-10": "/Users/mikel.brostrom/boxmot/tracking/val_utils/data/MOT17/train/MOT17-10/img1",
        "MOT17-09": "/Users/mikel.brostrom/boxmot/tracking/val_utils/data/MOT17/train/MOT17-09/img1",
        "MOT17-05": "/Users/mikel.brostrom/boxmot/tracking/val_utils/data/MOT17/train/MOT17-05/img1",
        "MOT17-04": "/Users/mikel.brostrom/boxmot/tracking/val_utils/data/MOT17/train/MOT17-04/img1",
        "MOT17-02": "/Users/mikel.brostrom/boxmot/tracking/val_utils/data/MOT17/train/MOT17-02/img1"
    }
    
    imgs = {}
    for seq, path in seq_paths.items():
        img_files = [os.path.join(path, file) for file in os.listdir(path) if file.endswith(".jpg")]
        img_files.sort(key=lambda x: int(x.split("/")[-1].split(".")[0]))  # Sort numerically by file name
        imgs[seq] = img_files
    return imgs, list(seq_paths.keys())


# Get the frame-number offset for the sequence
def get_threshold_for_sequence(seq):
    for base_dir, threshold in base_dirs_with_thresholds:
        if seq in base_dir:
            return threshold
    return 0  # Default offset if not found (should not happen)


def parse_options():
    parser = argparse.ArgumentParser()
    parser.add_argument('--reid', type=str, default="osnet_x1_0_dukemtmcreid.pt", help='model.pt path')
    parser.add_argument('--dataset', type=str, default='train', help='dataset type')
    parser.add_argument('--device', type=str, default='cpu', help='device \'cpu\' or \'0\', \'1\', ... for gpu')
    parser.add_argument('--ot', type=float, default=0.2)
    return parser.parse_args()

def process_detection(row, frame_no):
    # Convert tlwh -> tlbr and prepend the frame number:
    # output row is [frame, x1, y1, x2, y2, conf, cls]
    tlwh = row[2:6]
    return np.array([frame_no, tlwh[0], tlwh[1], tlwh[0] + tlwh[2], tlwh[1] + tlwh[3], row[6], 0])

def save_to_txt(file_path, data):
    # Appends to the file, so delete any stale output before re-running
    with open(file_path, 'a') as f:
        for item in data:
            f.write(' '.join(map(str, item)) + '\n')

if __name__ == "__main__":
    args = parse_options()
    
    # Get image paths for each sequence
    imgs, seq_names = get_seq_paths()

    total_dets = 0
    total_frames = 0

    # Define output paths in the layout expected by tracking/val.py
    output_dir_dets = Path("runs/dets_n_embs/yolov8x/dets")
    output_dir_embs = Path("runs/dets_n_embs/yolov8x/embs/osnet_x1_0_dukemtmcreid")
    output_dir_dets.mkdir(parents=True, exist_ok=True)
    output_dir_embs.mkdir(parents=True, exist_ok=True)

    for seq in seq_names:
        print(f"Processing sequence: {seq}")
        seq_imgs = imgs[seq]
        
        # Get the frame-number offset for the current sequence
        offset = get_threshold_for_sequence(seq)

        # Define text file paths for detections and embeddings for this sequence
        dets_txt_path = output_dir_dets / f"{seq}.txt"
        embs_txt_path = output_dir_embs / f"{seq}.txt"

        # Assuming detections (with appended embeddings) for each sequence are stored in .npy files
        det_file = Path(f"./npy_files/{seq}-FRCNN.npy")
        seq_det = np.load(det_file, allow_pickle=True)
        print(f"Loaded .npy file for {seq}, shape: {seq_det.shape}")

        # The images are only needed to know the frame count per sequence
        for frame_no in range(len(seq_imgs)):
            frame_dets = seq_det[seq_det[:, 0] == frame_no + offset]
            
            print(f"Detections for frame {frame_no + 1}: {len(frame_dets)}")

            if len(frame_dets) < 1:
                continue

            total_dets += len(frame_dets)
            total_frames += 1

            # MOT frames are 1-indexed, hence frame_no + 1
            processed_dets = np.array([process_detection(row, frame_no + 1) for row in frame_dets])
            features = np.array([row[10:] for row in frame_dets])  # Assuming embeddings are the last columns

            # Save detections and embeddings to their respective text files
            save_to_txt(dets_txt_path, processed_dets)
            save_to_txt(embs_txt_path, features)

        print(f"Processed {len(seq_imgs)} frames for sequence {seq}")

    print(f"Total frames processed: {total_frames}")
    print(f"Total detections: {total_dets}")
mikel-brostrom (Owner) commented

Reach out to me if you need more details 😄

mikel-brostrom (Owner) commented Feb 10, 2025

But I can see the source of confusion: the YoloX-X detections are moved into a yolov8x folder and the BoT embeddings into an osnet_x1_0_dukemtmcreid folder, so that I can then run:

python3 tracking/val.py --imgsz 320 --classes 0 --yolo-model yolov8x.pt --reid-model osnet_x1_0_dukemtmcreid.pt --tracking-method ${{ matrix.tracker }} --verbose --source ./tracking/val_utils/data/MOT17-50/train

As this embedding model is not supported, it would give errors if specified...
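
Put differently, the on-disk layout (illustrative, derived from the script's output paths) is:

runs/dets_n_embs/
└── yolov8x/                           # actually YoloX-X detections
    ├── dets/
    │   ├── MOT17-02.txt
    │   └── ...
    └── embs/
        └── osnet_x1_0_dukemtmcreid/   # actually BoT embeddings
            ├── MOT17-02.txt
            └── ...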

lalayants (Author) commented Feb 10, 2025

Hi @mikel-brostrom,

Thank you for the detailed explanation regarding runs.zip.

You mentioned that BoT is not implemented in the repo, and that the data in runs.zip comes directly from the official StrongSORT repo. Could you please confirm whether all the other components of the system described in the paper (Table I summarizes the path from DeepSORT to StrongSORT) are fully implemented? In other words, aside from the BoT part, can I expect the rest of the pipeline to work as intended?

[Image: Table I from the StrongSORT paper]

Do you have plans to implement BoT in the near future?

Thanks again for your time and for maintaining this project!

lalayants (Author) commented Feb 10, 2025

Also, as far as I understand from the paper, ByteTrack does not use ReID feature extraction at all. Yet when I generate the embeddings myself, ByteTrack still achieves a HOTA of only 33.549, which is lower than the result reported in the paper. For reference:

Tracker    REID Model                  YOLO Model  Status  HOTA    MOTA    IDF1    FPS    Elapsed_time
bytetrack  osnet_x1_0_dukemtmcreid.pt  yolov8x.pt  OK      33.549  22.813  35.313  38.05  70.244091999

Could this be because the default YOLO weights are not pre-trained on MOT17?

mikel-brostrom (Owner) commented Feb 11, 2025

Could you please confirm whether all the other components of the system described in the paper (Table I summarizes the path from DeepSORT to StrongSORT) are fully implemented?

Yes, they are fully implemented

In other words, aside from the BoT part, can I expect the rest of the pipeline to work as intended?

The difference in ablation HOTA from the original paper is minimal: this implementation gets 68.3, the paper reports 69.6. But I believe I evaluated the original implementation and the results were lower there as well.

Could this be because the default YOLO weights are not pre-trained on MOT17?

Definitely. The YoloX-X model used to generate detections was trained "on the CrowdHuman dataset [41] and MOT17 half training set for ablation".
