Xiaohan Zhang*
Tavis Shore*
Chen Chen
Oscar Mendez
Simon Hadfield
Safwan Wshah
Vermont Artificial Intelligence Laboratory (VaiL)
Centre for Vision, Speech, and Signal Processing (CVSSP)
University of Central Florida
Locus Robotics
| Backbone | Params (M) | FLOPs (G) | Dims | R@1 | R@5 | R@10 |
|---|---|---|---|---|---|---|
| ConvNeXt-T | 28 | 4.5 | 768 | 1.36 | 4.34 | 7.95 |
| ConvNeXt-B | 89 | 15.4 | 1024 | 3.14 | 8.14 | 13.22 |
| ViT-B | 86 | 17.6 | 768 | 3.30 | 8.92 | 13.96 |
| ViT-L | 307 | 60.6 | 1024 | 9.62 | 23.42 | 32.73 |
| DINOv2-B | 86 | 152 | 768 | 17.37 | 36.14 | 46.96 |
| DINOv2-L | 304 | 507 | 1024 | 27.49 | 51.96 | 63.13 |
| VLM | R@1 | R@5 | R@10 |
|---|---|---|---|
| Without Re-ranking | 27.49 | 51.96 | 63.13 |
| Gemini 2.5 Flash Lite | 23.54 | 48.39 | 63.13 |
| Gemini 2.5 Flash | 30.21 | 53.04 | 63.13 |
| | R@1 | R@5 | R@10 |
|---|---|---|---|
| 0 | 24.47 | 48.16 | 60.99 |
| 0.1 | 26.98 | 51.34 | 61.92 |
| 0.3 | 27.49 | 51.96 | 63.13 |
| 0.5 | 24.89 | 52.03 | 62.66 |
| Model | R@1 | R@5 | R@10 |
|---|---|---|---|
| U1652 [zheng2020university] | 1.20 | - | - |
| LPN w/o drone [wang2021each] | 0.74 | - | - |
| LPN w/ drone [wang2021each] | 0.81 | - | - |
| DINOv2-L | 24.66 | 48.00 | 59.02 |
| + Drone Data | 27.49 | 51.96 | 63.13 |
| + VLM Re-rank (Ours) | 30.21 | 53.04 | 63.13 |
Create and activate the conda environment:

```shell
conda env create -n ENV -f requirements.yaml && conda activate ENV
```

Before running Stage 1, configure your dataset paths:
- Navigate to the `/config/` directory.
- Open the `default.yaml` file (or copy it to a new file).
- Replace the placeholder values (e.g., `DATA_ROOT`) with the actual paths to your dataset and related files.
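As a rough sketch, the edited config might look like the following. Only `DATA_ROOT` is mentioned above; the other keys and paths here are illustrative assumptions, so check `default.yaml` for the actual schema:

```yaml
# Illustrative sketch only -- keys other than DATA_ROOT are assumptions,
# and the paths are placeholders for your own setup.
DATA_ROOT: /data/geolocalization/benchmark   # root folder of the dataset
OUTPUT_DIR: /data/geolocalization/outputs    # where checkpoints/answers are written
```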
Once your configuration file is ready, you can train Stage 1 using:
```shell
python stage_1.py --config YOUR_CONFIG_FILE_NAME
```

You can also download our pre-trained weights here.
To run Stage 2, you need to:
- Open the `stage_2.py` file.
- Replace the relevant placeholders (e.g., the path to the answer file from Stage 1 and your Gemini API key).
- Ensure any other required directories or options are correctly set.
Then, simply run:
```shell
python stage_2.py
```

This performs re-ranking with a Vision-Language Model (VLM) on top of the initial retrieval results. It writes an `LLM_re_ranked_answer.txt` file to the answer directory, along with a `reasons.json` containing the reasoning behind each re-ranking decision.
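The two-stage pipeline above, embedding retrieval followed by VLM re-ranking, can be viewed as a score-fusion step over the top-K candidates. The sketch below is a self-contained illustration under assumed inputs, not the actual `stage_2.py` logic: the candidate ids, the VLM ordering, and the fusion weight are all hypothetical.

```python
# Hypothetical sketch of VLM re-ranking as score fusion.
# Not the repo's actual implementation -- inputs and weight are illustrative.

def rerank(retrieval_scores, vlm_order, weight=0.3):
    """Fuse retrieval similarities with a VLM's preference ordering.

    retrieval_scores: {candidate_id: similarity}, higher is better.
    vlm_order: candidate ids as ranked by the VLM, best first.
    weight: how strongly the VLM ordering influences the final ranking.
    """
    k = len(vlm_order)
    # Convert the VLM's ordering into a score in (0, 1], best candidate = 1.
    vlm_scores = {cid: (k - rank) / k for rank, cid in enumerate(vlm_order)}
    fused = {
        cid: (1 - weight) * score + weight * vlm_scores.get(cid, 0.0)
        for cid, score in retrieval_scores.items()
    }
    return sorted(fused, key=fused.get, reverse=True)

# Example: the VLM prefers candidate "b" even though retrieval ranked "a" first.
scores = {"a": 0.92, "b": 0.90, "c": 0.55}
print(rerank(scores, vlm_order=["b", "a", "c"], weight=0.3))  # prints ['b', 'a', 'c']
```

With `weight=0` the fused ranking reduces to pure retrieval order, which mirrors the "Without Re-ranking" row in the results table above.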
