This repo expects you to place benchmark files and raw memory artifacts under data/.
- Benchmark QA:
  - data/atm-bench/atm-bench.json
  - data/atm-bench/atm-bench-hard.json
  - data/atm-bench/niah/ (NIAH pool files)
- Raw personal memory (user-provided):
  - data/raw_memory/image/ (raw images)
  - data/raw_memory/video/ (raw videos)
  - data/raw_memory/email/emails.json (optional; see schema below)
- Generated artifacts:
  - output/image/qwen3vl2b/batch_results.json (generated)
  - output/video/qwen3vl2b/batch_results.json (generated)
  - data/processed_memory/ (optional; for any future normalized memory store)
  - output/ (all run outputs; always safe to delete/re-generate)
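The layout above can be created in one step. A minimal sketch, assuming the directory list reflects the layout described here (the `scaffold` helper and `EXPECTED_DIRS` name are illustrative, not part of the repo):

```python
from pathlib import Path

# Directories the scripts expect; data/ and output/ are gitignored.
EXPECTED_DIRS = [
    "data/atm-bench/niah",
    "data/raw_memory/image",
    "data/raw_memory/video",
    "data/raw_memory/email",
    "data/processed_memory",
    "output/image/qwen3vl2b",
    "output/video/qwen3vl2b",
]

def scaffold(root: str = ".") -> None:
    """Create the expected directory layout if it does not exist yet."""
    for rel in EXPECTED_DIRS:
        Path(root, rel).mkdir(parents=True, exist_ok=True)

if __name__ == "__main__":
    scaffold()
```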
Note: data/ and output/ are gitignored.
Current status:
- Paper/code release is available.
- ATM-Bench dataset release is available on Hugging Face:
https://huggingface.co/datasets/Jingbiao/ATM-Bench
Release channel:
- Hugging Face (dataset artifacts and versioned files).
Release metadata checklist:
- HF dataset link: https://huggingface.co/datasets/Jingbiao/ATM-Bench
- Versioning scheme (tag/date + git commit)
- sha256 checksums for released files
- Minimal download instructions (curl / huggingface_hub)
- License + citation block
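For the sha256-checksums item, a manifest in the format `sha256sum -c` accepts can be generated with the standard library. A sketch, not a repo script (`write_checksums` is a hypothetical helper name):

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large artifacts never load fully into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def write_checksums(release_dir: str, out_name: str = "sha256checksums.txt") -> Path:
    """Write '<digest>  <filename>' lines for every file in release_dir."""
    root = Path(release_dir)
    lines = [
        f"{sha256_of(p)}  {p.name}"
        for p in sorted(root.iterdir())
        if p.is_file() and p.name != out_name
    ]
    out = root / out_name
    out.write_text("\n".join(lines) + "\n")
    return out
```

Consumers can then verify downloads with `sha256sum -c sha256checksums.txt` in the release directory.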
The QA files are JSON arrays (or a dict with a qas list) of entries containing:
- id (string)
- question (string)
- answer (string)
- evidence_ids (list of strings; ground-truth evidence IDs)
For NIAH pool files, each entry additionally contains:
- niah_evidence_ids (list of strings; fixed evidence pool, superset of evidence_ids)
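A loader that accepts both shapes (plain array or dict with a qas list) can be sketched as follows; `load_qa` is an illustrative name, not a repo function:

```python
import json
from pathlib import Path

def load_qa(path: str) -> list[dict]:
    """Load a QA file that is either a JSON array or a dict with a 'qas' list."""
    data = json.loads(Path(path).read_text())
    entries = data["qas"] if isinstance(data, dict) else data
    for qa in entries:
        # Base schema check; NIAH pool files additionally carry niah_evidence_ids.
        assert {"id", "question", "answer", "evidence_ids"} <= qa.keys()
        if "niah_evidence_ids" in qa:
            # The pool is documented as a superset of the ground-truth IDs.
            assert set(qa["evidence_ids"]) <= set(qa["niah_evidence_ids"])
    return entries
```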
If a QA item includes email evidence IDs (IDs beginning with the email prefix), the Oracle/MMRAG
scripts may load email evidence from a JSON list with entries like:
[
  {
    "id": "email202401010001",
    "timestamp": "2024-01-01 12:34:56",
    "short_summary": "One-line summary",
    "detail": "Longer email content or extracted body"
  }
]

If your released benchmark does not include emails, you can omit this file.
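Assuming this schema, a small loader might index records by id and resolve a QA item's email evidence by prefix; `load_email_evidence` and `email_evidence_for` are hypothetical helper names, not repo functions:

```python
import json
from pathlib import Path

def load_email_evidence(path: str) -> dict[str, dict]:
    """Index email evidence records by their 'id' field for O(1) lookup."""
    records = json.loads(Path(path).read_text())
    return {rec["id"]: rec for rec in records}

def email_evidence_for(qa: dict, emails: dict[str, dict]) -> list[dict]:
    """Resolve the email-typed evidence IDs of one QA item to full records."""
    return [emails[eid] for eid in qa["evidence_ids"] if eid.startswith("email")]
```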
Text-only evidence for images/videos is read from batch_results.json files.
Scripts index entries by Path(image_path).stem / Path(video_path).stem.
Each entry typically contains:
- image_path / video_path (string path; used to derive the evidence ID stem)
- timestamp (string)
- location_name (string)
- short_caption (string)
- caption (string)
- ocr_text (string)
- tags (list of strings)
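Indexing by stem, as the scripts do, can be sketched like this (`index_batch_results` is an illustrative name; the repo's own indexing code may differ):

```python
import json
from pathlib import Path

def index_batch_results(path: str) -> dict[str, dict]:
    """Index batch_results.json entries by the media filename stem.

    The stem (filename without directory or extension) is what image/video
    evidence IDs refer to.
    """
    entries = json.loads(Path(path).read_text())
    index = {}
    for entry in entries:
        media_path = entry.get("image_path") or entry.get("video_path")
        index[Path(media_path).stem] = entry
    return index
```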
location_name is derived from GPS coordinates via reverse geocoding (default: OpenStreetMap Nominatim).
Public geocoding endpoints are rate-limited (often a strict per-IP requests-per-minute cap) and can
become a bottleneck for large archives, especially if you run the processors with high concurrency.
The processors cache reverse-geocoding results as JSON files under <output_dir>/cache/:
<media_filename_stem>_location_name.json
The cache key is the media filename stem, not a hash of the file contents. For example:
- 20220430_132212_location_name.json
- 20220502_172850_location_name.json
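The naming convention above maps directly from a media path to its cache file; a minimal sketch (`location_cache_path` is an illustrative helper, not repo code):

```python
from pathlib import Path

def location_cache_path(media_path: str, cache_dir: str) -> Path:
    """Cache file for a media item: <cache_dir>/<stem>_location_name.json.

    Keyed by the filename stem, so renaming a file invalidates its cache
    entry even if the bytes are unchanged.
    """
    return Path(cache_dir) / f"{Path(media_path).stem}_location_name.json"
```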
If you have a pre-extracted GPS cache bundle, place it under:
- data/raw_memory/geocoding_cache/image
- data/raw_memory/geocoding_cache/video
Then copy those cache files into your processor cache directory before running the processors so geocoding calls are skipped. The bundle must match the current image/video filenames:
python memqa/utils/copy_gps_info.py data/raw_memory/geocoding_cache/image output/image/qwen3vl2b/cache
python memqa/utils/copy_gps_info.py data/raw_memory/geocoding_cache/video output/video/qwen3vl2b/cache

You can also use the convenience wrappers:
- scripts/memory_processor/image/copy_gps_cache.sh
- scripts/memory_processor/video/copy_gps_cache.sh
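Because the bundle must match the current media filenames, it can help to check coverage before running the processors. A sketch under the cache-naming convention above; `missing_cache_stems` and its extension list are assumptions, not repo code:

```python
from pathlib import Path

def missing_cache_stems(media_dir: str, cache_dir: str,
                        exts=(".jpg", ".jpeg", ".png", ".mp4")) -> list[str]:
    """Return media filename stems that have no matching geocoding cache file."""
    cached = {p.name.removesuffix("_location_name.json")
              for p in Path(cache_dir).glob("*_location_name.json")}
    return sorted(
        p.stem for p in Path(media_dir).iterdir()
        if p.suffix.lower() in exts and p.stem not in cached
    )
```

An empty return value means every media file will hit the cache and no geocoding calls should be needed.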
If you have an MMRAG run that produced retrieval_recall_details.json, you can
build/validate NIAH pools via:
python scripts/QA_Agent/NIAH/build_niah_pools.py \
--qa-file data/atm-bench/atm-bench-hard.json \
--retrieval-details <PATH_TO>/retrieval_recall_details.json \
--pool-sizes 25 50 100 200
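After building pools, a sanity check can confirm each pool is well-formed. This sketch assumes a pool should contain exactly the requested number of IDs and, per the schema above, include the ground-truth evidence_ids; `validate_niah_pool` is a hypothetical helper, not the repo's validator:

```python
def validate_niah_pool(qa_entries: list[dict], pool_size: int) -> None:
    """Assert each NIAH pool has the expected size and contains its ground truth."""
    for qa in qa_entries:
        pool = qa["niah_evidence_ids"]
        assert len(pool) == pool_size, f"{qa['id']}: pool has {len(pool)} items"
        assert set(qa["evidence_ids"]) <= set(pool), f"{qa['id']}: ground truth missing"
```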