MOSAIQ is a standardised multimodal benchmark for soundscape AI research, integrating audio, visual, and perceptual rating data from multiple source datasets (ISD, ARAUS, …) under a unified schema. This repository hosts the MOSAIQ data schema, validation tooling, build scripts, and (in later phases) baseline models and evaluation pipelines.
```
MOSAIQ/
├── README.md
├── pyproject.toml
├── uv.lock
├── datacatalog.yaml
│
├── catalogue/
│   ├── datapackage.yaml
│   ├── datasets.csv
│   └── datasets_catalogue.json
│
├── datasets/
│   ├── ISD/
│   │   ├── datapackage.yaml
│   │   ├── schemas/
│   │   │   ├── clips.schema.yaml
│   │   │   ├── features.schema.yaml
│   │   │   └── responses.schema.yaml
│   │   └── data/
│   │       ├── clips.csv
│   │       ├── features.csv
│   │       └── responses.csv
│   ├── ARAUS/
│   │   ├── datapackage.yaml
│   │   ├── SCHEMA_NOTES.md
│   │   ├── schemas/
│   │   │   ├── clips.schema.yaml
│   │   │   └── responses.schema.yaml
│   │   └── data/
│   │       ├── clips.csv
│   │       └── responses.csv
│   ├── SATP/
│   │   ├── datapackage.yaml
│   │   ├── SCHEMA_NOTES.md
│   │   ├── schemas/
│   │   └── data/
│   └── DeLTA/
│       ├── datapackage.yaml
│       ├── SCHEMA_NOTES.md
│       ├── schemas/
│       └── data/
│
├── shared_schemas/
│   ├── datasets.schema.yaml
│   └── features.schema.yaml
│
├── config/
│   └── cityseg_class_map.yaml
│
├── scripts/
│   ├── build_isd.py
│   ├── build_araus.py
│   ├── build_satp.py
│   ├── build_delta.py
│   ├── build_clip_features.py
│   ├── build_cityseg_features.py
│   ├── validate_in_python.py
│   └── validate_mosaiq.py
│
└── notebooks/
    └── 01_explore_isd.ipynb
```
This project uses uv for environment and dependency management. Install uv if it is not already available:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

From the repository root:

```bash
uv sync
```

This creates a `.venv/` in the project directory and installs all dependencies pinned in `uv.lock`.
Validate the catalogue and each dataset package:

```bash
uv run frictionless validate catalogue/datapackage.yaml
uv run frictionless validate datasets/ISD/datapackage.yaml --trusted
uv run frictionless validate datasets/ARAUS/datapackage.yaml --trusted
uv run frictionless validate datasets/SATP/datapackage.yaml
uv run frictionless validate datasets/DeLTA/datapackage.yaml
```

Expected output: every catalogue resource (`datasets`) and every dataset-package resource (`clips`, `responses`, and `features` where present) reports VALID.

`--trusted` is passed for the ISD and ARAUS packages because they reference the shared feature schema via a parent-relative path, which Frictionless otherwise rejects as unsafe.
MOSAIQ supports optional derived FeatureRecords linked by clip_id in
datasets/<dataset>/data/features.csv.
Supported examples include:
- CLIP visual embeddings
- CitySeg semantic summaries
In this release, MOSAIQ defines the shared FeatureRecord schema for visual derived features and placeholder resources. Psychoacoustic indicators remain in clip-level/acoustic metadata when available, rather than in the FeatureRecord layer. Caption features are a future extension and are not included in the current schema.
CLIP visual embeddings are built with `scripts/build_clip_features.py`.

Input (`clips.csv` must contain these columns):

- `clip_id`: unique clip identifier used to link feature records.
- `dataset_id`: dataset namespace used in `feature_id`.
- `video_asset` and `video_asset_id`: source-video linkage; the script resolves these to actual video files.
- `start_s`, `end_s`: temporal segment boundaries used for frame sampling.
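Before a long extraction run it can help to confirm that `clips.csv` carries the columns listed above. The sketch below is a hypothetical pre-flight check (`missing_clip_columns` is not part of the MOSAIQ scripts); it reads only the header row.

```python
import csv
from typing import Iterable, List

# Columns that build_clip_features.py is documented to read from clips.csv.
REQUIRED_CLIP_COLUMNS = (
    "clip_id", "dataset_id", "video_asset", "video_asset_id", "start_s", "end_s",
)


def missing_clip_columns(lines: Iterable[str]) -> List[str]:
    """Return required columns absent from the header of a clips.csv file.

    `lines` may be an open text file or any iterable of CSV lines.
    Hypothetical helper, not part of the MOSAIQ repository.
    """
    header = next(csv.reader(lines), [])  # first row only; empty input -> no header
    present = {name.strip() for name in header}
    return [col for col in REQUIRED_CLIP_COLUMNS if col not in present]
```

Called on an open `clips.csv` handle (opened with `newline=""` and, if the file may carry a BOM, `encoding="utf-8-sig"`), it returns an empty list when every required column is present.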
Output:

- `datasets/<dataset>/data/features.csv` with columns: `feature_id`, `clip_id`, `feature_type`, `source_modality`, `value_format`, `provenance_json`, `feature_path`, `feature_value_json`, `embedding_dim`, `dtype`, `model_name`, `model_version`, `input_asset_id`, `frame_time_s`, `frame_index`, `pooling`, `language`, `notes`.
- With `--storage npy` (default): one `.npy` file per clip under `datasets/<dataset>/data/features/clip_embedding/`.
- With `--storage base64`: the embedding payload is stored in `feature_value_json`.
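With `--storage base64` the vector has to travel inside a JSON string. A minimal round-trip sketch, assuming float32 values and hypothetical JSON keys (`dtype`, `dim`, `data_b64`); the keys the script actually writes may differ, and the real payload presumably also carries the pooling and frame metadata described below.

```python
import base64
import json
from array import array
from typing import List


def encode_embedding_b64(values: List[float]) -> str:
    """Pack an embedding as float32 bytes inside a JSON object (hypothetical layout)."""
    buf = array("f", values)  # 'f' is a C float: 4-byte float32 on mainstream platforms
    payload = {
        "dtype": "float32",
        "dim": len(buf),
        "data_b64": base64.b64encode(buf.tobytes()).decode("ascii"),
    }
    return json.dumps(payload)


def decode_embedding_b64(feature_value_json: str) -> List[float]:
    """Inverse of encode_embedding_b64; checks the declared dimension."""
    payload = json.loads(feature_value_json)
    buf = array("f")
    buf.frombytes(base64.b64decode(payload["data_b64"]))
    if len(buf) != payload["dim"]:
        raise ValueError("decoded length does not match declared dim")
    return buf.tolist()
```

`array("f")` uses the machine's native byte order, so a portable format would also have to record or fix endianness. The default `.npy` storage avoids this because the `.npy` header records dtype and byte order.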
Sampling and feature definition:

- `feature_type` is always `visual_clip_embedding`.
- `source_modality` is `visual`.
- The default frame rule is the centre frame: `t = (start_s + end_s) / 2`.
- Pooling and frame metadata are stored in `feature_value_json`.
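The centre-frame rule is simple arithmetic, but turning `t` into a frame index needs the video's frame rate. A sketch assuming `frame_index = floor(t * fps)`; that mapping and the helper name are assumptions, not the script's documented behaviour.

```python
import math
from typing import Tuple


def center_frame(start_s: float, end_s: float, fps: float) -> Tuple[float, int]:
    """Return (t, frame_index) for the centre-frame sampling rule.

    t follows the documented rule t = (start_s + end_s) / 2.
    frame_index = floor(t * fps) is an assumed mapping.
    """
    if end_s < start_s:
        raise ValueError("end_s must not precede start_s")
    if fps <= 0:
        raise ValueError("fps must be positive")
    t = (start_s + end_s) / 2
    return t, math.floor(t * fps)
```

For example, a 0 to 10 s segment at 30 fps gives `t = 5.0` and frame 150. With OpenCV, `fps` would typically come from `cap.get(cv2.CAP_PROP_FPS)` and the frame could be read after `cap.set(cv2.CAP_PROP_POS_FRAMES, frame_index)`; seek accuracy varies by codec, so treat the index as approximate.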
Mandatory provenance fields written to `provenance_json`: `model`, `version`, `library_versions`, `frame_sampling_rule`, `preprocess`, `device`, `generated_at`, `script_version`.
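A sketch of how such a record could be written and later checked for completeness. Only the key names come from the list above; the value formats (for example how `preprocess` or `frame_sampling_rule` are described) and both helper names are assumptions.

```python
import json
import platform
from datetime import datetime, timezone
from typing import Dict, List

# Mandatory provenance keys, as listed in the README.
REQUIRED_PROVENANCE_KEYS = (
    "model", "version", "library_versions", "frame_sampling_rule",
    "preprocess", "device", "generated_at", "script_version",
)


def build_provenance_json(
    model: str,
    version: str,
    library_versions: Dict[str, str],
    frame_sampling_rule: str,
    preprocess: str,
    device: str,
    script_version: str,
) -> str:
    """Serialise one provenance record; value formats are illustrative only."""
    record = {
        "model": model,
        "version": version,
        # Record the interpreter version alongside caller-supplied library versions.
        "library_versions": {"python": platform.python_version(), **library_versions},
        "frame_sampling_rule": frame_sampling_rule,
        "preprocess": preprocess,
        "device": device,
        "generated_at": datetime.now(timezone.utc).isoformat(),  # UTC timestamp
        "script_version": script_version,
    }
    return json.dumps(record, sort_keys=True)


def missing_provenance_keys(provenance_json: str) -> List[str]:
    """Return mandatory keys absent from a provenance_json cell."""
    record = json.loads(provenance_json)
    return [key for key in REQUIRED_PROVENANCE_KEYS if key not in record]
```

Installed library versions could be collected with `importlib.metadata.version("torch")` and similar calls rather than typed by hand.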
How to use:

- Install runtime dependencies (one-time):

```bash
uv add open-clip-torch torch torchvision pillow opencv-python-headless
```

- Build features for one dataset (`.npy` storage recommended):

```bash
uv run python scripts/build_clip_features.py \
  --dataset-dir datasets/ISD \
  --video-root /path/to/videos \
  --model-name ViT-B/32 \
  --pretrained openai \
  --storage npy \
  --mode append
```

- Validate package integrity after extraction (with `--trusted`, as for the initial package validation):

```bash
uv run frictionless validate datasets/ISD/datapackage.yaml --trusted
```

Useful options:

- `--dataset-dir`: target dataset root (`datasets/ISD` or `datasets/ARAUS`).
- `--video-root`: root directory used to resolve `video_asset` / `video_asset_id`.
- `--storage`: `npy` or `base64`.
- `--dtype`: `float16`, `float32`, or `float64`.
- `--device`: `auto`, `cpu`, or `cuda`.
- `--limit`: process only the first N clips (for smoke tests).
- `--skip-missing-video`: skip unresolved clips instead of failing.
- `--mode`: `append` or `overwrite`.
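`--video-root` and `--skip-missing-video` imply a lookup from `video_asset` / `video_asset_id` to a file on disk. The README does not specify that rule, so the sketch below assumes one (try `video_asset` as a path relative to the root, then fall back to a file whose stem equals `video_asset_id`); both function names are hypothetical.

```python
from pathlib import Path
from typing import Dict, Iterable, Mapping, Optional


def resolve_video(video_root: Path, video_asset: str, video_asset_id: str) -> Optional[Path]:
    """Assumed lookup rule: relative path first, then a recursive match on file stem."""
    if video_asset:
        candidate = video_root / video_asset
        if candidate.is_file():
            return candidate
    if video_asset_id and video_root.is_dir():
        for path in sorted(video_root.rglob("*")):
            if path.is_file() and path.stem == video_asset_id:
                return path
    return None


def resolve_clips(
    video_root: Path, clips: Iterable[Mapping[str, str]], skip_missing: bool
) -> Dict[str, Path]:
    """Map clip_id -> video path, either skipping or failing on unresolved clips."""
    resolved: Dict[str, Path] = {}
    for clip in clips:
        path = resolve_video(
            video_root, clip.get("video_asset") or "", clip.get("video_asset_id") or ""
        )
        if path is None:
            if skip_missing:  # analogous to --skip-missing-video
                continue
            raise FileNotFoundError(f"no video found for clip {clip['clip_id']}")
        resolved[clip["clip_id"]] = path
    return resolved
```

Rescanning the tree for every clip is acceptable for a `--limit` smoke test; a full run would more sensibly index the video directory once.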
CitySeg summaries are optional visual semantic FeatureRecords linked by
clip_id.
- CLIP embeddings provide scalable visual baseline features.
- CitySeg summaries provide interpretable semantic descriptors such as road, vegetation, sky, building, vehicle, and person proportions.
- These features support PAQ item prediction, ISO-coordinate prediction, and future gaze-on-class analysis.
Feature conventions for CitySeg:

- `feature_type=visual_semantic_summary`
- `source_modality=visual`
- `value_format=json` (or `path` for large summaries stored externally)
- Full segmentation masks/HDF5 files are not stored in `features.csv`; only clip-level summaries are stored directly, and raw assets are referenced by path and provenance.
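A clip-level summary is essentially a small JSON object of class proportions. A sketch, assuming per-class pixel counts have already been aggregated over the sampled frames and mapped to MOSAIQ class names (presumably via `config/cityseg_class_map.yaml`); the `class_proportions` key and the helper name are hypothetical.

```python
import json
from typing import Dict


def semantic_summary_record(clip_id: str, pixel_counts: Dict[str, int]) -> Dict[str, str]:
    """Build a subset of a features.csv row for a CitySeg semantic summary."""
    total = sum(pixel_counts.values())
    if total <= 0:
        raise ValueError("pixel_counts must contain at least one labelled pixel")
    # Share of the clip's labelled pixels per class, rounded for compact JSON.
    proportions = {cls: round(n / total, 4) for cls, n in sorted(pixel_counts.items())}
    return {
        "clip_id": clip_id,
        "feature_type": "visual_semantic_summary",
        "source_modality": "visual",
        "value_format": "json",
        "feature_value_json": json.dumps({"class_proportions": proportions}),
    }
```

Because of rounding, the stored proportions may not sum to exactly 1.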
Example command:

```bash
uv run python scripts/build_cityseg_features.py \
  --clips datasets/ISD/data/clips.csv \
  --cityseg-dir /path/to/cityseg_outputs \
  --output datasets/ISD/data/features.csv \
  --dataset-id ISD \
  --mode append
```

Validation for features:

```bash
uv run python scripts/validate_mosaiq.py --dataset-dir datasets/ISD
```

If you have access to the original source files (for ISD, `ISD_v1_0_Data.csv`), you can regenerate the derived CSVs from scratch:

```bash
uv run python scripts/build_datasets_csv.py  # rebuilds catalogue/datasets.csv
uv run python scripts/build_isd.py           # rebuilds datasets/ISD/data/clips.csv + responses.csv
uv run python scripts/build_satp.py          # rebuilds datasets/SATP/data/clips.csv + responses.csv
uv run python scripts/build_delta.py         # rebuilds datasets/DeLTA/data/clips.csv + responses.csv
```

MOSAIQ separates the data into three layers, each with its own schema:
- Dataset-level catalogue (`catalogue/datasets.csv`): one row per source dataset, capturing scale, modalities, recording specifications, perceptual framework, and access conditions.
- Clip-level metadata (`datasets/<dataset>/data/clips.csv`): one row per clip with aggregated PAQ ratings, derived ISO Pleasant / Eventful coordinates, and available acoustic or psychoacoustic measurements.
- Response-level metadata (`datasets/<dataset>/data/responses.csv`): one row per individual participant assessment, linked to clips via `clip_id`.
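ISO Pleasant / Eventful coordinates are conventionally derived by projecting the eight PAQ ratings onto the two circumplex axes, as in ISO/TS 12913-3. A sketch assuming ratings on a 1 to 5 scale (hence the `4 + sqrt(32)` normaliser, which maps results to [-1, 1]); whether MOSAIQ's build scripts apply this per response or to aggregated ratings is not stated here.

```python
import math
from typing import Tuple

COS45 = math.cos(math.radians(45))
# Largest attainable axis value for 1-5 ratings: 4 + 4*cos45 + 4*cos45 = 4 + sqrt(32).
NORMALISER = 4 + math.sqrt(32)


def iso_coordinates(
    pa: float, ch: float, vi: float, un: float,
    ca: float, an: float, ev: float, mo: float,
) -> Tuple[float, float]:
    """Return (ISOPleasant, ISOEventful) from the eight PAQ ratings.

    pa=pleasant, ch=chaotic, vi=vibrant, un=uneventful,
    ca=calm, an=annoying, ev=eventful, mo=monotonous.
    """
    iso_pleasant = ((pa - an) + COS45 * (ca - ch) + COS45 * (vi - mo)) / NORMALISER
    iso_eventful = ((ev - un) + COS45 * (ch - ca) + COS45 * (vi - mo)) / NORMALISER
    return iso_pleasant, iso_eventful
```

All-neutral ratings (every item 3) give `(0.0, 0.0)`; maximally pleasant, calm, and vibrant ratings give an ISO Pleasant value of 1.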
Schemas are formally specified using the Frictionless Data Package standard, which supports type checks, value-range constraints, enumerations, and foreign-key relationships across resources.
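Table Schema expresses these cross-resource rules as `primaryKey` and `foreignKeys`. As a quick stdlib cross-check, independent of Frictionless, something like the following could flag duplicate clip identifiers and responses that point at unknown clips; both helpers are hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable, List, Set


def duplicate_clip_ids(clip_lines: Iterable[str]) -> List[str]:
    """clip_id values appearing more than once in clips.csv (primary-key violations)."""
    counts = Counter(row["clip_id"] for row in csv.DictReader(clip_lines))
    return sorted(cid for cid, n in counts.items() if n > 1)


def dangling_clip_ids(clip_lines: Iterable[str], response_lines: Iterable[str]) -> Set[str]:
    """clip_id values in responses.csv with no row in clips.csv (foreign-key violations)."""
    known = {row["clip_id"] for row in csv.DictReader(clip_lines)}
    return {row["clip_id"] for row in csv.DictReader(response_lines)} - known
```

Both accept open file handles (opened with `newline=""`) or lists of CSV lines.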
An additional schema-level harmonisation layer is documented in
docs/schema_level_harmonisation.md.
This layer prepares ISD and ARAUS for later benchmark construction using a
shared structure and conservative ISO 12913 semantics; it does not claim that
the datasets are statistically or fully harmonised.
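To add a dependency:

```bash
uv add <package-name>
uv add --dev <package-name>  # development-only (e.g. jupyter, pytest)
```

To run a script in the project environment:

```bash
uv run python <script.py>
```

To work in Jupyter:

```bash
uv add --dev jupyter ipykernel
uv run jupyter lab
```

If you use MOSAIQ in your research, please cite: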
```bibtex
@misc{mosaiq2026,
  author       = {Liang, Yuqi and Mitchell, Andrew and Kang, Jian and Aletta, Francesco},
  title        = {MOSAIQ: Multimodal Open Soundscape AI Quality-benchmark},
  year         = {2026},
  publisher    = {GitHub},
  howpublished = {\url{https://github.com/yuqiliang/MOSAIQ}}
}
```

Authors:

- Yuqi Liang, UCL Institute for Environmental Design and Engineering
- Francesco Aletta, UCL Institute for Environmental Design and Engineering
- Jian Kang, UCL Institute for Environmental Design and Engineering
- Andrew Mitchell, UCL Bartlett School of Sustainable Construction

Licence:

- Schemas, code, and documentation: MIT
- Data: per-source-dataset licences (see the `licence_spdx` field in each row of `catalogue/datasets.csv`)
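Because data licences differ per source dataset, it may be worth checking them programmatically before redistributing anything. A sketch reading `catalogue/datasets.csv`; the `dataset_id` column name is an assumption (only `licence_spdx` is named above), and the helper is hypothetical.

```python
import csv
from typing import Dict, Iterable


def licences_by_dataset(catalogue_lines: Iterable[str], id_column: str = "dataset_id") -> Dict[str, str]:
    """Map each catalogue row's dataset identifier to its SPDX licence string."""
    licences: Dict[str, str] = {}
    for row in csv.DictReader(catalogue_lines):
        # Empty or missing cells are reported explicitly instead of being dropped.
        licences[row[id_column]] = (row.get("licence_spdx") or "").strip() or "unspecified"
    return licences
```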