MOSAIQ — Multimodal Open Soundscape AI Quality-benchmark

MOSAIQ is a standardised multimodal benchmark for soundscape AI research, integrating audio, visual, and perceptual rating data from multiple source datasets (ISD, ARAUS, …) under a unified schema. This repository hosts the MOSAIQ data schema, validation tooling, build scripts, and (in later phases) baseline models and evaluation pipelines.

Repository structure

MOSAIQ/
├── README.md
├── pyproject.toml
├── uv.lock
├── datacatalog.yaml             
│
├── catalogue/                   
│   ├── datapackage.yaml
│   ├── datasets.csv
│   └── datasets_catalogue.json   
│
├── datasets/                    
│   ├── ISD/
│   │   ├── datapackage.yaml
│   │   ├── schemas/
│   │   │   ├── clips.schema.yaml
│   │   │   ├── features.schema.yaml
│   │   │   └── responses.schema.yaml
│   │   └── data/
│   │       ├── clips.csv
│   │       ├── features.csv
│   │       └── responses.csv
│   ├── ARAUS/
│   │   ├── datapackage.yaml
│   │   ├── SCHEMA_NOTES.md
│   │   ├── schemas/
│   │   │   ├── clips.schema.yaml
│   │   │   └── responses.schema.yaml
│   │   └── data/
│   │       ├── clips.csv
│   │       └── responses.csv
│   ├── SATP/
│   │   ├── datapackage.yaml
│   │   ├── SCHEMA_NOTES.md
│   │   ├── schemas/
│   │   └── data/
│   └── DeLTA/
│       ├── datapackage.yaml
│       ├── SCHEMA_NOTES.md
│       ├── schemas/
│       └── data/
│
├── shared_schemas/               
│   ├── datasets.schema.yaml
│   └── features.schema.yaml
│
├── config/
│   └── cityseg_class_map.yaml
│
├── scripts/
│   ├── build_isd.py
│   ├── build_araus.py
│   ├── build_satp.py
│   ├── build_delta.py
│   ├── build_clip_features.py
│   ├── build_cityseg_features.py
│   ├── validate_in_python.py
│   └── validate_mosaiq.py
│
└── notebooks/
    └── 01_explore_isd.ipynb

Quick start

This project uses uv for environment and dependency management.

1. Install uv

curl -LsSf https://astral.sh/uv/install.sh | sh

2. Sync the environment

From the repository root:

uv sync

This creates a .venv/ in the project directory and installs all dependencies pinned in uv.lock.

3. Validate the data package

uv run frictionless validate catalogue/datapackage.yaml
uv run frictionless validate datasets/ISD/datapackage.yaml --trusted
uv run frictionless validate datasets/ARAUS/datapackage.yaml --trusted
uv run frictionless validate datasets/SATP/datapackage.yaml
uv run frictionless validate datasets/DeLTA/datapackage.yaml

Expected output: catalogue resources (datasets) and dataset package resources (clips, responses, optional features) all reporting VALID. (--trusted is used for dataset packages because the shared feature schema is referenced via a parent-relative path.)
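
To run all five checks in one pass, a small wrapper along the following lines may be convenient. This is only a sketch: the helper names build_validate_command and validate_all are hypothetical and not part of the repository, the --trusted choices simply mirror the commands above, and it assumes the frictionless CLI exits with a non-zero status when a package is invalid.

```python
import subprocess

# (package path, needs --trusted) pairs, copied from the commands above.
PACKAGES = [
    ("catalogue/datapackage.yaml", False),
    ("datasets/ISD/datapackage.yaml", True),
    ("datasets/ARAUS/datapackage.yaml", True),
    ("datasets/SATP/datapackage.yaml", False),
    ("datasets/DeLTA/datapackage.yaml", False),
]


def build_validate_command(path, trusted):
    """Return the argv list for one `frictionless validate` call."""
    cmd = ["uv", "run", "frictionless", "validate", path]
    if trusted:
        cmd.append("--trusted")
    return cmd


def validate_all(packages=PACKAGES):
    """Run each validation; return the paths whose exit code was non-zero."""
    failed = []
    for path, trusted in packages:
        result = subprocess.run(build_validate_command(path, trusted))
        if result.returncode != 0:
            failed.append(path)
    return failed
```

Calling validate_all() from the repository root returns an empty list when every package validates.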

Derived FeatureRecords

MOSAIQ supports optional derived FeatureRecords linked by clip_id in datasets/<dataset>/data/features.csv.

Supported examples include:

CLIP visual embeddings
CitySeg semantic summaries

In this release, MOSAIQ defines the shared FeatureRecord schema for visual derived features and placeholder resources. Psychoacoustic indicators remain in clip-level/acoustic metadata when available, rather than in the FeatureRecord layer. Caption features are a future extension and are not included in the current schema.
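
The clip_id link between features.csv and clips.csv can be sanity-checked with a few lines of standard-library Python. The sketch below is illustrative only: orphan_feature_ids is a hypothetical helper, and the rows are made-up stand-ins rather than real MOSAIQ data.

```python
import csv
import io


def orphan_feature_ids(clips_file, features_file):
    """Return feature_id values whose clip_id does not appear in clips."""
    clip_ids = {row["clip_id"] for row in csv.DictReader(clips_file)}
    return [
        row["feature_id"]
        for row in csv.DictReader(features_file)
        if row["clip_id"] not in clip_ids
    ]


# Tiny in-memory stand-ins for clips.csv and features.csv (made-up rows).
clips = io.StringIO("clip_id,dataset_id\nc1,ISD\nc2,ISD\n")
features = io.StringIO(
    "feature_id,clip_id,feature_type\n"
    "f1,c1,visual_clip_embedding\n"
    "f2,c9,visual_semantic_summary\n"
)
print(orphan_feature_ids(clips, features))  # f2 points at a clip that is not listed
```

With real files, pass open file handles for datasets/<dataset>/data/clips.csv and features.csv instead of the StringIO objects.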

4. Build CLIP visual embedding features (optional)

Input (clips.csv must contain these columns):

clip_id: unique clip identifier used to link feature records.
dataset_id: dataset namespace used in feature_id.
video_asset and video_asset_id: source video linkage; the script resolves these to actual video files under --video-root.
start_s, end_s: temporal segment boundaries used for frame sampling.
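
Before running the extraction it may be worth confirming that clips.csv carries all of the columns listed above. A minimal sketch, assuming a plain comma-separated file with a header row (missing_columns is a hypothetical helper, not part of the repository):

```python
import csv
import io

# Column names taken from the input list above.
REQUIRED_COLUMNS = {
    "clip_id", "dataset_id", "video_asset",
    "video_asset_id", "start_s", "end_s",
}


def missing_columns(csv_file, required=REQUIRED_COLUMNS):
    """Return the required column names that the CSV header lacks, sorted."""
    reader = csv.DictReader(csv_file)
    header = set(reader.fieldnames or [])
    return sorted(required - header)


# Made-up header that omits end_s.
sample = io.StringIO("clip_id,dataset_id,video_asset,video_asset_id,start_s\n")
print(missing_columns(sample))
```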

Output:

datasets/<dataset>/data/features.csv with columns: feature_id, clip_id, feature_type, source_modality, value_format, provenance_json, feature_path, feature_value_json, embedding_dim, dtype, model_name, model_version, input_asset_id, frame_time_s, frame_index, pooling, language, notes
If --storage npy (default): one .npy file per clip under: datasets/<dataset>/data/features/clip_embedding/
If --storage base64: embedding payload is stored in feature_value_json.
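
To give a feel for the base64 option, the sketch below round-trips a small vector through a JSON payload. The payload layout (key names, little-endian float32) is an assumption made for illustration; the actual layout written by build_clip_features.py may differ and should be checked against the script.

```python
import base64
import json
import struct


def encode_embedding(values):
    """Pack floats as little-endian float32 and wrap them in a JSON string."""
    raw = struct.pack(f"<{len(values)}f", *values)
    payload = {
        "encoding": "base64",  # hypothetical key names throughout
        "dtype": "float32",
        "embedding_dim": len(values),
        "data": base64.b64encode(raw).decode("ascii"),
    }
    return json.dumps(payload)


def decode_embedding(feature_value_json):
    """Inverse of encode_embedding: return the floats as a list."""
    payload = json.loads(feature_value_json)
    raw = base64.b64decode(payload["data"])
    return list(struct.unpack(f"<{payload['embedding_dim']}f", raw))


encoded = encode_embedding([0.5, -1.0, 2.25])
print(decode_embedding(encoded))
```

Inline storage keeps features.csv self-contained at the cost of much larger rows, which is presumably why .npy is the default.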

Sampling and feature definition:

feature_type is always visual_clip_embedding.
source_modality is visual.
Default frame rule is center frame: t = (start_s + end_s) / 2.
Pooling/frame metadata are stored in feature_value_json.
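
The center-frame rule is easy to reproduce by hand. In the sketch below, center_frame_time follows the formula above; frame_index_at is an assumed mapping from time to frame index (truncating time × fps), which may not match how the script derives frame_index.

```python
def center_frame_time(start_s, end_s):
    """Midpoint of the clip segment, in seconds: t = (start_s + end_s) / 2."""
    if end_s < start_s:
        raise ValueError("end_s must not precede start_s")
    return (start_s + end_s) / 2


def frame_index_at(time_s, fps):
    """Assumed mapping from a timestamp to a zero-based frame index."""
    return int(time_s * fps)


t = center_frame_time(10.0, 40.0)
print(t, frame_index_at(t, fps=30))
```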

Mandatory provenance fields written to provenance_json:

model, version, library_versions, frame_sampling_rule, preprocess, device, generated_at, script_version.
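
A quick way to confirm that a provenance_json value carries every mandatory field is sketched below. The field names come from the list above; the helper name and all example values are placeholders, not real script output.

```python
import json
from datetime import datetime, timezone

MANDATORY_PROVENANCE_FIELDS = (
    "model", "version", "library_versions", "frame_sampling_rule",
    "preprocess", "device", "generated_at", "script_version",
)


def missing_provenance_fields(provenance_json):
    """Return the mandatory keys absent from a provenance_json string."""
    record = json.loads(provenance_json)
    return [key for key in MANDATORY_PROVENANCE_FIELDS if key not in record]


# Illustrative record; every value is a placeholder.
example = json.dumps({
    "model": "ViT-B/32",
    "version": "openai",
    "library_versions": {"open_clip_torch": "x.y.z"},
    "frame_sampling_rule": "center_frame",
    "preprocess": "model default",
    "device": "cpu",
    "generated_at": datetime.now(timezone.utc).isoformat(),
    "script_version": "0.0.0",
})
print(missing_provenance_fields(example))
```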

How to use:

Install runtime dependencies (one-time):

uv add open-clip-torch torch torchvision pillow opencv-python-headless

Build features for one dataset (recommended .npy storage):

uv run python scripts/build_clip_features.py \
  --dataset-dir datasets/ISD \
  --video-root /path/to/videos \
  --model-name ViT-B/32 \
  --pretrained openai \
  --storage npy \
  --mode append

Validate package integrity after extraction:

uv run frictionless validate datasets/ISD/datapackage.yaml --trusted

Useful options:

--dataset-dir: target dataset root (datasets/ISD or datasets/ARAUS).
--video-root: root directory to resolve video_asset/video_asset_id.
--storage: npy or base64.
--dtype: float16, float32, or float64.
--device: auto, cpu, or cuda.
--limit: process only first N clips for smoke tests.
--skip-missing-video: skip unresolved clips instead of failing.
--mode: append or overwrite.

5. CitySeg semantic summaries (optional)

CitySeg summaries are optional visual semantic FeatureRecords linked by clip_id.

CLIP embeddings provide scalable visual baseline features.
CitySeg summaries provide interpretable semantic descriptors such as road, vegetation, sky, building, vehicle, and person proportions.
These features support PAQ item prediction, ISO-coordinate prediction, and future gaze-on-class analysis.

Feature conventions for CitySeg:

feature_type=visual_semantic_summary
source_modality=visual
value_format=json (or path for external large summaries)
Full segmentation masks/HDF5 are not stored in features.csv; only clip summaries are stored directly, with raw assets referenced by path/provenance.
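
As an illustration of what a clip-level summary might hold, the sketch below turns per-class pixel counts into proportions and wraps them in a record that follows the conventions above. The input format (a dict of class name to pixel count) and both helper names are assumptions; the real CitySeg outputs and the script's summary keys may look different.

```python
import json


def class_proportions(pixel_counts):
    """Convert per-class pixel counts into fractions that sum to 1."""
    total = sum(pixel_counts.values())
    if total == 0:
        raise ValueError("no labelled pixels")
    return {name: count / total for name, count in pixel_counts.items()}


def summary_record(clip_id, pixel_counts):
    """Build a partial FeatureRecord row for a semantic summary."""
    return {
        "clip_id": clip_id,
        "feature_type": "visual_semantic_summary",
        "source_modality": "visual",
        "value_format": "json",
        "feature_value_json": json.dumps(
            class_proportions(pixel_counts), sort_keys=True
        ),
    }


# Made-up counts for a single frame.
print(summary_record("c1", {"road": 50, "sky": 25, "vegetation": 25}))
```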

Example command:

uv run python scripts/build_cityseg_features.py \
  --clips datasets/ISD/data/clips.csv \
  --cityseg-dir /path/to/cityseg_outputs \
  --output datasets/ISD/data/features.csv \
  --dataset-id ISD \
  --mode append

Validation for features:

uv run python scripts/validate_mosaiq.py --dataset-dir datasets/ISD

6. Regenerate data from source

If you have access to the original ISD source file (ISD_v1_0_Data.csv), you can regenerate the derived CSVs from scratch:

uv run python scripts/build_datasets_csv.py     # rebuilds catalogue/datasets.csv
uv run python scripts/build_isd.py              # rebuilds datasets/ISD/data/clips.csv + responses.csv
uv run python scripts/build_satp.py             # rebuilds datasets/SATP/data/clips.csv + responses.csv
uv run python scripts/build_delta.py            # rebuilds datasets/DeLTA/data/clips.csv + responses.csv

Schema design philosophy

MOSAIQ separates the data into three layers, each with its own schema:

Dataset-level catalogue (catalogue/datasets.csv): one row per source dataset, capturing scale, modalities, recording specifications, perceptual framework, and access conditions.
Clip-level metadata (datasets/<dataset>/data/clips.csv): one row per clip with aggregated PAQ ratings, derived ISO Pleasant / Eventful coordinates, and available acoustic or psychoacoustic measurements.
Response-level metadata (datasets/<dataset>/data/responses.csv): one row per individual participant assessment, linked to clips via clip_id.
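
For readers unfamiliar with the ISO Pleasant / Eventful coordinates, the sketch below applies the projection described in ISO/TS 12913-3 to eight PAQ ratings on a 1 to 5 scale. The dictionary keys are hypothetical and need not match the MOSAIQ column names, and the build scripts may compute or scale the coordinates differently.

```python
import math

COS45 = math.cos(math.radians(45))
SCALE = 4 + math.sqrt(32)  # maps 1-5 ratings onto the range [-1, 1]


def iso_coordinates(paq):
    """Project eight PAQ ratings onto (pleasant, eventful) axes."""
    pleasant = (
        (paq["pleasant"] - paq["annoying"])
        + COS45 * (paq["calm"] - paq["chaotic"])
        + COS45 * (paq["vibrant"] - paq["monotonous"])
    ) / SCALE
    eventful = (
        (paq["eventful"] - paq["uneventful"])
        + COS45 * (paq["chaotic"] - paq["calm"])
        + COS45 * (paq["vibrant"] - paq["monotonous"])
    ) / SCALE
    return pleasant, eventful


# Made-up ratings: maximally pleasant, neutral on the eventful axis.
ratings = {
    "pleasant": 5, "annoying": 1, "calm": 5, "chaotic": 1,
    "vibrant": 5, "monotonous": 1, "eventful": 3, "uneventful": 3,
}
print(iso_coordinates(ratings))
```

A response with every item rated 3 lands at the origin, and the extremes of either axis reach ±1.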

Schemas are formally specified using the Frictionless Data Package standard, which supports type checks, value-range constraints, enumerations, and foreign-key relationships across resources.

An additional schema-level harmonisation layer is documented in docs/schema_level_harmonisation.md. This layer prepares ISD and ARAUS for later benchmark construction using a shared structure and conservative ISO 12913 semantics; it does not claim that the datasets are statistically or fully harmonised.

Development

Add a dependency

uv add <package-name>
uv add --dev <package-name>     # development-only (e.g. jupyter, pytest)

Run a Python script

uv run python <script.py>

Open Jupyter

uv add --dev jupyter ipykernel
uv run jupyter lab

Citation

If you use MOSAIQ in your research, please cite:

@misc{mosaiq2026,
  author    = {Liang, Yuqi and Mitchell, Andrew and Kang, Jian and Aletta, Francesco},
  title     = {MOSAIQ: Multimodal Open Soundscape AI Quality-benchmark},
  year      = {2026},
  publisher = {GitHub},
  howpublished = {\url{https://github.com/yuqiliang/MOSAIQ}}
}

Team

Yuqi Liang — UCL Institute for Environmental Design and Engineering
Francesco Aletta — UCL Institute for Environmental Design and Engineering
Jian Kang — UCL Institute for Environmental Design and Engineering
Andrew Mitchell — UCL Bartlett School of Sustainable Construction

Licence

Schemas, code, and documentation: MIT
Data: per-source-dataset licences (see licence_spdx field in each datasets.csv row)
