Inference-only package for MAEST (Music Audio Efficient Spectrogram Transformer).
This is a lightweight, dependency-minimal repackaging of MAEST focused solely on inference. For training, fine-tuning, and the full research codebase, please visit the original MAEST repository.
```bash
# From PyPI
pip install maest-infer

# Or with uv
uv pip install maest-infer
```

For development:

```bash
git clone https://github.com/openmirlab/maest-infer.git
cd maest-infer
pip install -e .
```

```python
import torch
from maest_infer import get_maest
# Load model (downloads pretrained weights automatically)
model = get_maest(arch="discogs-maest-30s-pw-129e-519l")
model.eval()
# Inference with raw 16kHz audio
audio = torch.randn(16000 * 30) # 30 seconds
logits, embeddings = model(audio)
# logits: (1, 519), embeddings: (1, 768)
# Predict with labels
activations, labels = model.predict_labels(audio)
```
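`predict_labels` returns per-label activations together with the label names. The exact return types are not shown above, so the following is a minimal sketch for pulling out the top predictions, assuming `activations` is array-like with one score per entry of `labels`:

```python
import numpy as np

# Sketch only: rank labels by activation strength.
# Assumes `activations` has one score per entry of `labels`
# (e.g. shape (n_labels,) or (1, n_labels)); adjust if the
# package returns something else.
scores = np.asarray(activations).reshape(-1)
top5 = np.argsort(scores)[::-1][:5]
for i in top5:
    print(f"{labels[i]}: {scores[i]:.3f}")
```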
| Model | Input Length | Labels | Description |
|---|---|---|---|
| `discogs-maest-5s-pw-129e` | 5 sec | 400 | PaSST weights |
| `discogs-maest-10s-fs-129e` | 10 sec | 400 | From scratch |
| `discogs-maest-10s-pw-129e` | 10 sec | 400 | PaSST weights |
| `discogs-maest-10s-dw-75e` | 10 sec | 400 | DeiT weights |
| `discogs-maest-20s-pw-129e` | 20 sec | 400 | PaSST weights |
| `discogs-maest-30s-pw-129e` | 30 sec | 400 | PaSST weights |
| `discogs-maest-30s-pw-73e-ts` | 30 sec | 400 | Teacher-student |
| `discogs-maest-30s-pw-129e-519l` | 30 sec | 519 | Extended labels |
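Any of the architectures above can be passed to `get_maest` via the `arch` argument used in the usage example. A brief sketch with one of the 10-second checkpoints; the expected logits width of 400 is inferred from the table and not verified here:

```python
import torch
from maest_infer import get_maest

# Sketch: load a 10-second variant listed in the table above and run the
# same inference call shown earlier.
model = get_maest(arch="discogs-maest-10s-pw-129e")
model.eval()

audio = torch.randn(16000 * 10)  # 10 seconds of 16 kHz audio
with torch.no_grad():
    logits, embeddings = model(audio)
# logits: presumably (1, 400) for the 400-label checkpoints (per the table)
```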
All model checkpoints are hosted on GitHub Releases and downloaded automatically on first use (cached in `~/.cache/torch/hub/checkpoints/`).
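To see which checkpoints are already cached (for example, before going offline), you can inspect that directory directly. A small sketch that relies only on the cache path given above; the file names themselves depend on the package's download logic:

```python
from pathlib import Path

# Sketch: list whatever checkpoint files are already in the default cache.
cache_dir = Path.home() / ".cache" / "torch" / "hub" / "checkpoints"
if cache_dir.is_dir():
    for ckpt in sorted(cache_dir.iterdir()):
        print(ckpt.name)
else:
    print("No checkpoints cached yet.")
```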
This package is licensed under AGPL-3.0-only, following the original MAEST license.
This package is a repackaging of MAEST (Music Audio Efficient Spectrogram Transformer), created by Pablo Alonso-Jiménez and colleagues at the Music Technology Group (MTG), Universitat Pompeu Fabra.
- Original Repository: https://github.com/palonso/maest
- Hugging Face Models: https://huggingface.co/mtg-upf
- Paper: arXiv:2309.16418
We are grateful to the original authors for making their research and pretrained models publicly available.
If you use MAEST in your research, please cite the original paper:
```bibtex
@inproceedings{alonso2023efficient,
  title={Efficient Supervised Training of Audio Transformers for Music Representation Learning},
  author={Alonso-Jim{\'e}nez, Pablo and Serra, Xavier and Bogdanov, Dmitry},
  booktitle={Proceedings of the 24th International Society for Music Information Retrieval Conference (ISMIR)},
  year={2023},
}
```