Inference-only package for MAEST (Music Audio Efficient Spectrogram Transformer).
This is a lightweight, dependency-minimal repackaging of MAEST focused solely on inference. For training, fine-tuning, and the full research codebase, please visit the original MAEST repository.
```bash
# From PyPI
pip install maest-infer

# Or with uv
uv pip install maest-infer
```

For development:

```bash
git clone https://github.com/openmirlab/maest-infer.git
cd maest-infer
pip install -e .
```

```python
import torch
from maest_infer import get_maest
# Load model (downloads pretrained weights automatically)
model = get_maest(arch="discogs-maest-30s-pw-129e-519l")
model.eval()
# Inference with raw 16kHz audio
audio = torch.randn(16000 * 30) # 30 seconds
logits, embeddings = model(audio)
# logits: (1, 519), embeddings: (1, 768)
# Predict with labels
activations, labels = model.predict_labels(audio)
```
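`predict_labels` returns per-label activations together with the label names. The exact return types are not shown above, so the following is a minimal sketch for pulling out the top predictions, assuming `activations` is array-like with one score per entry of `labels`:

```python
import numpy as np

# Sketch only: rank labels by activation strength.
# Assumes `activations` has one score per entry of `labels`
# (e.g. shape (n_labels,) or (1, n_labels)); adjust if the
# package returns something else.
scores = np.asarray(activations).reshape(-1)
top5 = np.argsort(scores)[::-1][:5]
for i in top5:
    print(f"{labels[i]}: {scores[i]:.3f}")
```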
| Model | Input Length | Labels | Description |
|---|---|---|---|
| `discogs-maest-5s-pw-129e` | 5 sec | 400 | PaSST weights |
| `discogs-maest-10s-fs-129e` | 10 sec | 400 | From scratch |
| `discogs-maest-10s-pw-129e` | 10 sec | 400 | PaSST weights |
| `discogs-maest-10s-dw-75e` | 10 sec | 400 | DeiT weights |
| `discogs-maest-20s-pw-129e` | 20 sec | 400 | PaSST weights |
| `discogs-maest-30s-pw-129e` | 30 sec | 400 | PaSST weights |
| `discogs-maest-30s-pw-73e-ts` | 30 sec | 400 | Teacher-student |
| `discogs-maest-30s-pw-129e-519l` | 30 sec | 519 | Extended labels |
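Any of the architectures above can be passed to `get_maest` via the `arch` argument used in the usage example. A brief sketch with one of the 10-second checkpoints; the expected logits width of 400 is inferred from the table and not verified here:

```python
import torch
from maest_infer import get_maest

# Sketch: load a 10-second variant listed in the table above and run the
# same inference call shown earlier.
model = get_maest(arch="discogs-maest-10s-pw-129e")
model.eval()

audio = torch.randn(16000 * 10)  # 10 seconds of 16 kHz audio
with torch.no_grad():
    logits, embeddings = model(audio)
# logits: presumably (1, 400) for the 400-label checkpoints (per the table)
```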
All model checkpoints are hosted on GitHub Releases and downloaded automatically on first use (cached in `~/.cache/torch/hub/checkpoints/`).
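To see which checkpoints are already cached (for example, before going offline), you can inspect that directory directly. A small sketch that relies only on the cache path given above; the file names themselves depend on the package's download logic:

```python
from pathlib import Path

# Sketch: list whatever checkpoint files are already in the default cache.
cache_dir = Path.home() / ".cache" / "torch" / "hub" / "checkpoints"
if cache_dir.is_dir():
    for ckpt in sorted(cache_dir.iterdir()):
        print(ckpt.name)
else:
    print("No checkpoints cached yet.")
```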
This package is licensed under AGPL-3.0-only, following the original MAEST license.
This package is a repackaging of MAEST (Music Audio Efficient Spectrogram Transformer), created by Pablo Alonso-Jiménez and colleagues at the Music Technology Group (MTG), Universitat Pompeu Fabra.
- Original Repository: https://github.com/palonso/maest
- Hugging Face Models: https://huggingface.co/mtg-upf
- Paper: arXiv:2309.16418
We are grateful to the original authors for making their research and pretrained models publicly available.
If you use MAEST in your research, please cite the original paper:
```bibtex
@inproceedings{alonso2023efficient,
  title={Efficient Supervised Training of Audio Transformers for Music Representation Learning},
  author={Alonso-Jim{\'e}nez, Pablo and Serra, Xavier and Bogdanov, Dmitry},
  booktitle={Proceedings of the 24th International Society for Music Information Retrieval Conference (ISMIR)},
  year={2023},
}
```