This is a lightweight, dependency-minimal repackaging of MAEST focused solely on inference.

maest-infer

PyPI | Python 3.10+ | License: AGPL-3.0

Inference-only package for MAEST (Music Audio Efficient Spectrogram Transformer).

This is a lightweight, dependency-minimal repackaging of MAEST focused solely on inference. For training, fine-tuning, and the full research codebase, please visit the original MAEST repository.


Installation

# From PyPI
pip install maest-infer

# Or with uv
uv pip install maest-infer

For development:

git clone https://github.com/openmirlab/maest-infer.git
cd maest-infer
pip install -e .
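
To confirm that the install worked, a quick check along these lines may help. This is a minimal sketch using only the standard library. The helper name `installed_version` is made up for illustration, and it assumes the distribution is registered under its PyPI name, `maest-infer`.

```python
import sys
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "maest-infer") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # This README lists Python 3.10+ as the supported range.
    if sys.version_info < (3, 10):
        print("Warning: Python 3.10 or newer is expected")
    version = installed_version()
    print(f"maest-infer: {version or 'not installed'}")
```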

Usage

import torch
from maest_infer import get_maest

# Load model (downloads pretrained weights automatically)
model = get_maest(arch="discogs-maest-30s-pw-129e-519l")
model.eval()

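# Inference on raw 16 kHz audio (random noise stands in for a real waveform)
audio = torch.randn(16000 * 30)  # 30 seconds at 16 kHz
with torch.no_grad():  # gradients are not needed for inference
    logits, embeddings = model(audio)
# logits: (1, 519), embeddings: (1, 768)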

# Predict with labels
activations, labels = model.predict_labels(audio)
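
The pair returned by `predict_labels` can be reduced to the few highest-scoring tags. The sketch below assumes `activations` holds one score per entry in `labels` for a single clip, which the README does not spell out. `top_k_labels` is a hypothetical helper, not part of `maest_infer`, and the demo values are placeholders rather than real model output.

```python
def top_k_labels(activations, labels, k=5):
    """Return the k highest-scoring (label, score) pairs, best first."""
    # Assumption: one score per label for a single clip.
    # Torch tensors are detached and flattened; lists and 1-D arrays pass through.
    if hasattr(activations, "detach"):
        activations = activations.detach().flatten().tolist()
    scores = [float(a) for a in activations]
    if len(scores) != len(labels):
        raise ValueError(f"got {len(scores)} scores for {len(labels)} labels")
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Placeholder values; in practice these would come from model.predict_labels(audio).
demo = top_k_labels([0.1, 0.9, 0.5], ["label_a", "label_b", "label_c"], k=2)
print(demo)  # [('label_b', 0.9), ('label_c', 0.5)]
```

Sorting in plain Python keeps the helper independent of the tensor library, at the cost of copying the scores to a list first.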

Available Models

| Model | Input Length | Labels | Description |
|---|---|---|---|
| discogs-maest-5s-pw-129e | 5 sec | 400 | PaSST weights |
| discogs-maest-10s-fs-129e | 10 sec | 400 | From scratch |
| discogs-maest-10s-pw-129e | 10 sec | 400 | PaSST weights |
| discogs-maest-10s-dw-75e | 10 sec | 400 | DeiT weights |
| discogs-maest-20s-pw-129e | 20 sec | 400 | PaSST weights |
| discogs-maest-30s-pw-129e | 30 sec | 400 | PaSST weights |
| discogs-maest-30s-pw-73e-ts | 30 sec | 400 | Teacher-student |
| discogs-maest-30s-pw-129e-519l | 30 sec | 519 | Extended labels |
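
Each model name encodes its input length as an `<N>s` field, so the matching number of samples at 16 kHz can be derived from the name alone. The following is a small sketch of that arithmetic. `expected_samples` is a hypothetical helper, and whether the model requires exactly this many samples is not stated here, so treat the result as a guide for preparing audio rather than a hard constraint.

```python
import re

SAMPLE_RATE = 16000  # Hz, the rate used in the usage example


def expected_samples(arch: str, sample_rate: int = SAMPLE_RATE) -> int:
    """Derive the input length in samples from a name like 'discogs-maest-30s-pw-129e'."""
    # The duration field looks like '-30s-'; other fields (e.g. '-ts') do not match.
    match = re.search(r"-(\d+)s-", arch)
    if match is None:
        raise ValueError(f"no '<N>s' duration field in {arch!r}")
    return int(match.group(1)) * sample_rate


print(expected_samples("discogs-maest-30s-pw-129e-519l"))  # 480000
print(expected_samples("discogs-maest-5s-pw-129e"))        # 80000
```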

All model checkpoints are hosted on GitHub Releases and downloaded automatically on first use (cached in ~/.cache/torch/hub/checkpoints/).
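
To see what has already been downloaded, the cache directory can be inspected directly. This sketch uses only the standard library and assumes the default path quoted above. `list_checkpoints` is a hypothetical helper, and the directory is shared with other torch hub downloads, so not every file listed is necessarily a MAEST checkpoint.

```python
from pathlib import Path

# Default location quoted above; adjust if your torch hub directory differs.
DEFAULT_CACHE = Path.home() / ".cache" / "torch" / "hub" / "checkpoints"


def list_checkpoints(cache_dir=DEFAULT_CACHE):
    """Return (file name, size in MiB) for every file in the cache directory."""
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return []  # nothing has been downloaded yet
    return sorted(
        (p.name, p.stat().st_size / 2**20)
        for p in cache_dir.iterdir()
        if p.is_file()
    )


for name, size_mib in list_checkpoints():
    print(f"{name}: {size_mib:.1f} MiB")
```

Deleting a file from this directory should simply trigger a fresh download on the next load, assuming the automatic download behaves as described above.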


License

This package is licensed under AGPL-3.0-only, following the original MAEST license.


Credits & Acknowledgments

This package is a repackaging of MAEST (Music Audio Efficient Spectrogram Transformer), created by Pablo Alonso-Jiménez and colleagues at the Music Technology Group (MTG), Universitat Pompeu Fabra.

We are grateful to the original authors for making their research and pretrained models publicly available.


Citation

If you use MAEST in your research, please cite the original paper:

@inproceedings{alonso2023efficient,
    title={Efficient Supervised Training of Audio Transformers for Music Representation Learning},
    author={Alonso-Jim{\'e}nez, Pablo and Serra, Xavier and Bogdanov, Dmitry},
    booktitle={Proceedings of the 24th International Society for Music Information Retrieval Conference (ISMIR)},
    year={2023},
}

Related Projects

  • MAEST - Original research repository with training code
  • PaSST - Patchout faSt Spectrogram Transformer (base architecture)
  • Essentia - MAEST models in Essentia
