Releases: openmirlab/jukebox-infer

v0.1.0 - Initial Release

25 Nov 05:34

jukebox-infer v0.1.0 - Initial Release

Inference-only implementation of OpenAI Jukebox for modern PyTorch (2.7+)

🎉 Highlights

  • 100% Parity Verified - VQ-VAE features are numerically identical to the original Jukebox (max |Δ| = 0; see PARITY_VERIFICATION.md)
  • Modern PyTorch - Compatible with PyTorch 2.7+
  • No MPI/NCCL - Single-GPU inference, no distributed dependencies
  • Minimal codebase - roughly 47% smaller than the original (inference-only)
  • Auto-download - Checkpoints automatically download on first use

📦 Installation

pip install jukebox-infer

🚀 Quick Start

CLI (Fastest)

python -m jukebox_infer --artist "The Beatles" --genre "Rock" --duration 20

Python API

from jukebox_infer import Jukebox

model = Jukebox(model_name="1b_lyrics", device="cuda")
model.load(sample_length_in_seconds=20)

audio = model.generate(
    artist="The Beatles",
    genre="Rock",
    duration_seconds=20,
    output_path="output.wav"
)

✨ Features

  • VQ-VAE feature extraction
  • Music generation from scratch
  • Audio continuation (primed sampling)
  • Artist and genre conditioning
  • Simple high-level API
  • GPU acceleration
  • CPU mode support

📊 Parity Verification

Testing confirms exact numerical parity with the original Jukebox on VQ-VAE features:

Metric          Result
max |Δ|         0.000000e+00
mean |Δ|        0.000000e+00
Feature shape   (1, 6146) - identical
Feature range   [8, 2035] - identical
Status          100% VERIFIED

Full verification: PARITY_VERIFICATION.md
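
The metrics in the table above can be illustrated with a short sketch. This is not the project's verification script: the function name `parity_report` and its return keys are hypothetical, and it works on plain Python sequences rather than the tensors a real comparison would likely use.

```python
def parity_report(reference, candidate):
    """Compare two flat feature sequences element by element.

    Hypothetical helper for illustration only; it is not part of jukebox-infer.
    """
    if len(reference) != len(candidate):
        raise ValueError("feature shapes differ")
    if not reference:
        raise ValueError("feature sequences are empty")

    # Absolute difference per element, i.e. |Δ| in the table.
    diffs = [abs(r - c) for r, c in zip(reference, candidate)]
    return {
        "max_abs_diff": max(diffs),
        "mean_abs_diff": sum(diffs) / len(diffs),
        "range": (min(candidate), max(candidate)),
        "identical": max(diffs) == 0,
    }


if __name__ == "__main__":
    original = [8, 1024, 2035]   # made-up values, not real Jukebox features
    ported = [8, 1024, 2035]
    print(parity_report(original, ported))
```

A report with `max_abs_diff` equal to zero means every element matches, which is the condition the table summarizes as "identical".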

📋 Requirements

  • Python ≥3.10
  • PyTorch ≥2.7.0
  • CUDA-capable GPU (16GB+ VRAM recommended)
  • ~6.2GB checkpoint download
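
Two of the requirements above can be checked before installing, using only the standard library. The sketch below is an illustration, not something shipped with jukebox-infer: the function name `preflight` is hypothetical, "~6.2GB" is read as decimal gigabytes, and the checkpoint location is assumed to be on the same disk as the current directory. It does not check the PyTorch version or the GPU.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)        # from the requirements list
CHECKPOINT_BYTES = 6.2e9    # "~6.2GB", read here as decimal gigabytes (assumption)


def preflight(python_version=None, free_bytes=None, path="."):
    """Return a list of problems; an empty list means both checks passed.

    Hypothetical helper for illustration only; it is not part of jukebox-infer.
    """
    # Fall back to the running interpreter and the disk holding `path`.
    version = tuple(python_version) if python_version else tuple(sys.version_info[:2])
    if free_bytes is None:
        free_bytes = shutil.disk_usage(path).free

    problems = []
    if version < MIN_PYTHON:
        problems.append(f"Python {version[0]}.{version[1]} found, 3.10 or newer required")
    if free_bytes < CHECKPOINT_BYTES:
        problems.append(f"only {free_bytes / 1e9:.1f} GB free, about 6.2 GB needed")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("preflight:", problem)
```

The optional arguments exist so the checks can be exercised with fixed values; called with no arguments, the function inspects the current environment.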

🙏 Acknowledgments

Based on OpenAI Jukebox - all credit for the original model and research belongs to OpenAI and the Jukebox authors.

Citation:

@article{dhariwal2020jukebox,
  title={Jukebox: A Generative Model for Music},
  author={Dhariwal, Prafulla and Jun, Heewoo and Payne, Christine and Kim, Jong Wook and Radford, Alec and Sutskever, Ilya},
  journal={arXiv preprint arXiv:2005.00341},
  year={2020}
}

⚠️ Known Limitations

  • Inference only (no training)
  • Single GPU (no distributed inference)
  • Slow generation (roughly 5-15 seconds of generation time per second of audio)
  • Minimum duration: 17.84 seconds
  • Large checkpoints (~6.2GB)
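
The duration and speed limits above translate into a simple time estimate. The sketch below is a rough illustration built only from the figures in this list; the function name `estimate_generation_seconds` is hypothetical and not part of the jukebox-infer API, and actual speed will depend on hardware.

```python
MIN_DURATION_S = 17.84                # minimum duration from the list above
COST_PER_AUDIO_SECOND = (5.0, 15.0)   # "~5-15 sec/sec of audio"


def estimate_generation_seconds(duration_seconds):
    """Return a (low, high) wall-clock estimate in seconds.

    Hypothetical helper for illustration only; it is not part of jukebox-infer.
    """
    if duration_seconds < MIN_DURATION_S:
        raise ValueError(
            f"duration must be at least {MIN_DURATION_S} seconds, got {duration_seconds}"
        )
    low, high = COST_PER_AUDIO_SECOND
    return duration_seconds * low, duration_seconds * high


if __name__ == "__main__":
    low, high = estimate_generation_seconds(20)
    print(f"20 s of audio: about {low:.0f} to {high:.0f} s")  # 100 to 300 s
```

Under these figures, the 20-second clip from the Quick Start would take somewhere between about 100 and 300 seconds to generate.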

Package: https://pypi.org/project/jukebox-infer/
Repository: https://github.com/openmirlab/jukebox-infer