Releases: openmirlab/jukebox-infer
jukebox-infer v0.1.0 - Initial Release
Inference-only implementation of OpenAI Jukebox for modern PyTorch (2.7+)
🎉 Highlights
- ✅ 100% Parity Verified - VQ-VAE features identical to original Jukebox (see verification)
- ✅ Modern PyTorch - Compatible with PyTorch 2.7+
- ✅ No MPI/NCCL - Single-GPU inference, no distributed dependencies
- ✅ Minimal codebase - ~47% reduction from original (inference-only)
- ✅ Auto-download - Checkpoints automatically download on first use
📦 Installation
```bash
pip install jukebox-infer
```
🚀 Quick Start
CLI (Fastest)
```bash
python -m jukebox_infer --artist "The Beatles" --genre "Rock" --duration 20
```
Python API
```python
from jukebox_infer import Jukebox

model = Jukebox(model_name="1b_lyrics", device="cuda")
model.load(sample_length_in_seconds=20)
audio = model.generate(
    artist="The Beatles",
    genre="Rock",
    duration_seconds=20,
    output_path="output.wav"
)
```
✨ Features
- VQ-VAE feature extraction
- Music generation from scratch
- Audio continuation (primed sampling)
- Artist and genre conditioning
- Simple high-level API
- GPU acceleration
- CPU mode support
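Since both GPU acceleration and CPU mode are listed, a caller may want to pick the device at runtime. The following is a minimal sketch, not part of the package: `pick_device` is a hypothetical helper, and it assumes the `device` argument shown in Quick Start also accepts the string `"cpu"`.

```python
import importlib.util


def pick_device(prefer_gpu: bool = True) -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu".

    Hypothetical helper for illustration; it is not part of jukebox-infer.
    """
    if prefer_gpu and importlib.util.find_spec("torch") is not None:
        import torch  # imported lazily so the helper also runs without PyTorch

        if torch.cuda.is_available():
            return "cuda"
    return "cpu"


device = pick_device()
print(f"Selected device: {device}")

# The result could then be passed to the constructor from Quick Start
# (assumption: "cpu" is the accepted value for CPU mode):
#   model = Jukebox(model_name="1b_lyrics", device=device)
```

Given the generation speed noted under Known Limitations, CPU mode is probably best kept for short tests or feature extraction.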
📊 Parity Verification
Testing shows that the extracted VQ-VAE features are numerically identical to those produced by the original Jukebox:
| Metric | Result |
|---|---|
| max \|Δ\| | 0.000000e+00 |
| mean \|Δ\| | 0.000000e+00 |
| Feature shape | (1, 6146) - identical |
| Feature range | [8, 2035] - identical |
| Status | ✅ 100% VERIFIED |
Full verification: PARITY_VERIFICATION.md
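For readers who want to see what the max |Δ| and mean |Δ| rows measure, here is a rough sketch of the comparison. It is not the project's verification script: `parity_metrics` is a hypothetical helper, the input values are made up, and plain Python lists stand in for the real feature tensors (in practice NumPy or PyTorch would do the same arithmetic).

```python
from typing import Sequence


def parity_metrics(reference: Sequence[float], candidate: Sequence[float]) -> dict:
    """Compare two flattened feature sequences element by element."""
    if len(reference) != len(candidate):
        raise ValueError("feature lengths differ, so parity cannot hold")
    if len(reference) == 0:
        raise ValueError("cannot compare empty feature sequences")
    diffs = [abs(r - c) for r, c in zip(reference, candidate)]
    return {
        "max_abs_diff": max(diffs),
        "mean_abs_diff": sum(diffs) / len(diffs),
        "identical": all(d == 0 for d in diffs),
    }


# Made-up values standing in for features from the two implementations.
reference_codes = [8, 512, 1024, 2035]
candidate_codes = [8, 512, 1024, 2035]

metrics = parity_metrics(reference_codes, candidate_codes)
print(f"max |Δ|  = {metrics['max_abs_diff']:e}")
print(f"mean |Δ| = {metrics['mean_abs_diff']:e}")
print(f"identical: {metrics['identical']}")
```

A maximum absolute difference of exactly zero means every element matches, which is a stronger statement than agreement within a floating-point tolerance.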
📋 Requirements
- Python ≥3.10
- PyTorch ≥2.7.0
- CUDA-capable GPU (16GB+ VRAM recommended)
- ~6.2GB checkpoint download
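The version requirements above can be checked before installing or loading anything large. This is only a sketch under stated assumptions: `parse_version` and `check_environment` are hypothetical helpers (not shipped with the package), and it assumes PyTorch is installed under the distribution name `torch`.

```python
import re
import sys
from importlib import metadata

MIN_PYTHON = (3, 10)    # "Python ≥3.10" from the requirements list
MIN_TORCH = (2, 7, 0)   # "PyTorch ≥2.7.0" from the requirements list


def parse_version(text: str) -> tuple:
    """Turn a string such as '2.7.1+cu126' into (2, 7, 1)."""
    parts = []
    for piece in text.split("+")[0].split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_environment() -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append(f"Python {sys.version_info[0]}.{sys.version_info[1]} is older than 3.10")
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        problems.append("PyTorch is not installed")
    else:
        if parse_version(torch_version) < MIN_TORCH:
            problems.append(f"PyTorch {torch_version} is older than 2.7.0")
    return problems


for problem in check_environment():
    print("Requirement not met:", problem)
```

The check deliberately reads package metadata instead of importing PyTorch, so it stays fast and works even when the installed build is broken.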
📚 Documentation
- README - Quick start and usage
- PARITY_VERIFICATION.md - Verification details
- CHECKPOINT_ARCHITECTURE.md - Checkpoint structure
🙏 Acknowledgments
Based on OpenAI Jukebox - all credit for the original model and research belongs to OpenAI and the Jukebox authors.
Citation:
```bibtex
@article{dhariwal2020jukebox,
  title={Jukebox: A Generative Model for Music},
  author={Dhariwal, Prafulla and Jun, Heewoo and Payne, Christine and Kim, Jong Wook and Radford, Alec and Sutskever, Ilya},
  journal={arXiv preprint arXiv:2005.00341},
  year={2020}
}
```
⚠️ Known Limitations
- Inference only (no training)
- Single GPU (no distributed inference)
- Slow generation (~5-15 seconds of compute per second of audio)
- Minimum duration: 17.84 seconds
- Large checkpoints (~6.2GB)
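To make the speed and minimum-duration figures above concrete, here is a small planning sketch. `estimate_generation_time` is a hypothetical helper, not a package function; rejecting short requests with an exception is an assumption of this sketch, and how the library itself handles durations below the minimum is not stated here.

```python
MIN_DURATION_SECONDS = 17.84                      # minimum duration listed above
COMPUTE_SECONDS_PER_AUDIO_SECOND = (5.0, 15.0)    # "~5-15 sec/sec of audio"


def estimate_generation_time(duration_seconds: float) -> tuple:
    """Return a (low, high) estimate of compute time in seconds."""
    if duration_seconds < MIN_DURATION_SECONDS:
        # Assumption: this sketch simply refuses requests below the minimum.
        raise ValueError(
            f"duration must be at least {MIN_DURATION_SECONDS} seconds, "
            f"got {duration_seconds}"
        )
    low_rate, high_rate = COMPUTE_SECONDS_PER_AUDIO_SECOND
    return duration_seconds * low_rate, duration_seconds * high_rate


low, high = estimate_generation_time(20)
print(f"20 s of audio: roughly {low:.0f}-{high:.0f} s of compute")
# prints "20 s of audio: roughly 100-300 s of compute"
```

By these figures, the 20-second Quick Start example would take somewhere between about two and five minutes.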
Package: https://pypi.org/project/jukebox-infer/
Repository: https://github.com/openmirlab/jukebox-infer