Jukebox-Infer

Inference-only implementation of OpenAI Jukebox for modern PyTorch (2.7+)


High-quality music generation models for creating music from scratch or continuing existing audio tracks.


📌 Overview

Jukebox-Infer is a streamlined, inference-only version of OpenAI Jukebox, optimized for PyTorch 2.7+ with minimal dependencies.

Note: This project is based on OpenAI Jukebox. All credit for the original model and research belongs to OpenAI and the Jukebox authors.


🎉 What's New

  • v0.1.0 (Latest): Initial release - Clean inference-only implementation extracted from OpenAI Jukebox

✨ Features

  • ✅ 100% Parity Verified - VQ-VAE features identical to original Jukebox (see Parity Verification)
  • ✅ Inference-only - No training code, significantly reduced codebase (~47% reduction)
  • ✅ Modern PyTorch - Compatible with PyTorch 2.7+
  • ✅ Single-GPU - No MPI or distributed dependencies
  • ✅ Minimal dependencies - Removed tensorboardX, apex, and training-specific libs
  • ✅ Auto-download - Automatic checkpoint downloads on first use
  • ✅ GPU acceleration - Full CUDA support with optimized device management
  • ✅ Simple API - High-level Jukebox class for easy music generation
  • ✅ Audio continuation - Support for primed sampling from audio prompts


🚀 Quick Start

Installation

From PyPI:

# Using pip
pip install jukebox-infer

# Using uv (recommended - faster)
uv pip install jukebox-infer

# Or add to your project with uv
uv add jukebox-infer

For Development:

# Clone the repository
git clone https://github.com/openmirlab/jukebox-infer.git
cd jukebox-infer

# Install in editable mode
pip install -e .

# Or with uv
uv pip install -e .

Package: https://pypi.org/project/jukebox-infer/

Note: If you're setting up both the original Jukebox and jukebox-infer for comparison testing, see ../JUKEBOX_SETUP.md for detailed environment setup instructions.

Command-Line Interface (Fastest)

# Basic generation (default: 20 seconds, The Beatles, Rock)
python quick_infer.py

# Custom artist and genre
python quick_infer.py --artist "Taylor Swift" --genre "Pop" --duration 30

# Audio continuation from existing audio
python quick_infer.py --prompt input.wav --prompt-duration 5 --duration 20 --output continuation.wav

# See all options
python quick_infer.py --help

Simple API (Recommended for Python)

from jukebox_infer import Jukebox

# Initialize model (checkpoints auto-download on first use)
model = Jukebox(model_name="1b_lyrics", device="cuda")
model.load(sample_length_in_seconds=20)

# Generate music
audio = model.generate(
    artist="The Beatles",
    genre="Rock",
    duration_seconds=20,
    output_path="output.wav"
)

Audio Continuation

CLI:

python quick_infer.py --prompt input.wav --prompt-duration 5 --duration 20 --output continuation.wav

Python API:

from jukebox_infer import Jukebox

model = Jukebox(model_name="1b_lyrics", device="cuda")
model.load(sample_length_in_seconds=20)

# Continue from existing audio
audio = model.generate_from_audio(
    prompt_audio="input.wav",
    prompt_duration=5,  # Use first 5 seconds as prompt
    total_duration=20,  # Generate 20 seconds total
    output_path="continuation.wav"
)


📦 Download Checkpoints

Checkpoints are automatically downloaded when you first use a model. No manual download needed!

If you prefer to pre-download checkpoints manually:

# Option 1: Use the download script
bash download_checkpoints.sh

# Option 2: Use Python API
from jukebox_infer import download_checkpoints
download_checkpoints('1b_lyrics')  # Downloads ~6.2GB

Checkpoints are cached in ~/.cache/jukebox/models/:

  • VQ-VAE (7.4MB) - shared encoder/decoder
  • Prior level 0 & 1 (4.4GB) - shared upsamplers
  • Prior level 2 (1.8GB) - 1b_lyrics top-level model
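
To confirm what is already cached before generating, here is a minimal sketch that simply lists files under the cache directory documented above:

# Inspect the local checkpoint cache (path as documented above)
from pathlib import Path

cache_dir = Path.home() / ".cache" / "jukebox" / "models"
if cache_dir.exists():
    for ckpt in sorted(cache_dir.rglob("*")):
        if ckpt.is_file():
            print(f"{ckpt.name}: {ckpt.stat().st_size / 1e9:.2f} GB")
else:
    print("No checkpoints cached yet - they will download on first use.")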

🎵 Available Models

Model      Parameters  Download Size  VRAM   Description
1b_lyrics  1B          ~6.2GB         ~12GB  Lyrics conditioning support

📋 Requirements

  • Python: ≥3.10
  • PyTorch: ≥2.7.0
  • GPU: CUDA-capable GPU (16GB+ VRAM recommended for 1b_lyrics)
  • OS: Linux, macOS, Windows
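
A quick environment check against these requirements, using standard PyTorch calls (the VRAM figure is read from the first CUDA device):

# Verify Python, PyTorch, and GPU availability before loading the model
import sys
import torch

print(f"Python:  {sys.version.split()[0]}")   # needs >= 3.10
print(f"PyTorch: {torch.__version__}")        # needs >= 2.7.0
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, {props.total_memory / 1e9:.1f} GB VRAM")
else:
    print("No CUDA GPU detected - 1b_lyrics needs a GPU with 16GB+ VRAM.")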


⚡ Performance

Generation is inherently slow due to the model's autoregressive nature:

  • ~5-15 seconds per second of audio on an RTX 4090 (with GPU acceleration)
  • 18 seconds of audio: ~3-5 minutes
  • 60 seconds of audio: ~5-15 minutes

This matches the original implementation's performance characteristics.

Note: Generation speed depends on GPU, model size, and generation length. The autoregressive nature means longer generations take proportionally longer.
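
To measure throughput on your own hardware, here is a minimal timing sketch around the API shown in the Quick Start (the output filename is arbitrary):

# Time a short generation and report wall-clock seconds per second of audio
import time
from jukebox_infer import Jukebox

model = Jukebox(model_name="1b_lyrics", device="cuda")
model.load(sample_length_in_seconds=20)

start = time.perf_counter()
model.generate(artist="The Beatles", genre="Rock",
               duration_seconds=20, output_path="bench.wav")
elapsed = time.perf_counter() - start
print(f"{elapsed:.0f}s total, {elapsed / 20:.1f}s per second of audio")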


📚 Documentation

See the docs/ directory for additional guides, including PARITY_VERIFICATION.md and CHECKPOINT_ARCHITECTURE.md.


πŸ—οΈ Project Structure

jukebox-infer/
β”œβ”€β”€ jukebox_infer/      # Main package
β”‚   β”œβ”€β”€ api.py         # High-level Jukebox API
β”‚   β”œβ”€β”€ cli.py         # CLI interface
β”‚   β”œβ”€β”€ make_models.py # Model loading and checkpoint management
β”‚   β”œβ”€β”€ sample.py      # Sampling functions
β”‚   β”œβ”€β”€ prior/         # Prior model implementations
β”‚   β”œβ”€β”€ vqvae/         # VQ-VAE encoder/decoder
β”‚   β”œβ”€β”€ transformer/   # Transformer architecture
β”‚   └── data/         # Data processing utilities
β”œβ”€β”€ docs/              # Documentation
β”‚   β”œβ”€β”€ PARITY_VERIFICATION.md      # βœ… 100% parity proof
β”‚   β”œβ”€β”€ CHECKPOINT_ARCHITECTURE.md
β”‚   └── dev/           # Development guidelines
β”‚       └── PRINCIPLES.md
β”œβ”€β”€ examples/          # Example scripts
β”œβ”€β”€ quick_infer.py     # Quick inference script (standalone)
β”œβ”€β”€ download_checkpoints.sh  # Manual download script
β”œβ”€β”€ pyproject.toml
β”œβ”€β”€ LICENSE
└── README.md


✅ Parity Verification

jukebox-infer has been rigorously verified to produce 100% identical VQ-VAE features compared to the original OpenAI Jukebox.

Test Results

Metric          Result
max |Δ|         0.000000e+00
mean |Δ|        0.000000e+00
Feature shape   (1, 6146) - identical
Feature range   [8, 2035] - identical
Parity status   ✅ 100% VERIFIED

What This Means

  • ✅ Perfect numerical match - Zero difference in VQ-VAE feature extraction
  • ✅ Drop-in replacement - Can completely replace original Jukebox for feature extraction
  • ✅ No accuracy loss - Maintains 100% fidelity to original implementation
  • ✅ Research confidence - Validated for academic and production use

Testing Methodology

Parity was verified using:

  • Multiple audio durations (5s, 20s)
  • Identical official OpenAI checkpoints
  • Rigorous numerical comparison (rtol=1e-4, atol=1e-6; see the sketch after this list)
  • Both CPU and GPU modes tested
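
A comparison along these lines, assuming you have already saved the feature tensors from each implementation (the filenames below are hypothetical), could look like:

# Compare feature tensors from the original Jukebox and from jukebox-infer
import torch

# features_original.pt / features_infer.pt are hypothetical files saved from
# the original Jukebox and from jukebox-infer, respectively
features_original = torch.load("features_original.pt").float()
features_infer = torch.load("features_infer.pt").float()

diff = (features_original - features_infer).abs()
print(f"max  |delta|: {diff.max().item():.6e}")
print(f"mean |delta|: {diff.mean().item():.6e}")
print("allclose:", torch.allclose(features_original, features_infer, rtol=1e-4, atol=1e-6))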

For full details, see PARITY_VERIFICATION.md


πŸ™ Acknowledgments

Original Research by OpenAI

Jukebox-Infer is built upon the groundbreaking work of OpenAI Jukebox. The original Jukebox represents a major advancement in music generation, achieving state-of-the-art results through innovative hierarchical VQ-VAE and transformer architectures.

Research Paper

Jukebox: A Generative Model for Music

This seminal work introduced hierarchical music generation with conditioning on artist, genre, and lyrics, enabling high-quality music generation at multiple time scales.

Original Authors

  • Prafulla Dhariwal
  • Heewoo Jun
  • Christine Payne
  • Jong Wook Kim
  • Alec Radford
  • Ilya Sutskever

About This Implementation

Note: The original Jukebox repository is no longer actively maintained. This package was created to continue the excellent work by providing ongoing maintenance and PyTorch 2.7+ compatibility for the inference capabilities, while preserving 100% of the original model quality and algorithms.

What we maintain:

  • PyTorch 2.7+ compatibility
  • Modern dependency management
  • Inference-only packaging
  • GPU optimization

What remains unchanged:

  • All model architectures (100% original)
  • All generation algorithms (100% original)
  • All model weights (100% original)
  • VQ-VAE feature extraction (✅ 100% parity verified - see PARITY_VERIFICATION.md)

📄 Citation

Please cite the original work using the following BibTeX entry:

@article{dhariwal2020jukebox,
  title={Jukebox: A Generative Model for Music},
  author={Dhariwal, Prafulla and Jun, Heewoo and Payne, Christine and Kim, Jong Wook and Radford, Alec and Sutskever, Ilya},
  journal={arXiv preprint arXiv:2005.00341},
  year={2020}
}

If you use Jukebox-Infer in your research, please cite the original Jukebox paper above. This package is merely a maintenance fork to ensure continued compatibility with modern PyTorch versions - all credit for the models, algorithms, and research belongs to the original authors.


📄 License

MIT License (same as original Jukebox)

Copyright (c) 2020 OpenAI (Original Jukebox)
Copyright (c) 2025 (Jukebox-Infer modifications)

See LICENSE for details.

This project includes code adapted from OpenAI Jukebox (MIT License, Copyright 2020 OpenAI).


⚠️ Limitations

  • Inference only - No training capabilities
  • Single GPU - No distributed inference
  • Slow generation - Autoregressive model, ~5-15 seconds per second of audio
  • Duration limits - 1b_lyrics supports generations between 17.84 and 600 seconds (see the sketch after this list)
  • Large checkpoints - ~6.2GB download required
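
A small client-side guard for the duration limits above (the 17.84-600 second range is taken from this README; the check is illustrative and not part of the package API):

# Validate the requested duration before calling generate()
MIN_SECONDS, MAX_SECONDS = 17.84, 600.0

def check_duration(duration_seconds: float) -> float:
    if not MIN_SECONDS <= duration_seconds <= MAX_SECONDS:
        raise ValueError(
            f"1b_lyrics supports {MIN_SECONDS}-{MAX_SECONDS} second generations, "
            f"got {duration_seconds}"
        )
    return duration_seconds

duration = check_duration(20)   # OK; check_duration(10) would raise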

🤝 Contributing

We welcome contributions! Please:

  1. Read docs/dev/PRINCIPLES.md for development guidelines
  2. Follow the code style (ruff/black)
  3. Add tests for new features
  4. Update documentation
  5. Submit PRs with clear descriptions

Development Setup

# Install dependencies with UV
uv sync

# Run quick inference script
uv run python quick_infer.py

# Format and lint code
uv run ruff format . && uv run ruff check .

See docs/dev/PRINCIPLES.md for detailed development guidelines.


📞 Support

For issues and questions, please open an issue on the GitHub repository.


Made with ❤️ for the ML community

Based on the excellent work by OpenAI and the Jukebox authors.
