A professional, Suno-like music generation studio for HeartLib
Features • Demo • Installation • Usage • Configuration • Credits
| Feature | Description |
|---|---|
| Full Song Generation | Create complete songs with vocals and lyrics, 4+ minutes in length |
| Instrumental Mode | Generate instrumental tracks without vocals |
| Style Tags | Define genre, mood, tempo, and instrumentation |
| Seed Control | Reproduce exact generations for consistency |
| Queue System | Queue multiple generations and process them sequentially |
| Feature | Description |
|---|---|
| Audio Upload | Use any audio file as a style reference |
| Waveform Visualization | Professional waveform display powered by WaveSurfer.js |
| Region Selection | Draggable 10-second region selector for precise style sampling |
| Style Influence | Adjustable slider to control reference audio influence (1-100%) |
| Synced Playback | Modal waveform syncs with bottom player in real-time |
Coming Soon: LoRA Voice Training - We're actively developing LoRA-based voice training with exceptional results. Our early tests show voice consistency that surpasses Suno. Stay tuned for updates!
| Feature | Description |
|---|---|
| Lyrics Generation | Generate lyrics from a topic using LLMs |
| Multiple Providers | Support for Ollama (local) and OpenRouter (cloud) |
| Style Suggestions | AI-suggested style tags based on your concept |
| Prompt Enhancement | Improve your prompts with AI assistance |
| Feature | Description |
|---|---|
| Spotify-Inspired UI | Clean, modern design with dark/light mode |
| Bottom Player | Full-featured player with waveform, volume, and progress |
| History Feed | Browse, search, and manage all generated tracks |
| Likes & Playlists | Organize favorites into custom playlists |
| Real-time Progress | Live generation progress with step indicators |
| Responsive Design | Works on desktop and mobile devices |
| Layer | Technologies |
|---|---|
| Frontend | React 18, TypeScript, TailwindCSS, Framer Motion, WaveSurfer.js |
| Backend | FastAPI, SQLModel, SSE (Server-Sent Events) |
| AI Engine | HeartLib - MuQ, MuLan, HeartCodec |
| LLM Integration | Ollama, OpenRouter |
HeartMuLa Studio uses mmgp (Memory Management for GPU Poor) for intelligent GPU memory management, enabling reliable generation on GPUs with 10GB+ VRAM.
mmgp provides lazy model loading with automatic memory swapping between the transformer and codec models. This is more reliable than traditional quantization methods, especially on newer GPUs.
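To picture what the swapping does, the sketch below keeps only one model resident on the GPU at a time. It is a conceptual illustration using plain PyTorch device moves, not mmgp's actual API and not this project's code:

```python
import torch

def run_stage(active: torch.nn.Module, idle: torch.nn.Module, inputs):
    """Conceptual swap: park the idle model in CPU RAM, run the active one on GPU."""
    idle.to("cpu")             # free the VRAM held by the model we don't need now
    torch.cuda.empty_cache()   # hand the freed blocks back to the CUDA allocator
    active.to("cuda")          # bring the model we do need onto the GPU
    with torch.inference_mode():
        return active(inputs)
```

In practice mmgp does this lazily and automatically, which is why only one of the two large models has to fit in VRAM at any given moment.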
| Feature | Description |
|---|---|
| bf16 Precision | Default mode, fastest performance |
| Model Swapping | Automatically swaps models when VRAM is limited |
| INT8 Quantization | Optional, ~12% slower but lower peak VRAM |
Performance on RTX 3060 12GB (bf16 + Model Swap):
| Duration | Generation Time | Real-Time Factor |
|---|---|---|
| 30 seconds | ~48 seconds | 1.60x |
| 60 seconds | ~80 seconds | 1.33x |
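The real-time factor is simply generation time divided by audio duration, so the two rows above can be reproduced directly:

```python
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """RTF > 1 means generating the audio takes longer than playing it."""
    return generation_seconds / audio_seconds

print(round(real_time_factor(48, 30), 2))  # 1.6  -> 30-second row
print(round(real_time_factor(80, 60), 2))  # 1.33 -> 60-second row
```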
Automatically configured based on your GPU:
| GPU | Flash Attention |
|---|---|
| NVIDIA SM 7.0+ (Volta, Turing, Ampere, Ada, Hopper) | ✅ Enabled |
| NVIDIA SM 6.x and older (Pascal, Maxwell) | ❌ Disabled (uses math backend) |
| AMD GPUs | ❌ Disabled (compatibility varies) |
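The rule in this table can be approximated with standard PyTorch queries; the sketch below illustrates that rule and is not the project's actual detection code:

```python
import torch

def flash_attention_supported(device_index: int = 0) -> bool:
    """Approximate the table above: NVIDIA SM 7.0+ only, AMD/ROCm disabled."""
    if not torch.cuda.is_available():
        return False
    if torch.version.hip is not None:   # ROCm build -> AMD GPU -> disabled
        return False
    major, _minor = torch.cuda.get_device_capability(device_index)
    return major >= 7                   # Volta (SM 7.0) and newer
```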
Enable PyTorch 2.0+ compilation for ~2x faster inference on supported GPUs:
# Enable torch.compile
HEARTMULA_COMPILE=true python -m uvicorn backend.app.main:app --host 0.0.0.0 --port 8000
# With max performance (slower first run, faster subsequent runs)
HEARTMULA_COMPILE=true HEARTMULA_COMPILE_MODE=max-autotune python -m uvicorn backend.app.main:app --host 0.0.0.0 --port 8000

| Mode | Description |
|---|---|
| `default` | Good balance of compile time and performance |
| `reduce-overhead` | Faster compilation, slightly less optimal code |
| `max-autotune` | Best performance, but slowest compilation (recommended for production) |
Requirements:
- PyTorch 2.0+
- Linux/WSL2: Install Triton (`pip install triton`)
- Windows: Install Triton-Windows (`pip install -U 'triton-windows>=3.2,<3.3'`)
Note: First generation will be slower due to compilation. Subsequent generations benefit from the compiled kernels.
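The HEARTMULA_COMPILE* variables presumably map onto PyTorch's standard torch.compile API; a minimal standalone sketch of that API (an assumption about the wiring, not this project's code) looks like this:

```python
import os
import torch

model = torch.nn.Linear(16, 16).cuda()  # stand-in for the real transformer

if os.environ.get("HEARTMULA_COMPILE", "false").lower() == "true":
    # default / reduce-overhead / max-autotune
    mode = os.environ.get("HEARTMULA_COMPILE_MODE", "default")
    model = torch.compile(model, mode=mode)

# The first call triggers compilation (slow); later calls reuse the compiled kernels.
out = model(torch.randn(4, 16, device="cuda"))
```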
Automatically selects the best GPU configuration:
- Single GPU: Uses mmgp with model swapping based on available VRAM
- Multi-GPU: HeartMuLa → fastest GPU (Flash Attention), HeartCodec → largest VRAM GPU
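A simplified sketch of that assignment rule, using standard PyTorch device queries rather than the project's actual code:

```python
import torch

def pick_devices() -> dict:
    """Illustrative version of the multi-GPU rule described above."""
    n = torch.cuda.device_count()
    if n <= 1:
        return {"heartmula": "cuda:0", "heartcodec": "cuda:0"}
    # "Fastest" approximated by highest compute capability (Flash Attention support).
    fastest = max(range(n), key=lambda i: torch.cuda.get_device_capability(i))
    # HeartCodec goes to the GPU with the most VRAM.
    biggest = max(range(n), key=lambda i: torch.cuda.get_device_properties(i).total_memory)
    return {"heartmula": f"cuda:{fastest}", "heartcodec": f"cuda:{biggest}"}
```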
Models are automatically downloaded from HuggingFace Hub on first run (~5GB):
- HeartMuLa (main model)
- HeartCodec (audio decoder)
- Tokenizer and generation config
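The download itself goes through the Hugging Face Hub; a hedged sketch with the standard huggingface_hub client is shown below. The repository ID is a placeholder, since the actual Hub repo name is not documented here:

```python
from huggingface_hub import snapshot_download

MODEL_REPO = "HeartMuLa/heartlib"  # placeholder repo id, not confirmed by this README

snapshot_download(
    repo_id=MODEL_REPO,
    local_dir="backend/models",    # the default model directory (HEARTMULA_MODEL_DIR)
)
```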
./start.sh

That's it! The system auto-detects your GPU and downloads models on first run.
Click the gear icon (⚙️) in the header to open the Settings Modal. This provides a user-friendly way to configure GPU settings without editing files.
The Settings Modal displays your detected GPU(s) with:
- GPU name and VRAM
- Flash Attention support status
- Current configuration mode
| Setting | Options | Description |
|---|---|---|
| INT8 Quantization | On/Off | Enable mmgp INT8 quantization (slower but lower peak VRAM) |
| Memory Swap Mode | Auto/On/Off | Model swapping for limited VRAM GPUs |
| torch.compile | On/Off | PyTorch compilation for faster inference |
| GPU | VRAM | Recommended Settings |
|---|---|---|
| RTX 4090 | 24GB | All defaults (Full Precision) |
| RTX 3090 | 24GB | All defaults (Full Precision) |
| RTX 4070 Ti | 16GB | All defaults (mmgp bf16) |
| RTX 3060 | 12GB | Memory Swap: On, INT8: Off (bf16 is faster) |
| RTX 4060 | 8GB | Memory Swap: On, may need INT8: On |
- Adjust settings in the modal
- Click "Apply & Reload Models"
- Wait for model reload (~30-60 seconds)
- New settings take effect for all future generations
Note: Settings are saved to `backend/settings.json` and persist across restarts.
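The file is a small JSON document. Its exact schema isn't documented here, but based on the three options in the Settings Modal it plausibly looks something like the sketch below (all key names are assumptions):

```python
import json
from pathlib import Path

# Hypothetical key names inferred from the Settings Modal options;
# the real schema of backend/settings.json may differ.
settings = {
    "int8_quantization": False,
    "memory_swap_mode": "auto",   # "auto" | "on" | "off"
    "torch_compile": False,
}
Path("backend/settings.json").write_text(json.dumps(settings, indent=2))
```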
The easiest way to run HeartMuLa Studio - no Python/Node setup required.
- Docker with NVIDIA Container Toolkit
- NVIDIA GPU with 10GB+ VRAM (12GB recommended for reliable generation)
# Clone and start (uses pre-built image from GitHub Container Registry)
git clone https://github.com/fspecii/HeartMuLa-Studio.git
cd HeartMuLa-Studio
docker compose up -d
# View logs (watch model download progress on first run)
docker compose logs -f

# Create directories for persistent data
mkdir -p backend/models backend/generated_audio backend/ref_audio
# Run the pre-built image (Docker Hub)
docker run -d \
--gpus all \
-p 8000:8000 \
-v ./backend/models:/app/backend/models \
-v ./backend/generated_audio:/app/backend/generated_audio \
-v ./backend/ref_audio:/app/backend/ref_audio \
--name heartmula-studio \
ambsd/heartmula-studio:latest

Available registries:
- Docker Hub: `ambsd/heartmula-studio:latest`
- GitHub: `ghcr.io/fspecii/heartmula-studio:latest`
- Docker pulls the pre-built image, or builds it locally (~10GB, includes CUDA + PyTorch)
- Models are automatically downloaded from HuggingFace (~5GB)
- Container starts with GPU auto-detection
- Frontend + API served on port 8000
All your data is preserved across container restarts:
| Data | Location | Description |
|---|---|---|
| Generated Music | `./backend/generated_audio/` | Your MP3 files (accessible from host) |
| Models | `./backend/models/` | Downloaded AI models (~5GB) |
| Reference Audio | `./backend/ref_audio/` | Uploaded style references |
| Song History | Docker volume `heartmula-db` | Database with all your generations |
# Start
docker compose up -d
# Stop
docker compose down
# View logs
docker compose logs -f
# Rebuild after updates
docker compose build --no-cache
docker compose up -d
# Reset database (fresh start)
docker compose down -v
docker compose up -d

Override settings in docker-compose.yml:
environment:
- HEARTMULA_SEQUENTIAL_OFFLOAD=true # Force Memory Swap Mode (for limited VRAM)
- HEARTMULA_COMPILE=true # Enable torch.compile
volumes:
# Use existing models from another location (e.g., ComfyUI)
- /path/to/comfyui/models/heartmula:/app/backend/models

Tip: Use the Settings Modal in the UI to change settings instead of editing docker-compose.yml. Settings persist in `backend/settings.json`.
To use Ollama (running on host) for AI lyrics generation:
- Ollama is auto-configured: the container uses `host.docker.internal` to reach Ollama on your host machine
- Just run Ollama normally on your host (not in Docker)
- The container will automatically connect to `http://host.docker.internal:11434` (a quick connectivity check is sketched below)
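If lyrics generation can't reach Ollama, you can verify connectivity from inside the container against Ollama's standard /api/tags endpoint (which lists locally installed models); this check is a generic sketch, not part of HeartMuLa Studio:

```python
import json
import urllib.request

# List the models Ollama exposes; a failure here means the container
# cannot reach Ollama on the host.
url = "http://host.docker.internal:11434/api/tags"
with urllib.request.urlopen(url, timeout=5) as resp:
    models = json.load(resp).get("models", [])
print([m["name"] for m in models])
```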
Custom Ollama URL:
environment:
- OLLAMA_HOST=http://your-ollama-server:11434

- Python 3.10 or higher
- Node.js 18 or higher
- NVIDIA GPU with 10GB+ VRAM (12GB+ recommended)
- Git for cloning the repository
git clone https://github.com/fspecii/HeartMuLa-Studio.git
cd HeartMuLa-Studio

# Create virtual environment in root folder
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install backend dependencies
pip install -r backend/requirements.txt

Note: HeartLib models (~5GB) will be downloaded automatically from HuggingFace on first run.
cd frontend
# Install dependencies
npm install
# Build for production
npm run build

source venv/bin/activate  # Windows: venv\Scripts\activate
# Single GPU
python -m uvicorn backend.app.main:app --host 0.0.0.0 --port 8000
# Multi-GPU (recommended for 2+ GPUs)
CUDA_VISIBLE_DEVICES=0,1 python -m uvicorn backend.app.main:app --host 0.0.0.0 --port 8000

Development mode:
cd frontend
npm run dev

Production mode:
# Serve the dist folder with any static server
npx serve dist -l 5173

| Mode | URL |
|---|---|
| Development | http://localhost:5173 |
| Production | http://localhost:8000 |
Create a .env file in the backend directory:
# OpenRouter API (for cloud LLM)
OPENROUTER_API_KEY=your_api_key_here
# Ollama (for local LLM)
OLLAMA_HOST=http://localhost:11434

HeartMuLa Configuration (set when running):
| Variable | Default | Description |
|---|---|---|
| `HEARTMULA_MODEL_DIR` | `backend/models` | Custom model directory (share with ComfyUI, etc.) |
| `HEARTMULA_SEQUENTIAL_OFFLOAD` | `auto` | Memory Swap Mode: `auto`, `true`, or `false` |
| `HEARTMULA_COMPILE` | `false` | torch.compile for ~2x faster inference: `true` or `false` |
| `HEARTMULA_COMPILE_MODE` | `default` | Compile mode: `default`, `reduce-overhead`, or `max-autotune` |
| `HEARTMULA_VERSION` | `RL-3B-20260123` | Model version (latest RL-tuned model) |
| `CUDA_VISIBLE_DEVICES` | all GPUs | Specify which GPUs to use (e.g., `0,1`) |
Tip: Most settings can now be changed via the Settings Modal in the UI without restarting.
Example: Use existing models from ComfyUI:
HEARTMULA_MODEL_DIR=/path/to/comfyui/models/heartmula ./start.sh

HeartMuLa Studio automatically detects your available VRAM (free memory rather than total, so memory used by other applications is taken into account) and selects the optimal configuration:
| Available VRAM | Auto-Selected Mode | Performance | Example GPUs |
|---|---|---|---|
| 20GB+ | Full Precision | Fastest | RTX 4090, RTX 3090 Ti, A6000 |
| 14-20GB | mmgp bf16 | Fast | RTX 4060 Ti 16GB, RTX 3090 |
| 10-14GB | mmgp bf16 + Model Swap | ~1.3-1.6x RTF | RTX 3060 12GB, RTX 4080 12GB |
| 8-10GB | mmgp bf16 + Model Swap (Low VRAM) | May have issues | RTX 3070, RTX 4060 8GB |
| <8GB | Not supported | - | Insufficient VRAM |
RTF = Real-Time Factor (e.g., 1.3x means 60s of audio takes ~80s to generate)
Multi-GPU: Automatically detected. HeartMuLa goes to fastest GPU (Flash Attention), HeartCodec to largest VRAM GPU.
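A simplified sketch of the threshold logic in the table, based on free (not total) VRAM via torch.cuda.mem_get_info; this illustrates the selection rule and is not the project's actual code:

```python
import torch

def select_mode(device_index: int = 0) -> str:
    """Map free VRAM to the auto-selected modes in the table above."""
    free_bytes, _total_bytes = torch.cuda.mem_get_info(device_index)
    free_gb = free_bytes / 1024**3
    if free_gb >= 20:
        return "full-precision"
    if free_gb >= 14:
        return "mmgp-bf16"
    if free_gb >= 10:
        return "mmgp-bf16 + model swap"
    if free_gb >= 8:
        return "mmgp-bf16 + model swap (low VRAM)"
    raise RuntimeError("Less than 8GB of free VRAM is not supported")
```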
./start.sh # Auto-detect (recommended)
./start.sh --force-swap # Force Memory Swap Mode (low VRAM mode)
./start.sh --help # Show all options

Override auto-detection with environment variables:
# Force Memory Swap Mode
HEARTMULA_SEQUENTIAL_OFFLOAD=true ./start.sh
# Or run directly
python -m uvicorn backend.app.main:app --host 0.0.0.0 --port 8000

| Variable | Values | Description |
|---|---|---|
| `HEARTMULA_SEQUENTIAL_OFFLOAD` | `auto`, `true`, `false` | Memory Swap Mode (default: `auto`) |
| `CUDA_VISIBLE_DEVICES` | `0`, `0,1`, etc. | Select specific GPUs |
Memory Optimization:
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

Recommended: Use the Settings Modal in the UI for most configuration changes. It provides a visual interface and saves settings persistently.
For AI-powered lyrics generation:
Option A: Ollama (Local)
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull a model
ollama pull llama3.2

Option B: OpenRouter (Cloud)
- Get an API key from OpenRouter
- Add it to your `.env` file
HeartMuLa-Studio/
├── backend/
│ ├── app/
│ │ ├── main.py # FastAPI application & routes
│ │ ├── models.py # Pydantic/SQLModel schemas
│ │ └── services/
│ │ ├── music_service.py # HeartLib integration
│ │ └── llm_service.py # LLM providers
│ ├── generated_audio/ # Output MP3 files
│ ├── ref_audio/ # Uploaded reference audio
│ ├── jobs.db # SQLite database
│ └── requirements.txt
├── frontend/
│ ├── src/
│ │ ├── components/
│ │ │ ├── ComposerSidebar.tsx # Main generation form
│ │ │ ├── BottomPlayer.tsx # Audio player
│ │ │ ├── RefAudioRegionModal.tsx # Waveform selector
│ │ │ ├── HistoryFeed.tsx # Track history
│ │ │ └── ...
│ │ ├── App.tsx # Main application
│ │ └── api.ts # Backend API client
│ ├── public/
│ └── package.json
├── preview.gif
└── README.md
| Method | Endpoint | Description |
|---|---|---|
| POST | `/generate/music` | Start music generation |
| POST | `/generate/lyrics` | Generate lyrics with LLM |
| POST | `/upload/ref_audio` | Upload reference audio |
| GET | `/history` | Get generation history |
| GET | `/jobs/{id}` | Get job status |
| GET | `/events` | SSE stream for real-time updates |
| GET | `/audio/{path}` | Stream generated audio |
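A minimal client against these endpoints might look like the sketch below. The request payload fields and response keys are assumptions made for illustration, since the schemas aren't documented in this README; check backend/app/models.py for the real ones:

```python
import time
import requests

BASE = "http://localhost:8000"

# Hypothetical payload fields (prompt/lyrics/duration) -- see backend/app/models.py.
job = requests.post(f"{BASE}/generate/music", json={
    "prompt": "dreamy synthpop, female vocals, 120 bpm",
    "lyrics": "[verse] city lights are calling...",
    "duration": 60,
}).json()

# Poll the job until it finishes; /events offers a push-based SSE alternative.
while True:
    status = requests.get(f"{BASE}/jobs/{job['id']}").json()   # 'id' key is assumed
    if status.get("status") in ("completed", "failed"):
        break
    time.sleep(2)
print(status)
```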
| Issue | Solution |
|---|---|
| CUDA out of memory | Enable Memory Swap Mode in Settings Modal, or try ./start.sh --force-swap |
| CUDA device-side assert | Update to latest version (fixed in mmgp integration) |
| Models not downloading | Check internet connection and disk space (~5GB needed in backend/models/) |
| Frontend can't connect | Ensure backend is running on port 8000 |
| LLM not working | Check Ollama is running or OpenRouter API key is set in backend/.env |
| Only one GPU detected | Set CUDA_VISIBLE_DEVICES=0,1 explicitly when starting backend |
| Slow generation | Check Settings Modal for GPU config; bf16 is faster than INT8 |
| Settings not saving | Check write permissions on backend/settings.json |
Models are auto-downloaded to backend/models/ (~5GB total):
backend/models/
├── HeartMuLa-oss-RL-3B-20260123/ # Main model
├── HeartCodec-oss/ # Audio codec
├── tokenizer.json
└── gen_config.json
- HeartMuLa/heartlib - The open-source AI music generation engine
- mainza-ai/milimomusic - Inspiration for the backend architecture
- WaveSurfer.js - Audio waveform visualization
This project is open source under the MIT License.
Contributions are welcome! Please feel free to:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
Made with ❤️ for the open-source AI music community
