A professional, Suno-like music generation studio for HeartLib
Features • Demo • Installation • Usage • Configuration • Credits
| Feature | Description |
|---|---|
| Full Song Generation | Create complete songs with vocals and lyrics, 4+ minutes in length |
| Instrumental Mode | Generate instrumental tracks without vocals |
| Style Tags | Define genre, mood, tempo, and instrumentation |
| Seed Control | Reproduce exact generations for consistency |
| Queue System | Queue multiple generations and process them sequentially |
| Feature | Description |
|---|---|
| Audio Upload | Use any audio file as a style reference |
| Waveform Visualization | Professional waveform display powered by WaveSurfer.js |
| Region Selection | Draggable 10-second region selector for precise style sampling |
| Style Influence | Adjustable slider to control reference audio influence (1-100%) |
| Synced Playback | Modal waveform syncs with bottom player in real-time |
Coming Soon: LoRA Voice Training - We're actively developing LoRA-based voice training with exceptional results. Our early tests show voice consistency that surpasses Suno. Stay tuned for updates!
| Feature | Description |
|---|---|
| Lyrics Generation | Generate lyrics from a topic using LLMs |
| Multiple Providers | Support for Ollama (local) and OpenRouter (cloud) |
| Style Suggestions | AI-suggested style tags based on your concept |
| Prompt Enhancement | Improve your prompts with AI assistance |
| Feature | Description |
|---|---|
| Spotify-Inspired UI | Clean, modern design with dark/light mode |
| Bottom Player | Full-featured player with waveform, volume, and progress |
| History Feed | Browse, search, and manage all generated tracks |
| Likes & Playlists | Organize favorites into custom playlists |
| Real-time Progress | Live generation progress with step indicators |
| Responsive Design | Works on desktop and mobile devices |
| Layer | Technologies |
|---|---|
| Frontend | React 18, TypeScript, TailwindCSS, Framer Motion, WaveSurfer.js |
| Backend | FastAPI, SQLModel, SSE (Server-Sent Events) |
| AI Engine | HeartLib - MuQ, MuLan, HeartCodec |
| LLM Integration | Ollama, OpenRouter |
HeartMuLa Studio uses mmgp (Memory Management for GPU Poor) for intelligent GPU memory management, enabling reliable generation on GPUs with 10GB+ VRAM.
mmgp provides lazy model loading with automatic memory swapping between the transformer and codec models. This is more reliable than traditional quantization methods, especially on newer GPUs.
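To picture what the swapping does, the sketch below keeps only one model resident on the GPU at a time. It is a conceptual illustration using plain PyTorch device moves, not mmgp's actual API and not this project's code:

```python
import torch

def run_stage(active: torch.nn.Module, idle: torch.nn.Module, inputs):
    """Conceptual swap: park the idle model in CPU RAM, run the active one on GPU."""
    idle.to("cpu")             # free the VRAM held by the model we don't need now
    torch.cuda.empty_cache()   # hand the freed blocks back to the CUDA allocator
    active.to("cuda")          # bring the model we do need onto the GPU
    with torch.inference_mode():
        return active(inputs)
```

In practice mmgp does this lazily and automatically, which is why only one of the two large models has to fit in VRAM at any given moment.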
| Feature | Description |
|---|---|
| bf16 Precision | Default mode, fastest performance |
| Model Swapping | Automatically swaps models when VRAM is limited |
| INT8 Quantization | Optional, ~12% slower but lower peak VRAM |
Performance on RTX 3060 12GB (bf16 + Model Swap):
| Duration | Generation Time | Real-Time Factor |
|---|---|---|
| 30 seconds | ~48 seconds | 1.60x |
| 60 seconds | ~80 seconds | 1.33x |
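The real-time factor is simply generation time divided by audio duration, so the two rows above can be reproduced directly:

```python
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """RTF > 1 means generating the audio takes longer than playing it."""
    return generation_seconds / audio_seconds

print(round(real_time_factor(48, 30), 2))  # 1.6  -> 30-second row
print(round(real_time_factor(80, 60), 2))  # 1.33 -> 60-second row
```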
Automatically configured based on your GPU:
| GPU | Flash Attention |
|---|---|
| NVIDIA SM 7.0+ (Volta, Turing, Ampere, Ada, Hopper) | ✅ Enabled |
| NVIDIA SM 6.x and older (Pascal, Maxwell) | ❌ Disabled (uses math backend) |
| AMD GPUs | ❌ Disabled (compatibility varies) |
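The rule in this table can be approximated with standard PyTorch queries; the sketch below illustrates that rule and is not the project's actual detection code:

```python
import torch

def flash_attention_supported(device_index: int = 0) -> bool:
    """Approximate the table above: NVIDIA SM 7.0+ only, AMD/ROCm disabled."""
    if not torch.cuda.is_available():
        return False
    if torch.version.hip is not None:   # ROCm build -> AMD GPU -> disabled
        return False
    major, _minor = torch.cuda.get_device_capability(device_index)
    return major >= 7                   # Volta (SM 7.0) and newer
```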
Enable PyTorch 2.0+ compilation for ~2x faster inference on supported GPUs:
# Enable torch.compile
HEARTMULA_COMPILE=true python -m uvicorn backend.app.main:app --host 0.0.0.0 --port 8000
# With max performance (slower first run, faster subsequent runs)
HEARTMULA_COMPILE=true HEARTMULA_COMPILE_MODE=max-autotune python -m uvicorn backend.app.main:app --host 0.0.0.0 --port 8000

| Mode | Description |
|---|---|
| `default` | Good balance of compile time and performance |
| `reduce-overhead` | Faster compilation, slightly less optimal code |
| `max-autotune` | Best performance, but slowest compilation (recommended for production) |
Requirements:
- PyTorch 2.0+
- Linux/WSL2: Install Triton (`pip install triton`)
- Windows: Install Triton-Windows (`pip install -U 'triton-windows>=3.2,<3.3'`)
Note: First generation will be slower due to compilation. Subsequent generations benefit from the compiled kernels.
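The HEARTMULA_COMPILE* variables presumably map onto PyTorch's standard torch.compile API; a minimal standalone sketch of that API (an assumption about the wiring, not this project's code) looks like this:

```python
import os
import torch

model = torch.nn.Linear(16, 16).cuda()  # stand-in for the real transformer

if os.environ.get("HEARTMULA_COMPILE", "false").lower() == "true":
    # default / reduce-overhead / max-autotune
    mode = os.environ.get("HEARTMULA_COMPILE_MODE", "default")
    model = torch.compile(model, mode=mode)

# The first call triggers compilation (slow); later calls reuse the compiled kernels.
out = model(torch.randn(4, 16, device="cuda"))
```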
Automatically selects the best GPU configuration:
- Single GPU: Uses mmgp with model swapping based on available VRAM
- Multi-GPU: HeartMuLa → fastest GPU (Flash Attention), HeartCodec → largest VRAM GPU
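A simplified sketch of that assignment rule, using standard PyTorch device queries rather than the project's actual code:

```python
import torch

def pick_devices() -> dict:
    """Illustrative version of the multi-GPU rule described above."""
    n = torch.cuda.device_count()
    if n <= 1:
        return {"heartmula": "cuda:0", "heartcodec": "cuda:0"}
    # "Fastest" approximated by highest compute capability (Flash Attention support).
    fastest = max(range(n), key=lambda i: torch.cuda.get_device_capability(i))
    # HeartCodec goes to the GPU with the most VRAM.
    biggest = max(range(n), key=lambda i: torch.cuda.get_device_properties(i).total_memory)
    return {"heartmula": f"cuda:{fastest}", "heartcodec": f"cuda:{biggest}"}
```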
Models are automatically downloaded from HuggingFace Hub on first run (~5GB):
- HeartMuLa (main model)
- HeartCodec (audio decoder)
- Tokenizer and generation config
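The download itself goes through the Hugging Face Hub; a hedged sketch with the standard huggingface_hub client is shown below. The repository ID is a placeholder, since the actual Hub repo name is not documented here:

```python
from huggingface_hub import snapshot_download

MODEL_REPO = "HeartMuLa/heartlib"  # placeholder repo id, not confirmed by this README

snapshot_download(
    repo_id=MODEL_REPO,
    local_dir="backend/models",    # the default model directory (HEARTMULA_MODEL_DIR)
)
```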
./start.sh

That's it! The system auto-detects your GPU and downloads models on first run.
Click the gear icon (⚙️) in the header to open the Settings Modal. This provides a user-friendly way to configure GPU settings without editing files.
The Settings Modal displays your detected GPU(s) with:
- GPU name and VRAM
- Flash Attention support status
- Current configuration mode
| Setting | Options | Description |
|---|---|---|
| INT8 Quantization | On/Off | Enable mmgp INT8 quantization (slower but lower peak VRAM) |
| Memory Swap Mode | Auto/On/Off | Model swapping for limited VRAM GPUs |
| torch.compile | On/Off | PyTorch compilation for faster inference |
| GPU | VRAM | Recommended Settings |
|---|---|---|
| RTX 4090 | 24GB | All defaults (Full Precision) |
| RTX 3090 | 24GB | All defaults (Full Precision) |
| RTX 4070 Ti | 16GB | All defaults (mmgp bf16) |
| RTX 3060 | 12GB | Memory Swap: On, INT8: Off (bf16 is faster) |
| RTX 4060 | 8GB | Memory Swap: On, may need INT8: On |
- Adjust settings in the modal
- Click "Apply & Reload Models"
- Wait for model reload (~30-60 seconds)
- New settings take effect for all future generations
Note: Settings are saved to `backend/settings.json` and persist across restarts.
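The file is a small JSON document. Its exact schema isn't documented here, but based on the three options in the Settings Modal it plausibly looks something like the sketch below (all key names are assumptions):

```python
import json
from pathlib import Path

# Hypothetical key names inferred from the Settings Modal options;
# the real schema of backend/settings.json may differ.
settings = {
    "int8_quantization": False,
    "memory_swap_mode": "auto",   # "auto" | "on" | "off"
    "torch_compile": False,
}
Path("backend/settings.json").write_text(json.dumps(settings, indent=2))
```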
The easiest way to run HeartMuLa Studio - no Python/Node setup required.
- Docker with NVIDIA Container Toolkit
- NVIDIA GPU with 10GB+ VRAM (12GB recommended for reliable generation)
# Clone and start (uses pre-built image from GitHub Container Registry)
git clone https://github.com/fspecii/HeartMuLa-Studio.git
cd HeartMuLa-Studio
docker compose up -d
# View logs (watch model download progress on first run)
docker compose logs -f

# Create directories for persistent data
mkdir -p backend/models backend/generated_audio backend/ref_audio
# Run the pre-built image (Docker Hub)
docker run -d \
--gpus all \
-p 8000:8000 \
-v ./backend/models:/app/backend/models \
-v ./backend/generated_audio:/app/backend/generated_audio \
-v ./backend/ref_audio:/app/backend/ref_audio \
--name heartmula-studio \
ambsd/heartmula-studio:latest

Available registries:
- Docker Hub: `ambsd/heartmula-studio:latest`
- GitHub: `ghcr.io/fspecii/heartmula-studio:latest`
- Docker pulls the pre-built image, or builds it locally (~10GB, includes CUDA + PyTorch)
- Models are automatically downloaded from HuggingFace (~5GB)
- Container starts with GPU auto-detection
- Frontend + API served on port 8000
All your data is preserved across container restarts:
| Data | Location | Description |
|---|---|---|
| Generated Music | `./backend/generated_audio/` | Your MP3 files (accessible from host) |
| Models | `./backend/models/` | Downloaded AI models (~5GB) |
| Reference Audio | `./backend/ref_audio/` | Uploaded style references |
| Song History | Docker volume `heartmula-db` | Database with all your generations |
# Start
docker compose up -d
# Stop
docker compose down
# View logs
docker compose logs -f
# Rebuild after updates
docker compose build --no-cache
docker compose up -d
# Reset database (fresh start)
docker compose down -v
docker compose up -d

Override settings in docker-compose.yml:
environment:
- HEARTMULA_SEQUENTIAL_OFFLOAD=true # Force Memory Swap Mode (for limited VRAM)
- HEARTMULA_COMPILE=true # Enable torch.compile
volumes:
# Use existing models from another location (e.g., ComfyUI)
- /path/to/comfyui/models/heartmula:/app/backend/models

Tip: Use the Settings Modal in the UI to change settings instead of editing docker-compose.yml. Settings persist in `backend/settings.json`.
To use Ollama (running on host) for AI lyrics generation:
- Ollama is auto-configured: the container uses `host.docker.internal` to reach Ollama on your host machine
- Just run Ollama normally on your host (not in Docker)
- The container will automatically connect to `http://host.docker.internal:11434` (a quick connectivity check is sketched below)
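If lyrics generation can't reach Ollama, you can verify connectivity from inside the container against Ollama's standard /api/tags endpoint (which lists locally installed models); this check is a generic sketch, not part of HeartMuLa Studio:

```python
import json
import urllib.request

# List the models Ollama exposes; a failure here means the container
# cannot reach Ollama on the host.
url = "http://host.docker.internal:11434/api/tags"
with urllib.request.urlopen(url, timeout=5) as resp:
    models = json.load(resp).get("models", [])
print([m["name"] for m in models])
```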
Custom Ollama URL:
environment:
- OLLAMA_HOST=http://your-ollama-server:11434

- Python 3.10 or higher
- Node.js 18 or higher
- NVIDIA GPU with 10GB+ VRAM (12GB+ recommended)
- Git for cloning the repository
git clone https://github.com/fspecii/HeartMuLa-Studio.git
cd HeartMuLa-Studio

# Create virtual environment in root folder
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install backend dependencies
pip install -r backend/requirements.txt

Note: HeartLib models (~5GB) will be downloaded automatically from HuggingFace on first run.
cd frontend
# Install dependencies
npm install
# Build for production
npm run build

source venv/bin/activate  # Windows: venv\Scripts\activate
# Single GPU
python -m uvicorn backend.app.main:app --host 0.0.0.0 --port 8000
# Multi-GPU (recommended for 2+ GPUs)
CUDA_VISIBLE_DEVICES=0,1 python -m uvicorn backend.app.main:app --host 0.0.0.0 --port 8000

Development mode:
cd frontend
npm run dev

Production mode:
# Serve the dist folder with any static server
npx serve dist -l 5173

| Mode | URL |
|---|---|
| Development | http://localhost:5173 |
| Production | http://localhost:8000 |
Create a .env file in the backend directory:
# OpenRouter API (for cloud LLM)
OPENROUTER_API_KEY=your_api_key_here
# Ollama (for local LLM)
OLLAMA_HOST=http://localhost:11434

HeartMuLa Configuration (set when running):
| Variable | Default | Description |
|---|---|---|
| `HEARTMULA_MODEL_DIR` | `backend/models` | Custom model directory (share with ComfyUI, etc.) |
| `HEARTMULA_SEQUENTIAL_OFFLOAD` | `auto` | Memory Swap Mode: `auto`, `true`, or `false` |
| `HEARTMULA_COMPILE` | `false` | torch.compile for ~2x faster inference: `true` or `false` |
| `HEARTMULA_COMPILE_MODE` | `default` | Compile mode: `default`, `reduce-overhead`, or `max-autotune` |
| `HEARTMULA_VERSION` | `RL-3B-20260123` | Model version (latest RL-tuned model) |
| `CUDA_VISIBLE_DEVICES` | all GPUs | Specify which GPUs to use (e.g., `0,1`) |
Tip: Most settings can now be changed via the Settings Modal in the UI without restarting.
Example: Use existing models from ComfyUI:
HEARTMULA_MODEL_DIR=/path/to/comfyui/models/heartmula ./start.sh

HeartMuLa Studio automatically detects your available VRAM (free memory rather than total, so memory used by other applications is taken into account) and selects the optimal configuration:
| Available VRAM | Auto-Selected Mode | Performance | Example GPUs |
|---|---|---|---|
| 20GB+ | Full Precision | Fastest | RTX 4090, RTX 3090 Ti, A6000 |
| 14-20GB | mmgp bf16 | Fast | RTX 4060 Ti 16GB, RTX 3090 |
| 10-14GB | mmgp bf16 + Model Swap | ~1.3-1.6x RTF | RTX 3060 12GB, RTX 4080 12GB |
| 8-10GB | mmgp bf16 + Model Swap (Low VRAM) | May have issues | RTX 3070, RTX 4060 8GB |
| <8GB | Not supported | - | Insufficient VRAM |
RTF = Real-Time Factor (e.g., 1.3x means 60s of audio takes ~80s to generate)
Multi-GPU: Automatically detected. HeartMuLa goes to fastest GPU (Flash Attention), HeartCodec to largest VRAM GPU.
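A simplified sketch of the threshold logic in the table, based on free (not total) VRAM via torch.cuda.mem_get_info; this illustrates the selection rule and is not the project's actual code:

```python
import torch

def select_mode(device_index: int = 0) -> str:
    """Map free VRAM to the auto-selected modes in the table above."""
    free_bytes, _total_bytes = torch.cuda.mem_get_info(device_index)
    free_gb = free_bytes / 1024**3
    if free_gb >= 20:
        return "full-precision"
    if free_gb >= 14:
        return "mmgp-bf16"
    if free_gb >= 10:
        return "mmgp-bf16 + model swap"
    if free_gb >= 8:
        return "mmgp-bf16 + model swap (low VRAM)"
    raise RuntimeError("Less than 8GB of free VRAM is not supported")
```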
./start.sh # Auto-detect (recommended)
./start.sh --force-swap # Force Memory Swap Mode (low VRAM mode)
./start.sh --help # Show all options

Override auto-detection with environment variables:
# Force Memory Swap Mode
HEARTMULA_SEQUENTIAL_OFFLOAD=true ./start.sh
# Or run directly
python -m uvicorn backend.app.main:app --host 0.0.0.0 --port 8000

| Variable | Values | Description |
|---|---|---|
| `HEARTMULA_SEQUENTIAL_OFFLOAD` | `auto`, `true`, `false` | Memory Swap Mode (default: `auto`) |
| `CUDA_VISIBLE_DEVICES` | `0`, `0,1`, etc. | Select specific GPUs |
Memory Optimization:
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

Recommended: Use the Settings Modal in the UI for most configuration changes. It provides a visual interface and saves settings persistently.
For AI-powered lyrics generation:
Option A: Ollama (Local)
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull a model
ollama pull llama3.2

Option B: OpenRouter (Cloud)
- Get an API key from OpenRouter
- Add it to your `.env` file
HeartMuLa-Studio/
├── backend/
│ ├── app/
│ │ ├── main.py # FastAPI application & routes
│ │ ├── models.py # Pydantic/SQLModel schemas
│ │ └── services/
│ │ ├── music_service.py # HeartLib integration
│ │ └── llm_service.py # LLM providers
│ ├── generated_audio/ # Output MP3 files
│ ├── ref_audio/ # Uploaded reference audio
│ ├── jobs.db # SQLite database
│ └── requirements.txt
├── frontend/
│ ├── src/
│ │ ├── components/
│ │ │ ├── ComposerSidebar.tsx # Main generation form
│ │ │ ├── BottomPlayer.tsx # Audio player
│ │ │ ├── RefAudioRegionModal.tsx # Waveform selector
│ │ │ ├── HistoryFeed.tsx # Track history
│ │ │ └── ...
│ │ ├── App.tsx # Main application
│ │ └── api.ts # Backend API client
│ ├── public/
│ └── package.json
├── preview.gif
└── README.md
| Method | Endpoint | Description |
|---|---|---|
| POST | `/generate/music` | Start music generation |
| POST | `/generate/lyrics` | Generate lyrics with LLM |
| POST | `/upload/ref_audio` | Upload reference audio |
| GET | `/history` | Get generation history |
| GET | `/jobs/{id}` | Get job status |
| GET | `/events` | SSE stream for real-time updates |
| GET | `/audio/{path}` | Stream generated audio |
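A minimal client against these endpoints might look like the sketch below. The request payload fields and response keys are assumptions made for illustration, since the schemas aren't documented in this README; check backend/app/models.py for the real ones:

```python
import time
import requests

BASE = "http://localhost:8000"

# Hypothetical payload fields (prompt/lyrics/duration) -- see backend/app/models.py.
job = requests.post(f"{BASE}/generate/music", json={
    "prompt": "dreamy synthpop, female vocals, 120 bpm",
    "lyrics": "[verse] city lights are calling...",
    "duration": 60,
}).json()

# Poll the job until it finishes; /events offers a push-based SSE alternative.
while True:
    status = requests.get(f"{BASE}/jobs/{job['id']}").json()   # 'id' key is assumed
    if status.get("status") in ("completed", "failed"):
        break
    time.sleep(2)
print(status)
```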
| Issue | Solution |
|---|---|
| CUDA out of memory | Enable Memory Swap Mode in Settings Modal, or try ./start.sh --force-swap |
| CUDA device-side assert | Update to latest version (fixed in mmgp integration) |
| Models not downloading | Check internet connection and disk space (~5GB needed in backend/models/) |
| Frontend can't connect | Ensure backend is running on port 8000 |
| LLM not working | Check Ollama is running or OpenRouter API key is set in backend/.env |
| Only one GPU detected | Set CUDA_VISIBLE_DEVICES=0,1 explicitly when starting backend |
| Slow generation | Check Settings Modal for GPU config; bf16 is faster than INT8 |
| Settings not saving | Check write permissions on backend/settings.json |
Models are auto-downloaded to backend/models/ (~5GB total):
backend/models/
├── HeartMuLa-oss-RL-3B-20260123/ # Main model
├── HeartCodec-oss/ # Audio codec
├── tokenizer.json
└── gen_config.json
- HeartMuLa/heartlib - The open-source AI music generation engine
- mainza-ai/milimomusic - Inspiration for the backend architecture
- WaveSurfer.js - Audio waveform visualization
This project is open source under the MIT License.
Contributions are welcome! Please feel free to:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
Made with ❤️ for the open-source AI music community
