Arpeggpt is a transformer model built from scratch that generates classical piano compositions. It was pre-trained on the GiantMIDI-Piano dataset, a collection of 10,855 MIDI files by 2,786 composers.
The repository is now configurable: you can bring your own piano dataset in MIDI format and train the model with one of several GPT configurations. This README explains how to set up the training process and invoke the CLI to train models of five predefined sizes: `test`, `small`, `medium`, `large`, and `xl`.
```bash
# Clone and enter the repo
git clone https://github.com/rhythmd18/Arpeggpt.git
cd Arpeggpt

# (Optional) create a virtual environment
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

- Create a `data/` directory:

```bash
mkdir data
cd data
```

- Place your training data in there. For example, in my case:

```
data/
  giantmidi/
  giantmidi-small/
  ...
```
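Since the trainer expects MIDI files, a quick way to sanity-check a dataset directory before training is to list the `.mid`/`.midi` files it contains. This is a hypothetical helper for illustration, not part of the repo:

```python
from pathlib import Path

def list_midi_files(dataset_dir: str) -> list[Path]:
    # Collect .mid and .midi files recursively, sorted for reproducibility
    root = Path(dataset_dir)
    return sorted(p for p in root.rglob("*")
                  if p.suffix.lower() in {".mid", ".midi"})
```

If this returns an empty list for your dataset folder, training has nothing to consume.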
Each model size corresponds to a preset configuration controlling:
| Size | Layers (`n_layers`) | Embedding Dims (`emb_dim`) | Heads (`n_heads`) |
|---|---|---|---|
| `test` | 12 | 192 | 12 |
| `small` | 12 | 768 | 12 |
| `medium` | 24 | 1024 | 16 |
| `large` | 36 | 1260 | 20 |
| `xl` | 48 | 1600 | 25 |
These configurations are defined in `arpeggpt/config.py`.
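As an illustrative sketch only (the actual contents of `arpeggpt/config.py` may differ), the presets in the table above could be expressed as a dataclass plus a lookup dictionary; the field names `n_layers`, `emb_dim`, and `n_heads` follow the table headers:

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    n_layers: int  # number of transformer blocks
    emb_dim: int   # embedding dimension
    n_heads: int   # attention heads per block

# Preset sizes matching the table above (hypothetical layout)
CONFIGS = {
    "test":   GPTConfig(n_layers=12, emb_dim=192,  n_heads=12),
    "small":  GPTConfig(n_layers=12, emb_dim=768,  n_heads=12),
    "medium": GPTConfig(n_layers=24, emb_dim=1024, n_heads=16),
    "large":  GPTConfig(n_layers=36, emb_dim=1260, n_heads=20),
    "xl":     GPTConfig(n_layers=48, emb_dim=1600, n_heads=25),
}
```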
```bash
python main.py
```

Flags that define training configurations:
| Flag | Meaning | Default |
|---|---|---|
| `--config <str>` | Which GPT configuration to use (one of `test`, `small`, `medium`, `large`, `xl`) | `test` |
| `--batch-size <int>` | The batch size | 16 |
| `--num-epochs <epochs>` | The number of epochs to train for | 20 |
| `--save-every <n>` | Save the model every `n` epochs | 4 |
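A minimal `argparse` sketch of how flags like these could be parsed. This is an assumption for illustration; the actual parsing in `main.py` may differ:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Mirrors the flags in the table above (illustrative, not the actual main.py)
    parser = argparse.ArgumentParser(description="Train Arpeggpt")
    parser.add_argument("--config", type=str, default="test",
                        choices=["test", "small", "medium", "large", "xl"],
                        help="Which GPT configuration to use")
    parser.add_argument("--batch-size", type=int, default=16,
                        help="The batch size")
    parser.add_argument("--num-epochs", type=int, default=20,
                        help="The number of epochs to train for")
    parser.add_argument("--save-every", type=int, default=4,
                        help="Save the model every n epochs")
    return parser

args = build_parser().parse_args(["--config", "medium", "--batch-size", "128"])
```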
```bash
python main.py --config small
```

```bash
python main.py --config medium
```

```bash
python main.py --config large
```

```bash
python main.py --config xl
```

Adjust batch size, number of epochs, and checkpointing frequency using the flags defined above. For example:

```bash
python main.py --config medium --batch-size 128 --num-epochs 100 --save-every 5
```
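With `--save-every 5` and `--num-epochs 100`, a checkpoint would presumably land on every fifth epoch. A small sketch of that schedule, assuming 1-indexed epochs and a simple modulo check (the actual implementation may differ):

```python
def checkpoint_epochs(num_epochs: int, save_every: int) -> list[int]:
    # Epochs (1-indexed) at which a checkpoint would be saved
    return [e for e in range(1, num_epochs + 1) if e % save_every == 0]

print(checkpoint_epochs(20, 4))  # → [4, 8, 12, 16, 20]
```

Under these assumptions the defaults (`--num-epochs 20 --save-every 4`) produce five checkpoints, including one at the final epoch.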