Arpeggpt is a transformer model built from scratch that generates classical piano compositions. It was pre-trained on the GiantMIDI-Piano dataset, a collection of 10,855 MIDI files by 2,786 composers.
The repository is now configurable: you can bring your own piano dataset in MIDI format and train the model with one of several GPT configurations. This README explains how to set up the training process and invoke the CLI to train models of five predefined sizes: `test`, `small`, `medium`, `large`, and `xl`.
```bash
# Clone and enter the repo
git clone https://github.com/rhythmd18/Arpeggpt.git
cd Arpeggpt

# (Optional) create a virtual environment
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

- Create a `data/` directory:

```bash
mkdir data
cd data
```

- Place your training data in there. For example, in my case:

```
data/
  giantmidi/
  giantmidi-small/
  ...
```
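Since the trainer expects MIDI files, a quick way to sanity-check a dataset directory before training is to list the `.mid`/`.midi` files it contains. This is a hypothetical helper for illustration, not part of the repo:

```python
from pathlib import Path

def list_midi_files(dataset_dir: str) -> list[Path]:
    # Collect .mid and .midi files recursively, sorted for reproducibility
    root = Path(dataset_dir)
    return sorted(p for p in root.rglob("*")
                  if p.suffix.lower() in {".mid", ".midi"})
```

If this returns an empty list for your dataset folder, training has nothing to consume.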
Each model size corresponds to a preset configuration controlling:
| Size | Layers (`n_layers`) | Embedding Dims (`emb_dim`) | Heads (`n_heads`) |
|---|---|---|---|
| `test` | 12 | 192 | 12 |
| `small` | 12 | 768 | 12 |
| `medium` | 24 | 1024 | 16 |
| `large` | 36 | 1260 | 20 |
| `xl` | 48 | 1600 | 25 |
These configurations are defined in `arpeggpt/config.py`.
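As an illustrative sketch only (the actual contents of `arpeggpt/config.py` may differ), the presets in the table above could be expressed as a dataclass plus a lookup dictionary; the field names `n_layers`, `emb_dim`, and `n_heads` follow the table headers:

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    n_layers: int  # number of transformer blocks
    emb_dim: int   # embedding dimension
    n_heads: int   # attention heads per block

# Preset sizes matching the table above (hypothetical layout)
CONFIGS = {
    "test":   GPTConfig(n_layers=12, emb_dim=192,  n_heads=12),
    "small":  GPTConfig(n_layers=12, emb_dim=768,  n_heads=12),
    "medium": GPTConfig(n_layers=24, emb_dim=1024, n_heads=16),
    "large":  GPTConfig(n_layers=36, emb_dim=1260, n_heads=20),
    "xl":     GPTConfig(n_layers=48, emb_dim=1600, n_heads=25),
}
```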
```bash
python main.py
```

Flags that define training configurations:
| Flag | Meaning | Default |
|---|---|---|
| `--config <str>` | Which GPT configuration to use (one of `test`, `small`, `medium`, `large`, `xl`) | `test` |
| `--batch-size <int>` | The batch size | 16 |
| `--num-epochs <epochs>` | The number of epochs to train for | 20 |
| `--save-every <n>` | Save the model every `n` epochs | 4 |
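A minimal `argparse` sketch of how flags like these could be parsed. This is an assumption for illustration; the actual parsing in `main.py` may differ:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Mirrors the flags in the table above (illustrative, not the actual main.py)
    parser = argparse.ArgumentParser(description="Train Arpeggpt")
    parser.add_argument("--config", type=str, default="test",
                        choices=["test", "small", "medium", "large", "xl"],
                        help="Which GPT configuration to use")
    parser.add_argument("--batch-size", type=int, default=16,
                        help="The batch size")
    parser.add_argument("--num-epochs", type=int, default=20,
                        help="The number of epochs to train for")
    parser.add_argument("--save-every", type=int, default=4,
                        help="Save the model every n epochs")
    return parser

args = build_parser().parse_args(["--config", "medium", "--batch-size", "128"])
```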
```bash
python main.py --config small
```

```bash
python main.py --config medium
```

```bash
python main.py --config large
```

```bash
python main.py --config xl
```

Adjust batch size, number of epochs, and checkpointing frequency using the flags defined above. For example:

```bash
python main.py --config medium --batch-size 128 --num-epochs 100 --save-every 5
```
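With `--save-every 5` and `--num-epochs 100`, a checkpoint would presumably land on every fifth epoch. A small sketch of that schedule, assuming 1-indexed epochs and a simple modulo check (the actual implementation may differ):

```python
def checkpoint_epochs(num_epochs: int, save_every: int) -> list[int]:
    # Epochs (1-indexed) at which a checkpoint would be saved
    return [e for e in range(1, num_epochs + 1) if e % save_every == 0]

print(checkpoint_epochs(20, 4))  # → [4, 8, 12, 16, 20]
```

Under these assumptions the defaults (`--num-epochs 20 --save-every 4`) produce five checkpoints, including one at the final epoch.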