The table below lists the performance you can expect from the provided config files. Note that you can achieve lower memory consumption by lowering the micro batch size as needed. In addition, you can lower the rank (`lora_r`) in the LoRA configuration files and disable LoRA for certain layers (for example, by setting `lora_projection` and other LoRA layer-specific parameters to `false`), as sketched in the example below.
For more information, see the Dealing with out-of-memory (OOM) errors guide on lowering the memory requirements.
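As a minimal sketch, assuming your LitGPT version accepts the config-file fields as dotted command-line overrides (flag names and defaults may differ between releases), such memory-saving adjustments could look like this:

```bash
# Sketch: lower the memory footprint of an existing config via command-line overrides.
# The flag names assume the config fields map directly to CLI options; verify them
# against your config file and `litgpt finetune lora --help`.
litgpt finetune lora \
  --config config_hub/finetune/phi-2/lora.yaml \
  --train.micro_batch_size 1 \
  --lora_r 4 \
  --lora_projection false
```

Editing the same fields directly in the YAML config file has the equivalent effect.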
The "Cost" column refers to the on-demand compute cost on Lightning AI Studios where these benchmarks were executed.
All experiments were conducted using bfloat16 precision on the Alpaca2k dataset. The "Multitask score" refers to MMLU.
Config | Model | Epochs | Max seq length | Micro batch size | Machine | Training runtime | Cost | Peak memory | Validation loss | Validation perplexity | Multitask score (MMLU) |
---|---|---|---|---|---|---|---|---|---|---|---|
falcon-7b/lora.yaml | falcon-7b | 4 | 512 | 1 | 1xA10G | 24.84 min | $0.7 | 16.69 GB | 0.945 | 2.573 | 26.2% |
falcon-7b/lora.yaml | falcon-7b | 4 | 512 | 1 | 4xA10G | 24.94 min | $2.0 | 16.69 GB | 0.945 | 2.573 | 26.4% |
falcon-7b/qlora.yaml | falcon-7b | 4 | 512 | 1 | 1xA10G | 50.85 min | $1.5 | 9.44 GB | 0.993 | 2.699 | 26.3% |
gemma-2b/full.yaml | gemma-2b | 1 | 512 | 1 | 4xA10G | 14.06 min | $1.1 | 17.43 GB | 1.021 | 2.777 | 32.4% |
gemma-2b/lora.yaml | gemma-2b | 2 | 512 | 2 | 1xA10G | 9.41 min | $0.3 | 12.62 GB | 0.981 | 2.666 | 34.4% |
gemma-2b/lora.yaml | gemma-2b | 2 | 512 | 2 | 4xA10G | 9.41 min | $0.8 | 12.62 GB | 0.981 | 2.667 | 34.0% |
gemma-2b/qlora.yaml | gemma-2b | 2 | 512 | 2 | 1xA10G | 12.91 min | $0.4 | 11.58 GB | 1.085 | 2.959 | 36.4% |
gemma-7b/lora.yaml | gemma-7b | 2 | 512 | 1 | 1xA10G | OOM | OOM | OOM | OOM | OOM | |
gemma-7b/lora.yaml | gemma-7b | 2 | 512 | 1 | 4xA10G | OOM | OOM | OOM | OOM | OOM | |
gemma-7b/qlora.yaml | gemma-7b | 2 | 512 | 1 | 1xA10G | 43.58 min | $1.3 | 17.18 GB | 0.973 | 2.646 | 62.45% |
gemma2-2b/lora.yaml | gemma-2-2b | 2 | 512 | 2 | 1xA10G | 11.96 min | $0.4 | 14.31 GB | 0.951 | 2.589 | 23.84% |
gemma2-2b/qlora.yaml | gemma-2-2b | 2 | 512 | 2 | 1xA10G | 16.06 min | $0.5 | 13.52 GB | 0.983 | 2.673 | 24.12% |
gemma2-9b/lora.yaml | gemma-2-9b | 2 | 512 | 1 | 1xA10G | OOM | OOM | OOM | OOM | OOM | |
gemma2-9b/lora.yaml | gemma-2-9b | 2 | 512 | 1 | 4xA10G | OOM | OOM | OOM | OOM | OOM | |
gemma2-9b/qlora.yaml | gemma-2-9b | 2 | 512 | 1 | 1xA10G | 50.01 min | $4.0 | 20.92 GB | 0.852 | 2.345 | 24.2% |
llama-2-7b/full.yaml | llama-2-7b | 1 | 512 | 4 | 4xA10G | OOM | OOM | OOM | OOM | OOM | |
llama-2-7b/lora.yaml | llama-2-7b | 4 | 512 | 2 | 1xA10G | 32.82 min | $1.0 | 19.77 GB | 0.802 | 2.230 | 40.3% |
llama-2-7b/lora.yaml | llama-2-7b | 4 | 512 | 2 | 4xA10G | 32.83 min | $2.6 | 19.77 GB | 0.802 | 2.229 | 40.2% |
llama-2-7b/qlora.yaml | llama-2-7b | 4 | 512 | 2 | 1xA10G | 45.67 min | $1.4 | 13.68 GB | 0.814 | 2.258 | 38.6% |
llama-3-8b/full.yaml | llama-3-8b | 1 | 512 | 4 | 4xA10G | OOM | OOM | OOM | OOM | OOM | |
llama-3-8b/lora.yaml | llama-3-8b | 2 | 512 | 1 | 1xA10G | 14.79 min | $0.4 | 19.73 GB | 0.888 | 2.431 | 62.4% |
llama-3-8b/lora.yaml | llama-3-8b | 2 | 512 | 1 | 4xA10G | 14.88 min | $1.2 | 19.73 GB | 0.889 | 2.432 | 62.5% |
llama-3-8b/qlora.yaml | llama-3-8b | 2 | 512 | 2 | 1xA10G | 22.24 min | $0.7 | 17.41 GB | 0.939 | 2.558 | 62.2% |
llama-3.1-8b/full.yaml | llama-3.1-8b | 1 | 512 | 4 | 1xA10G | OOM | OOM | OOM | OOM | OOM | |
llama-3.1-8b/lora.yaml | llama-3.1-8b | 2 | 512 | 1 | 1xA10G | 13.36 min | $1.1 | 19.73 GB | 0.878 | 2.406 | xx.xx |
llama-3.1-8b/qlora.yaml | llama-3.1-8b | 2 | 512 | 2 | 1xA10G | 21.81 min | $0.7 | 17.41 GB | 0.928 | 2.529 | xx.xx |
llama-3.2-1b/full.yaml | llama-3.2-1b | 1 | 512 | 4 | 1xA10G | 2.01 min | $0.1 | 8.70 GB | 1.442 | 4.229 | 38.21% |
llama-3.2-1b/lora.yaml | llama-3.2-1b | 2 | 512 | 1 | 1xA10G | 4.17 min | $0.4 | 4.49 GB | 1.114 | 3.046 | 36.87% |
llama-3.2-1b/qlora.yaml | llama-3.2-1b | 2 | 512 | 2 | 1xA10G | 6.20 min | $0.6 | 5.53 GB | 1.201 | 3.322 | 36.49% |
llama-3.2-3b/full.yaml | llama-3.2-3b | 1 | 512 | 4 | 1xA10G | 4.71 min | $0.4 | 16.51 GB | 1.255 | 3.509 | 54.69% |
llama-3.2-3b/lora.yaml | llama-3.2-3b | 2 | 512 | 1 | 1xA10G | 8.31 min | $0.8 | 9.67 GB | 0.973 | 2.647 | 54.77% |
llama-3.2-3b/qlora.yaml | llama-3.2-3b | 2 | 512 | 2 | 1xA10G | 14.89 min | $1.4 | 10.30 GB | 1.031 | 2.804 | 55.08% |
mistral-7b-v0.2/lora.yaml | mistral-7b-v0.2 | 4 | 512 | 2 | 1xA10G | 31.00 min | $0.9 | 20.66 GB | 0.801 | 2.228 | 55.7% |
mistral-7b-v0.2/lora.yaml | mistral-7b-v0.2 | 4 | 512 | 2 | 4xA10G | 31.00 min | $2.5 | 20.66 GB | 0.802 | 2.229 | 55.5% |
mistral-7b-v0.2/qlora.yaml | mistral-7b-v0.2 | 4 | 512 | 2 | 1xA10G | 44.75 min | $1.3 | 14.29 GB | 0.813 | 2.255 | 56.5% |
mistral-7b/lora.yaml | mistral-7b | 4 | 512 | 2 | 1xA10G | 31.01 min | $0.9 | 20.66 GB | 0.794 | 2.211 | 57.9% |
mistral-7b/lora.yaml | mistral-7b | 4 | 512 | 2 | 4xA10G | 31.03 min | $2.5 | 20.66 GB | 0.796 | 2.218 | 57.9% |
mistral-7b/qlora.yaml | mistral-7b | 4 | 512 | 2 | 1xA10G | 44.75 min | $1.3 | 14.29 GB | 0.803 | 2.231 | 57.9% |
phi-2/full.yaml | phi-2 | 1 | 512 | 4 | 4xA10G | 11.87 min | $1.0 | 14.44 GB | 1.305 | 3.688 | 38.4% |
phi-2/lora.yaml | phi-2 | 1 | 512 | 4 | 1xA10G | 3.78 min | $0.1 | 13.98 GB | 0.819 | 2.269 | 53.0% |
phi-2/lora.yaml | phi-2 | 1 | 512 | 4 | 4xA10G | 3.78 min | $0.3 | 13.98 GB | 0.820 | 2.271 | 52.4% |
phi-2/qlora.yaml | phi-2 | 1 | 512 | 4 | 1xA10G | 4.51 min | $0.1 | 14.27 GB | 0.837 | 2.310 | 52.3% |
phi-3/full.yaml | Phi-3-mini-4k-instruct | 1 | 512 | 4 | 1xA10G | 6.93 min | $0.2 | 17.01 GB | 0.714 | 2.043 | 69.81% |
phi-3/lora.yaml | Phi-3-mini-4k-instruct | 1 | 512 | 4 | 1xA10G | 6.46 min | $0.2 | 19.75 GB | 0.707 | 2.028 | 69.70% |
phi-3/qlora.yaml | Phi-3-mini-4k-instruct | 1 | 512 | 4 | 1xA10G | 7.47 min | $0.2 | 19.13 GB | 0.729 | 2.074 | 68.96% |
stablelm-base-alpha-3b/full.yaml | stablelm-base-alpha-3b | 1 | 512 | 1 | 4xA10G | 70.13 min | $5.6 | 21.23 GB | 1.513 | 4.540 | 23.2% |
stablelm-base-alpha-3b/lora.yaml | stablelm-base-alpha-3b | 4 | 512 | 1 | 1xA10G | 13.07 min | $0.4 | 8.58 GB | 1.361 | 3.900 | 25.9% |
stablelm-base-alpha-3b/lora.yaml | stablelm-base-alpha-3b | 4 | 512 | 1 | 4xA10G | 13.16 min | $1.1 | 8.58 GB | 1.362 | 3.906 | 25.9% |
stablelm-base-alpha-3b/qlora.yaml | stablelm-base-alpha-3b | 4 | 512 | 1 | 1xA10G | 25.86 min | $0.8 | 5.24 GB | 1.388 | 4.009 | 26.1% |
tiny-llama/full.yaml | tiny-llama | 1 | 512 | 4 | 1xA10G | 2.58 min | $0.1 | 14.10 GB | 1.088 | 2.968 | 24.6% |
tiny-llama/full.yaml | tiny-llama | 1 | 512 | 4 | 4xA10G | 2.57 min | $0.2 | 14.10 GB | 1.088 | 2.968 | 24.5% |
tiny-llama/lora.yaml | tiny-llama | 3 | 512 | 8 | 1xA10G | 8.09 min | $0.2 | 13.50 GB | 1.039 | 2.826 | 25.5% |
tiny-llama/qlora.yaml | tiny-llama | 3 | 512 | 8 | 1xA10G | 8.70 min | $0.3 | 16.24 GB | 1.056 | 2.874 | 25.3% |
*OOM = Out of memory
If you require a longer sequence length than the one used in a given config file, you can either edit the `max_seq_length` parameter in the config file or pass an additional argument when running the finetuning command, for example, `--max_seq_length 4096`, to override the sequence length provided in the config file (see the example below).
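For example, using one of the config files from the commands further below:

```bash
# Override the sequence length from the config file at the command line.
litgpt finetune lora \
  --config config_hub/finetune/phi-2/lora.yaml \
  --max_seq_length 4096
```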
If you are training on GPUs without bfloat16 support, you need to change the `precision` option to `16-true` (16-bit floating point precision) or `16-mixed` (16/32-bit mixed precision):
```bash
litgpt finetune lora \
  --config config_hub/finetune/phi-2/lora.yaml \
  --precision 16-true
```
or
```bash
litgpt finetune lora \
  --config config_hub/finetune/phi-2/lora.yaml \
  --precision 16-mixed
```
Note that `16-true` is more compute- and memory-efficient, but it can sometimes lead to training convergence issues. In this case, it's recommended to use `16-mixed`.
All runs are single-GPU experiments by default; use `--devices 4` to utilize more than one GPU:
```bash
litgpt finetune lora \
  --config config_hub/finetune/phi-2/lora.yaml \
  --devices 4
```