I’m fine-tuning a 7B RAG LLM and running into some issues with training speed and CUDA memory constraints. Here are my training parameters:
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="models/fine_tuned_2001",
    overwrite_output_dir=True,
    num_train_epochs=3,             # note: max_steps below takes precedence over epochs
    warmup_steps=20,
    logging_strategy="steps",
    logging_steps=10,
    evaluation_strategy="no",
    optim="adamw_torch",
    gradient_accumulation_steps=4,  # effective batch size = 1 x 4 = 4
    save_steps=100,
    save_total_limit=2,
    learning_rate=1e-5,
    per_device_train_batch_size=1,  # reduced to avoid CUDA OOM
    max_steps=1000,
    report_to="wandb",
)
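For reference, here is a variant of the same config with the throughput/memory flags I've been experimenting with. This is a sketch rather than my exact script: bf16, gradient_checkpointing, paged_adamw_8bit, and group_by_length are additions on my part, bf16 assumes an Ampere-or-newer GPU, and the paged 8-bit optimizer requires bitsandbytes to be installed.

from transformers import TrainingArguments

# Sketch: same schedule as above, plus memory/speed-oriented flags.
training_args = TrainingArguments(
    output_dir="models/fine_tuned_2001",
    overwrite_output_dir=True,
    warmup_steps=20,
    logging_strategy="steps",
    logging_steps=10,
    gradient_accumulation_steps=4,
    save_steps=100,
    save_total_limit=2,
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    max_steps=1000,
    report_to="wandb",
    bf16=True,                    # mixed precision: faster matmuls, smaller activations
    gradient_checkpointing=True,  # trade recompute for memory on ~10k-token sequences
    optim="paged_adamw_8bit",     # 8-bit optimizer states via bitsandbytes
    group_by_length=True,         # batch similar lengths to cut padding waste
)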
Setup & Issues:
Hardware: 22GB GPU
Input Length: MAX_LENGTH=10154 (the model takes the query, the answer, and the retrieved chunks as input).
Dataset: ~2K pairs.
Problem:
Training is extremely slow, around 1 minute per step, which means 1000 steps take roughly 16 hours.
I expected some slowdown from the long inputs, but this seems excessive.
Am I overlooking something? Any tips on improving training speed without exceeding memory limits?
Batch Size: I had to reduce per_device_train_batch_size to 1 due to CUDA OOM errors, and also lowered the LoRA settings to r=64, lora_alpha=16 (a sketch of the LoRA side of the setup follows below).
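For completeness, the LoRA side looks roughly like this. It is a sketch, not my exact code: the target_modules list and lora_dropout value are assumptions since I didn't paste my PEFT config above, "base-7b-model" is a placeholder for the actual checkpoint, and the 4-bit loading via bitsandbytes is one option I'm considering to free up memory on the 22GB card.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Sketch: load the 7B base in 4-bit (QLoRA-style) to leave memory headroom.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "base-7b-model",  # placeholder: substitute the actual checkpoint name
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=64,               # reduced from a larger rank to fit memory
    lora_alpha=16,
    lora_dropout=0.05,  # assumption: not stated above
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
)
model = get_peft_model(model, lora_config)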