Fine-tuned Gemma 3-1B-IT model using Quantized Low-Rank Adaptation (Q-LoRA) with Unsloth optimization for efficient training and inference.
This project implements parameter-efficient fine-tuning of Google's Gemma 3-1B-IT model using:
- Q-LoRA: Quantized Low-Rank Adaptation for memory-efficient training
- Unsloth: Optimized training kernels and a pre-quantized 4-bit model for faster training
- Gradient Checkpointing: Reduced memory footprint during backpropagation
- Memory Efficient: 4-bit quantization reduces VRAM requirements by ~75%
- Fast Training: Unsloth optimization provides a 2-5x speedup
- LoRA Adaptation: Fine-tune only 0.1-1% of model parameters
- Production Ready: Includes generation parameters for inference
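
For orientation, here is a minimal sketch of loading the pre-quantized model with Unsloth's FastLanguageModel; it uses the MODEL_NAME and MAX_SEQ_LENGTH values defined in the configuration below and assumes a CUDA-capable GPU.

```python
# Minimal sketch: load the pre-quantized 4-bit Gemma 3-1B-IT checkpoint with Unsloth.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3-1b-it-unsloth-bnb-4bit",  # MODEL_NAME
    max_seq_length=2048,                                   # MAX_SEQ_LENGTH
    load_in_4bit=True,  # bitsandbytes 4-bit weights keep VRAM usage low
)
```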
```python
# Model Selection
MODEL_NAME = "unsloth/gemma-3-1b-it-unsloth-bnb-4bit"
MAX_SEQ_LENGTH = 2048

# Training Configuration
BATCH_SIZE = 2
GRAD_ACCUMULATION = 4  # Effective batch size = 8
NUM_EPOCHS = 5
LEARNING_RATE = 2e-4
WARMUP_STEPS = 10
LR_SCHEDULER_TYPE = "linear"
WEIGHT_DECAY = 0.01
SEED = 3407
```

Explanation:
- MAX_SEQ_LENGTH: Maximum token sequence length (2048 tokens, roughly 1500 words)
- GRAD_ACCUMULATION: Accumulates gradients over 4 steps to simulate a larger batch size
- LEARNING_RATE: Standard learning rate for LoRA fine-tuning (2e-4 works well for most tasks)
- WARMUP_STEPS: Gradual learning-rate increase to stabilize early training
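
As a quick sanity check on these values, the arithmetic below (with a hypothetical dataset_size of 1,000 examples) shows how gradient accumulation yields the effective batch size and how the total number of optimizer steps falls out:

```python
# Illustrative arithmetic only; dataset_size is a hypothetical example value.
BATCH_SIZE = 2
GRAD_ACCUMULATION = 4
NUM_EPOCHS = 5

effective_batch_size = BATCH_SIZE * GRAD_ACCUMULATION      # 2 * 4 = 8
dataset_size = 1000                                        # hypothetical
steps_per_epoch = dataset_size // effective_batch_size     # 1000 // 8 = 125
total_steps = steps_per_epoch * NUM_EPOCHS                 # 125 * 5 = 625
print(effective_batch_size, steps_per_epoch, total_steps)  # 8 125 625
```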
```python
# LoRA Configuration
LORA_R = 64
LORA_ALPHA = 128
LORA_DROPOUT = 0.05
TARGET_MODULES = [
    "q_proj", "k_proj", "v_proj", "o_proj",
    "gate_proj", "up_proj", "down_proj"
]
USE_GRAD_CHECKPOINT = "unsloth"
```

Explanation:
- LORA_R: Rank of the LoRA matrices (higher = more capacity; 64 is a balanced choice)
- LORA_ALPHA: Scaling factor (typically set to 2x the rank)
- LORA_DROPOUT: Dropout rate to prevent overfitting in the adapter layers
- TARGET_MODULES: Attention (q/k/v/o) and MLP (gate/up/down) projection layers
- USE_GRAD_CHECKPOINT: Unsloth's optimized gradient checkpointing
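
A minimal sketch of how these values plug into Unsloth's LoRA setup, assuming `model` is the 4-bit model returned by FastLanguageModel.from_pretrained shown earlier:

```python
# Minimal sketch: attach LoRA adapters to the attention and MLP projections.
from unsloth import FastLanguageModel

model = FastLanguageModel.get_peft_model(
    model,                                 # 4-bit base model loaded earlier
    r=64,                                  # LORA_R
    lora_alpha=128,                        # LORA_ALPHA
    lora_dropout=0.05,                     # LORA_DROPOUT
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    use_gradient_checkpointing="unsloth",  # USE_GRAD_CHECKPOINT
    random_state=3407,                     # SEED
)
```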
```python
# Logging & Model Saving
LOGGING_STEPS = 5
OUTPUT_DIR = "/content/drive/MyDrive/outputs"
SAVE_STRATEGY = "epoch"
SAVE_TOTAL_LIMIT = 2
REPORT_TO = "none"
```

Explanation:
- LOGGING_STEPS: Log metrics every 5 training steps
- SAVE_STRATEGY: Save a checkpoint after each epoch
- SAVE_TOTAL_LIMIT: Keep only the 2 most recent checkpoints (saves storage)
- REPORT_TO: Disable external logging (e.g., Weights & Biases)
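
These settings, together with the training hyperparameters above, map onto Hugging Face's TrainingArguments and trl's SFTTrainer roughly as in the sketch below. Here `model` and `tokenizer` come from the earlier loading/LoRA steps, `dataset` is a placeholder for your own prepared training set, and depending on your trl version the tokenizer argument may be named processing_class instead.

```python
# Minimal sketch: wire the configuration into a supervised fine-tuning run.
from transformers import TrainingArguments
from trl import SFTTrainer

training_args = TrainingArguments(
    per_device_train_batch_size=2,   # BATCH_SIZE
    gradient_accumulation_steps=4,   # GRAD_ACCUMULATION
    num_train_epochs=5,              # NUM_EPOCHS
    learning_rate=2e-4,              # LEARNING_RATE
    warmup_steps=10,                 # WARMUP_STEPS
    lr_scheduler_type="linear",      # LR_SCHEDULER_TYPE
    weight_decay=0.01,               # WEIGHT_DECAY
    seed=3407,                       # SEED
    logging_steps=5,                 # LOGGING_STEPS
    output_dir="/content/drive/MyDrive/outputs",  # OUTPUT_DIR
    save_strategy="epoch",           # SAVE_STRATEGY
    save_total_limit=2,              # SAVE_TOTAL_LIMIT
    report_to="none",                # REPORT_TO
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,           # your prepared Hugging Face dataset
    args=training_args,
)
trainer.train()
```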
```python
# Inference Configuration
GEN_MAX_TOKENS = 256
GEN_TEMPERATURE = 0.7
GEN_TOP_P = 0.9
```

Explanation:
- GEN_MAX_TOKENS: Maximum length of generated responses
- GEN_TEMPERATURE: Controls randomness (0.7 = balanced creativity)
- GEN_TOP_P: Nucleus sampling threshold (0.9 = diverse but coherent)
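
A minimal generation sketch using these parameters, assuming the fine-tuned `model` and `tokenizer` from the training steps above and the chat template bundled with the Gemma tokenizer:

```python
# Minimal sketch: generate a response with the inference parameters above.
from unsloth import FastLanguageModel

FastLanguageModel.for_inference(model)  # switch Unsloth to its faster inference mode

messages = [{"role": "user", "content": "Explain LoRA in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=256,  # GEN_MAX_TOKENS
    temperature=0.7,     # GEN_TEMPERATURE
    top_p=0.9,           # GEN_TOP_P
    do_sample=True,      # sampling must be enabled for temperature/top_p to apply
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```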
```bash
# Install Unsloth
pip install unsloth

# Install dependencies
pip install torch transformers datasets trl peft accelerate
```

| Metric | Value |
|---|---|
| Effective Batch Size | 8 (2 × 4) |
| Total Training Steps | ~(dataset_size / 8) × 5 epochs |
| Trainable Parameters | ~0.5% of total model |
| Memory Usage | ~2.5-3 GB VRAM |
| Training Time | ~30 mins on T4 GPU |
- Google Gemma for the base model
- Unsloth AI for the optimization framework
- Hugging Face for the transformers library
Note: Adjust hyperparameters based on your dataset size and task complexity. For smaller datasets, reduce LORA_R and NUM_EPOCHS to prevent overfitting.