🚀 Gemma 3-1B-IT Model Fine-tuning

License: MIT · Python 3.8+ · Unsloth

Fine-tuned Gemma 3-1B-IT model using Quantized Low-Rank Adaptation (Q-LoRA) with Unsloth optimization for efficient training and inference.

📋 Overview

This project implements parameter-efficient fine-tuning of Google's Gemma 3-1B-IT model using:

  • Q-LoRA: Quantized Low-Rank Adaptation for memory-efficient training
  • Unsloth: Optimized training kernels plus a pre-quantized 4-bit checkpoint for faster training
  • Gradient Checkpointing: Reduced memory footprint during backpropagation

🎯 Key Features

  • ✅ Memory Efficient: 4-bit quantization reduces VRAM requirements by ~75%
  • ✅ Fast Training: Unsloth optimization provides 2-5x speedup
  • ✅ LoRA Adaptation: Fine-tune only 0.1-1% of model parameters (see the sketch after this list)
  • ✅ Production Ready: Includes generation parameters for inference
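
A quick way to confirm the trainable-parameter fraction once the LoRA adapters are attached (a minimal sketch; `model` is assumed to be the PEFT-wrapped model produced in the LoRA step shown later):

# Count trainable vs. total parameters of the LoRA-wrapped model
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")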

βš™οΈ Model Configuration

Model & Training Parameters

# Model Selection
MODEL_NAME = "unsloth/gemma-3-1b-it-unsloth-bnb-4bit"
MAX_SEQ_LENGTH = 2048

# Training Configuration
BATCH_SIZE = 2
GRAD_ACCUMULATION = 4  # Effective batch size = 8
NUM_EPOCHS = 5
LEARNING_RATE = 2e-4
WARMUP_STEPS = 10
LR_SCHEDULER_TYPE = "linear"
WEIGHT_DECAY = 0.01
SEED = 3407

Explanation:

  • MAX_SEQ_LENGTH: Maximum token sequence length (2048 tokens ~1500 words)
  • GRAD_ACCUMULATION: Accumulates gradients over 4 steps to simulate larger batch size
  • LEARNING_RATE: Learning rate for LoRA fine-tuning (2e-4 is a common, well-tested default)
  • WARMUP_STEPS: Gradual learning rate increase to stabilize early training
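
As a rough sketch of how these constants feed into model loading (assuming Unsloth's `FastLanguageModel` API; exact argument names can vary between Unsloth releases):

# Load the pre-quantized 4-bit Gemma 3-1B-IT checkpoint
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=MODEL_NAME,          # "unsloth/gemma-3-1b-it-unsloth-bnb-4bit"
    max_seq_length=MAX_SEQ_LENGTH,  # 2048 tokens
    load_in_4bit=True,              # bitsandbytes 4-bit weights
)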

LoRA Hyperparameters

# LoRA Configuration
LORA_R = 64
LORA_ALPHA = 128
LORA_DROPOUT = 0.05
TARGET_MODULES = [
    "q_proj", "k_proj", "v_proj", "o_proj",
    "gate_proj", "up_proj", "down_proj"
]
USE_GRAD_CHECKPOINT = "unsloth"

Explanation:

  • LORA_R: Rank of LoRA matrices (higher = more capacity, 64 is balanced)
  • LORA_ALPHA: Scaling factor (a common heuristic is 2x the rank)
  • LORA_DROPOUT: Dropout rate to prevent overfitting in adapter layers
  • TARGET_MODULES: Attention (q/k/v/o) and MLP (gate/up/down) projection layers
  • USE_GRAD_CHECKPOINT: Unsloth's optimized gradient checkpointing
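
These hyperparameters map onto Unsloth's `get_peft_model` call roughly as follows (a sketch, not the exact training script; it reuses the `model` loaded above):

# Attach LoRA adapters to the attention and MLP projection layers
model = FastLanguageModel.get_peft_model(
    model,
    r=LORA_R,
    lora_alpha=LORA_ALPHA,
    lora_dropout=LORA_DROPOUT,
    target_modules=TARGET_MODULES,
    use_gradient_checkpointing=USE_GRAD_CHECKPOINT,  # "unsloth" variant
    random_state=SEED,
)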

Training Management

# Logging & Model Saving
LOGGING_STEPS = 5
OUTPUT_DIR = "/content/drive/MyDrive/outputs"
SAVE_STRATEGY = "epoch"
SAVE_TOTAL_LIMIT = 2
REPORT_TO = "none"

Explanation:

  • LOGGING_STEPS: Log metrics every 5 training steps
  • SAVE_STRATEGY: Save checkpoint after each epoch
  • SAVE_TOTAL_LIMIT: Keep only the 2 most recent checkpoints (saves storage)
  • REPORT_TO: Disable external logging (e.g., Weights & Biases)
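
Wired into a TRL `SFTTrainer`, the training and logging settings look roughly like this (a sketch; argument names differ slightly across trl versions, and `train_dataset` stands in for your own formatted dataset):

# Configure and run supervised fine-tuning
from transformers import TrainingArguments
from trl import SFTTrainer

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,  # placeholder for your dataset
    args=TrainingArguments(
        per_device_train_batch_size=BATCH_SIZE,
        gradient_accumulation_steps=GRAD_ACCUMULATION,
        num_train_epochs=NUM_EPOCHS,
        learning_rate=LEARNING_RATE,
        warmup_steps=WARMUP_STEPS,
        lr_scheduler_type=LR_SCHEDULER_TYPE,
        weight_decay=WEIGHT_DECAY,
        seed=SEED,
        logging_steps=LOGGING_STEPS,
        output_dir=OUTPUT_DIR,
        save_strategy=SAVE_STRATEGY,
        save_total_limit=SAVE_TOTAL_LIMIT,
        report_to=REPORT_TO,
    ),
)
trainer.train()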

Generation Settings

# Inference Configuration
GEN_MAX_TOKENS = 256
GEN_TEMPERATURE = 0.7
GEN_TOP_P = 0.9

Explanation:

  • GEN_MAX_TOKENS: Maximum length of generated responses
  • GEN_TEMPERATURE: Controls randomness (0.7 = balanced creativity)
  • GEN_TOP_P: Nucleus sampling threshold (0.9 = diverse but coherent)
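
A minimal inference sketch using these settings (assumes the fine-tuned `model` and `tokenizer` from the steps above; the prompt is only an example):

# Optional Unsloth fast-inference mode
FastLanguageModel.for_inference(model)

messages = [{"role": "user", "content": "Explain LoRA in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=GEN_MAX_TOKENS,
    temperature=GEN_TEMPERATURE,
    top_p=GEN_TOP_P,
    do_sample=True,  # sampling is needed for temperature/top_p to take effect
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))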

🔧 Installation

# Install Unsloth
pip install unsloth

# Install dependencies
pip install torch transformers datasets trl peft accelerate

📊 Training Details

| Metric | Value |
|---|---|
| Effective Batch Size | 8 (2 × 4) |
| Total Training Steps | ~(dataset_size / 8) × 5 epochs |
| Trainable Parameters | ~0.5% of total model |
| Memory Usage | ~2.5-3 GB VRAM |
| Training Time | ~30 min on a T4 GPU |
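
As a back-of-the-envelope check of the step count, with an illustrative dataset of 1,000 examples (the dataset size is only an assumption for the arithmetic):

# Rough step-count estimate for a hypothetical 1,000-example dataset
dataset_size = 1_000                                # illustrative only
effective_batch = BATCH_SIZE * GRAD_ACCUMULATION    # 2 * 4 = 8
steps_per_epoch = dataset_size // effective_batch   # 125
total_steps = steps_per_epoch * NUM_EPOCHS          # 625
print(effective_batch, steps_per_epoch, total_steps)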

Note: Adjust hyperparameters based on your dataset size and task complexity. For smaller datasets, reduce LORA_R and NUM_EPOCHS to prevent overfitting.
