Grammar Scoring Model Fine-Tuning and Evaluation

This repository contains scripts to process audio datasets, transcribe them using the Whisper model, fine-tune a Gemma-3-4B model for grammar scoring, and evaluate its performance on transcribed text. The project leverages modern machine learning libraries and tools like Unsloth, HuggingFace Transformers, and Whisper for efficient fine-tuning and speech-to-text transcription.

Prerequisites

  • Python 3.8 or higher
  • A compatible GPU (recommended for faster processing; CUDA support required for GPU acceleration)
  • Git installed to clone the repository
  • Access to audio datasets for training and testing (not included in this repo)

Installation

To get started, clone this repository and install the required dependencies:

git clone https://github.com/Subhanshusethi/GrammarScoringEngine.git
cd GrammarScoringEngine
pip install -r requirements.txt

The requirements.txt file lists all required libraries, such as torch, transformers, and unsloth, for model fine-tuning, audio transcription, and evaluation.

Project Structure

This repository contains the following key files:

  1. requirements.txt

    • Lists all Python dependencies required for the project.
    • Install using pip install -r requirements.txt.
  2. extract_text.py

    • A script to transcribe audio files using OpenAI's Whisper model (whisper-large-v3-turbo) and prepare datasets for training and testing.
    • Outputs:
      • Training data in JSON format (ShareGPT-style with system, user, and assistant roles).
      • Test data transcriptions in CSV format.
  3. finetune_G_eval.py

    • A script to fine-tune a Gemma-3-4B model (quantized to 4-bit) on grammar scoring tasks and evaluate its performance.
    • Performs:
      • Model fine-tuning using LoRA (Low-Rank Adaptation) with the Unsloth library.
      • Grammar score prediction on a test set.
      • Evaluation metrics (MAE, RMSE, and accuracy) on a held-out test split.

Usage

1. Transcribing Audio Data (extract_text.py)

This script processes audio files to generate transcribed text for training and testing.

Command

python extract_text.py --train_csv_path <train_csv> --test_csv_path <test_csv> --train_audio_dir <train_dir> --test_audio_dir <test_dir>

Arguments

  • --train_csv_path: Path to a CSV file with columns filename (audio file names) and label (grammar scores).
  • --test_csv_path: Path to a CSV file with a filename column (no labels required).
  • --train_audio_dir: Directory containing training audio files.
  • --test_audio_dir: Directory containing test audio files.
  • --output_train_json (optional): Output path for training JSON (default: grammar_score_training_data_with_system.json).
  • --output_test_csv (optional): Output path for test CSV (default: transcribed_test_set.csv).
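The training CSV only needs the two columns named above. A minimal sketch of producing such a file with the standard library (the audio file names and scores below are illustrative, not taken from the dataset):

```python
import csv

# Write a minimal training CSV with the columns extract_text.py expects.
# Audio file names and grammar scores here are placeholder examples.
rows = [
    {"filename": "sample_001.wav", "label": 3.5},
    {"filename": "sample_002.wav", "label": 4.0},
]

with open("train.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["filename", "label"])
    writer.writeheader()
    writer.writerows(rows)
```

The test CSV follows the same pattern with only the filename column.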

Example

python extract_text.py --train_csv_path data/train.csv --test_csv_path data/test.csv --train_audio_dir audio/train --test_audio_dir audio/test

Output

  • Training data saved as a JSON file with system prompts and grammar scores.
  • Test data saved as a CSV file with filenames and transcribed text.
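The general shape of each ShareGPT-style record (system, user, and assistant roles) can be sketched as follows. The system prompt wording and score formatting here are assumptions for illustration; the actual prompt is defined in extract_text.py:

```python
import json

def make_record(transcript: str, score: float) -> dict:
    # Build one ShareGPT-style conversation: system prompt, transcribed
    # utterance as the user turn, and the grammar score as the assistant turn.
    # The system prompt text below is illustrative only.
    return {
        "conversations": [
            {"role": "system", "content": "You are a grammar scoring assistant."},
            {"role": "user", "content": transcript},
            {"role": "assistant", "content": str(score)},
        ]
    }

record = make_record("She go to school every day.", 2.5)
print(json.dumps(record, indent=2))
```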

2. Fine-Tuning and Evaluation (finetune_G_eval.py)

This script fine-tunes a Gemma-3-4B model on the transcribed training data and evaluates its grammar scoring performance.

Command

python finetune_G_eval.py --training_json_path <train_json> --input_csv_path <test_csv>

Arguments

  • --training_json_path: Path to the training JSON file generated by extract_text.py.
  • --input_csv_path: Path to the test CSV file with transcribed text (from extract_text.py).
  • --output_csv_path (optional): Output path for the scored test CSV (default: grammar_scored_test_set.csv).
  • --eval_model (optional): Boolean flag to enable/disable evaluation (default: True).

Example

python finetune_G_eval.py --training_json_path grammar_score_training_data_with_system.json --input_csv_path transcribed_test_set.csv

Output

  • Fine-tuned model (saved implicitly by the trainer; modify SFTConfig in the script to save explicitly if needed).
  • Test set with predicted grammar scores saved as a CSV file.
  • Evaluation metrics (MAE, RMSE, accuracy) printed for the held-out test split.
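The reported metrics can be reproduced from predicted and gold scores in a few lines. A minimal sketch; the 0.5 tolerance used for "accuracy" is an assumption, and the script may use a different threshold or exact match:

```python
import math

def evaluate(preds, golds, tol=0.5):
    """Compute MAE, RMSE, and within-tolerance accuracy for grammar scores."""
    errors = [p - g for p, g in zip(preds, golds)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    # Count a prediction as "correct" if it lands within tol of the gold score.
    acc = sum(abs(e) <= tol for e in errors) / len(errors)
    return mae, rmse, acc

mae, rmse, acc = evaluate([3.0, 4.5, 2.0], [3.5, 4.5, 3.0])
```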

Model Comparison

Below is a table comparing the performance of different models in the Kaggle competition.

| Model        | Parameters | Kaggle Score |
|--------------|------------|--------------|
| LLaMA 3.2 3B | 3B         | 0.766        |
| LLaMA 3.2 1B | 1B         | 0.71         |
| Gemma 4B     | 4B         | 0.802        |
| Gemma 1B     | 1B         | 0.782        |

Notes

  • The Gemma 4B model (fine-tuned in this repo) is optimized for grammar scoring with LoRA and 4-bit quantization.

Notes and Specification

  • Hardware Used: Fine-tuning and inference were performed on an NVIDIA L4 GPU with 12GB VRAM. This setup provides a good balance of memory and computational power for the Gemma-3-4B model quantized to 4-bit using LoRA. The L4 GPU supports efficient processing with the specified batch size (per_device_train_batch_size=4) and gradient accumulation steps (gradient_accumulation_steps=4).
  • Dataset: You must provide your own audio datasets and corresponding CSV files. The scripts assume specific column names (filename, label).
  • Model Downloads: The pre-trained unsloth/gemma-3-4b-it-unsloth-bnb-4bit model and openai/whisper-large-v3-turbo are downloaded from HuggingFace during execution, requiring an active internet connection.

Resources

  • Unsloth: https://github.com/unslothai/unsloth
