This repository contains scripts to process audio datasets, transcribe them using the Whisper model, fine-tune a Gemma-3-4B model for grammar scoring, and evaluate its performance on transcribed text. The project leverages modern machine learning libraries and tools like Unsloth, HuggingFace Transformers, and Whisper for efficient fine-tuning and speech-to-text transcription.
- Python 3.8 or higher
- A compatible GPU (recommended for faster processing; CUDA support required for GPU acceleration)
- Git installed to clone the repository
- Access to audio datasets for training and testing (not included in this repo)
To get started, clone this repository and install the required dependencies:

```bash
git clone https://github.com/Subhanshusethi/GrammarScoringEngine.git
cd GrammarScoringEngine
pip install -r requirements.txt
```

The `requirements.txt` file includes all necessary libraries, such as `torch`, `transformers`, `unsloth`, and others for model fine-tuning, audio transcription, and evaluation.
This repository contains the following key files:
- `requirements.txt` - Lists all Python dependencies required for the project. Install with `pip install -r requirements.txt`.
- `extract_text.py` - Transcribes audio files using OpenAI's Whisper model (`whisper-large-v3-turbo`) and prepares datasets for training and testing. Outputs:
  - Training data in JSON format (ShareGPT-style with system, user, and assistant roles).
  - Test data transcriptions in CSV format.
- `finetune_G_eval.py` - Fine-tunes a Gemma-3-4B model (quantized to 4-bit) on grammar scoring tasks and evaluates its performance. Performs:
  - Model fine-tuning using LoRA (Low-Rank Adaptation) with the Unsloth library.
  - Grammar score prediction on a test set.
  - Evaluation metrics (MAE, RMSE, and accuracy) on a held-out test split.
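For reference, one training record in the ShareGPT-style JSON produced by `extract_text.py` might look like the sketch below. The exact system prompt wording and the top-level key name are assumptions, not copied from the script; only the three-role structure is stated in this README.

```python
import json

# Hypothetical ShareGPT-style training record: a conversation with system,
# user, and assistant roles, where the assistant turn carries the grammar
# score. Prompt text and the "conversations" key name are assumed.
record = {
    "conversations": [
        {"role": "system", "content": "You are a grammar scoring assistant."},
        {"role": "user", "content": "Score the grammar of: 'She go to school yesterday.'"},
        {"role": "assistant", "content": "2.5"},
    ]
}

print(json.dumps(record, indent=2))
```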
The `extract_text.py` script processes audio files to generate transcribed text for training and testing.
```bash
python extract_text.py --train_csv_path <train_csv> --test_csv_path <test_csv> --train_audio_dir <train_dir> --test_audio_dir <test_dir>
```

- `--train_csv_path`: Path to a CSV file with columns `filename` (audio file names) and `label` (grammar scores).
- `--test_csv_path`: Path to a CSV file with a `filename` column (no labels required).
- `--train_audio_dir`: Directory containing training audio files.
- `--test_audio_dir`: Directory containing test audio files.
- `--output_train_json` (optional): Output path for the training JSON (default: `grammar_score_training_data_with_system.json`).
- `--output_test_csv` (optional): Output path for the test CSV (default: `transcribed_test_set.csv`).
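The flags above can be mirrored with a standard `argparse` setup. This is a sketch of what the script's CLI presumably looks like, with the same required flags and defaults listed above; the actual implementation may differ.

```python
import argparse

def build_parser():
    # Mirrors the documented extract_text.py flags; defaults match the README.
    p = argparse.ArgumentParser(description="Transcribe audio and build datasets")
    p.add_argument("--train_csv_path", required=True)
    p.add_argument("--test_csv_path", required=True)
    p.add_argument("--train_audio_dir", required=True)
    p.add_argument("--test_audio_dir", required=True)
    p.add_argument("--output_train_json",
                   default="grammar_score_training_data_with_system.json")
    p.add_argument("--output_test_csv", default="transcribed_test_set.csv")
    return p

# Example invocation matching the usage shown in this README.
args = build_parser().parse_args([
    "--train_csv_path", "data/train.csv",
    "--test_csv_path", "data/test.csv",
    "--train_audio_dir", "audio/train",
    "--test_audio_dir", "audio/test",
])
print(args.output_train_json)
```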
```bash
python extract_text.py --train_csv_path data/train.csv --test_csv_path data/test.csv --train_audio_dir audio/train --test_audio_dir audio/test
```

- Training data saved as a JSON file with system prompts and grammar scores.
- Test data saved as a CSV file with filenames and transcribed text.
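The test CSV can be written with the standard library alone; a minimal sketch is below. The `transcription` column name is an assumption (the README only guarantees a `filename` column), so check the script for the exact header.

```python
import csv
import io

# Hypothetical rows: (audio filename, Whisper transcription).
rows = [
    ("clip_001.wav", "She goes to school every day."),
    ("clip_002.wav", "He don't like apples."),
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["filename", "transcription"])  # header names are assumed
writer.writerows(rows)
print(buf.getvalue())
```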
The `finetune_G_eval.py` script fine-tunes a Gemma-3-4B model on the transcribed training data and evaluates its grammar scoring performance.
```bash
python finetune_G_eval.py --training_json_path <train_json> --input_csv_path <test_csv>
```

- `--training_json_path`: Path to the training JSON file generated by `extract_text.py`.
- `--input_csv_path`: Path to the test CSV file with transcribed text (from `extract_text.py`).
- `--output_csv_path` (optional): Output path for the scored test CSV (default: `grammar_scored_test_set.csv`).
- `--eval_model` (optional): Boolean flag to enable or disable evaluation (default: `True`).
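At inference time the model returns free text, so the script presumably extracts a numeric score from each generated reply before writing the scored CSV. The helper below is an illustrative heuristic for that step, not the script's actual parser.

```python
import re
from typing import Optional

def extract_score(generated: str) -> Optional[float]:
    # Pull the first number (integer or decimal) out of the model's reply.
    # Returns None when the reply contains no number at all.
    m = re.search(r"\d+(?:\.\d+)?", generated)
    return float(m.group()) if m else None

print(extract_score("The grammar score is 3.5 out of 5."))  # 3.5
```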
```bash
python finetune_G_eval.py --training_json_path grammar_score_training_data_with_system.json --input_csv_path transcribed_test_set.csv
```

- Fine-tuned model (saved implicitly by the trainer; modify `SFTConfig` in the script to save explicitly if needed).
- Test set with predicted grammar scores saved as a CSV file.
- Evaluation metrics (MAE, RMSE, accuracy) printed for the held-out test split.
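The reported metrics can be computed directly from predicted and true scores. The sketch below uses exact match after rounding as the accuracy criterion, which is one plausible definition; the script may define accuracy differently.

```python
import math

def evaluate(preds, labels):
    # MAE, RMSE, and exact-match accuracy after rounding. The accuracy
    # definition is an assumption about how the script counts matches.
    errors = [p - y for p, y in zip(preds, labels)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    acc = sum(round(p) == round(y) for p, y in zip(preds, labels)) / len(preds)
    return mae, rmse, acc

mae, rmse, acc = evaluate([3.0, 2.5, 4.0], [3.0, 3.0, 5.0])
print(mae, rmse, acc)
```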
Below is a table comparing the performance of different models on the Kaggle competition.
| Model | Parameters | Kaggle Scores |
|---|---|---|
| LLaMA 3.2 3B | 3B | 0.766 |
| LLaMA 3.2 1B | 1B | 0.71 |
| Gemma 4B | 4B | 0.802 |
| Gemma 1B | 1B | 0.782 |
- The Gemma 4B model (fine-tuned in this repo) is optimized for grammar scoring with LoRA and 4-bit quantization.
- Hardware Used: Fine-tuning and inference were performed on an NVIDIA L4 GPU with 12GB VRAM. This setup balances memory and compute for the Gemma-3-4B model quantized to 4-bit with LoRA, and handles the specified batch size (`per_device_train_batch_size=4`) and gradient accumulation steps (`gradient_accumulation_steps=4`) efficiently.
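Those two settings combine into the effective batch size per optimizer step; a one-liner makes the arithmetic explicit:

```python
per_device_train_batch_size = 4   # from the SFT configuration described above
gradient_accumulation_steps = 4

# Gradients from 4 micro-batches are accumulated before each optimizer
# step, so each update effectively sees 4 * 4 = 16 examples per device.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 16
```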
- Dataset: You must provide your own audio datasets and corresponding CSV files. The scripts assume specific column names (`filename`, `label`).
- Model Downloads: The pre-trained `unsloth/gemma-3-4b-it-unsloth-bnb-4bit` and `openai/whisper-large-v3-turbo` models are downloaded from HuggingFace during execution, requiring an active internet connection.
