Fine-tuning Google’s T5 for sentence-level Grammatical Error Correction (GEC).
This project provides a FastAPI-based service for Grammatical Error Correction (GEC) using fine-tuned T5 models. We applied three fine-tuning approaches (LoRA, QLoRA, and full fine-tuning) and evaluated each method’s effectiveness in terms of performance, efficiency, and time and resource requirements (a configuration sketch follows the contents list).

## Table of Contents

- Tech Stack
- Dataset
- File Structure
- Reports & Documentation
- Evaluation Metrics
- Getting Started
- Usage
- Future Work
- Contributors
- Acknowledgements
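As a reference for how the adapter-based approaches are typically configured, here is a minimal sketch using Hugging Face `peft`. The model name and hyperparameters are illustrative, not the exact values used in this repo’s `fine_tuning/` scripts.

```python
# Illustrative LoRA/QLoRA setup; hyperparameters are examples only.
from transformers import AutoModelForSeq2SeqLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, TaskType

USE_QLORA = False  # True -> load the base model in 4-bit (QLoRA)
# A full QLoRA recipe would typically also apply
# peft.prepare_model_for_kbit_training to the quantized base model.

quant_config = BitsAndBytesConfig(load_in_4bit=True) if USE_QLORA else None
base = AutoModelForSeq2SeqLM.from_pretrained(
    "t5-large",
    quantization_config=quant_config,
)

lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=16,                       # rank of the low-rank update matrices
    lora_alpha=32,              # scaling factor
    lora_dropout=0.05,
    target_modules=["q", "v"],  # T5 attention projections to adapt
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only a small fraction is trainable
```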
## Tech Stack

- Programming Language: Python
- Deep Learning Framework: PyTorch
- Data Processing and Handling: Numpy, Pandas
- Model: Hugging Face Transformers
- Optimization: Optimum, ONNX Runtime
- Backend: FastAPI, Uvicorn
- Frontend: HTML, CSS, JS
## Dataset

We used a custom Hugging Face dataset of 200k sentence pairs, compiled from the following two datasets:

- https://www.kaggle.com/datasets/satishgunjal/grammar-correction
- https://bekushal.medium.com/cleaned-lang8-dataset-for-grammar-error-detection-79aaa31150aa
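The compiled dataset itself isn’t reproduced here; below is a minimal sketch of how such sentence pairs can be prepared for T5. The `incorrect`/`correct` column names and the `grammar:` task prefix are illustrative assumptions.

```python
# Illustrative preprocessing of (incorrect, correct) sentence pairs for T5.
# Column names "incorrect"/"correct" are assumptions; adjust to the dataset.
from datasets import Dataset
from transformers import AutoTokenizer

pairs = [
    {"incorrect": "She go to school every day.",
     "correct": "She goes to school every day."},
]
ds = Dataset.from_list(pairs)

tokenizer = AutoTokenizer.from_pretrained("t5-large")

def preprocess(batch):
    # T5 is text-to-text: prefix the task, tokenize source and target.
    inputs = ["grammar: " + s for s in batch["incorrect"]]
    model_inputs = tokenizer(inputs, max_length=128, truncation=True)
    labels = tokenizer(text_target=batch["correct"], max_length=128,
                       truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = ds.map(preprocess, batched=True, remove_columns=ds.column_names)
```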
## File Structure

```
├─ app/                     # files for deployment and web app
├─ digit_recognition_nn/    # NN implementation code
├─ fine_tuning/             # fine-tuning and evaluation code
├─ shakespeare_textgen/     # LSTM implementation code
├─ media/                   # images used in readme, report, blog
├─ biweekly_blog.md
├─ report.md
└─ README.md
```
## Reports & Documentation

The evaluation is documented in detail in the Comparison Report, which analyses the performance of Full Fine-Tuning, LoRA, and QLoRA across parameters like efficiency, memory usage, and accuracy.

We also maintained a detailed Project Blog to document the workflow, progress, results, and decisions made over the course of the project.
## Evaluation Metrics

We evaluated the three models using the following metrics (a simplified computation sketch follows the list):

- GLEU Score
- Precision
- Recall
- F1
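For reference, GLEU is available in NLTK, and token-level precision/recall/F1 can be derived from overlap with the reference correction. The snippet below is a simplified illustration, not the exact evaluation code in `fine_tuning/` (GEC evaluation in practice often uses edit-based scorers such as ERRANT):

```python
# Simplified metric computation; the repo's evaluation code may differ.
from nltk.translate.gleu_score import sentence_gleu

reference = "She goes to school every day.".split()
hypothesis = "She goes to school every day.".split()

gleu = sentence_gleu([reference], hypothesis)

# Crude token-overlap proxy for precision/recall/F1 against the reference;
# edit-based scorers (e.g. ERRANT) are the standard for GEC.
ref_set, hyp_set = set(reference), set(hypothesis)
tp = len(ref_set & hyp_set)
precision = tp / len(hyp_set) if hyp_set else 0.0
recall = tp / len(ref_set) if ref_set else 0.0
f1 = (2 * precision * recall / (precision + recall)
      if precision + recall else 0.0)
print(f"GLEU={gleu:.3f} P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```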
We also used WandB experiment tracking to track losses, CPU/GPU usage, and other training statistics. The detailed graphs and evaluation inferences can be found in the Comparison Report.
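Logging follows the standard W&B pattern; a minimal sketch, with illustrative project, run, and metric names:

```python
# Minimal W&B logging pattern (project/run/metric names are illustrative).
import wandb

wandb.init(project="t5-gec", name="lora-run")
for step in range(3):
    # During training this would log real loss values; W&B records
    # CPU/GPU utilization automatically alongside logged metrics.
    wandb.log({"train/loss": 1.0 / (step + 1)}, step=step)
wandb.finish()
```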
## Getting Started

To install the necessary libraries, run:

```
pip install transformers optimum[onnxruntime] torch fastapi uvicorn
```

Clone the repository:

```
git clone https://github.com/sarayusapa/t5_Grammarator.git
```

Navigate to the app directory:

```
cd t5_Grammarator/app/
```

You can run either the Full Fine-Tuned model or the Adapter-based models (LoRA / QLoRA), depending on your preference.
**Full Fine-Tuned model:** run `convertmodel.py` as it is.

**Adapter models (LoRA / QLoRA):**
Step 1: In `convertmodel.py`, set:

```python
USE_ADAPTER = True
```

Step 2: In the same file, set:
ADAPTER_PATH = "sarayusapa/T5_Large_GEC_LoRA" #LoRAor
ADAPTER_PATH = sarayusapa/T5_Large_GEC_LoRA #QLoRAStep 3: Run convertmodel.py to save the model in ONNX format.
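`convertmodel.py` itself is not reproduced here; as a rough sketch, an adapter-merge plus ONNX export with `peft` and Optimum typically looks like the following (the base checkpoint and output paths are assumptions):

```python
# Illustrative adapter merge + ONNX export; convertmodel.py may differ.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import PeftModel
from optimum.onnxruntime import ORTModelForSeq2SeqLM

BASE = "t5-large"  # assumed base checkpoint
ADAPTER_PATH = "sarayusapa/T5_Large_GEC_LoRA"

# Merge the LoRA weights into the base model so it can be exported.
merged = PeftModel.from_pretrained(
    AutoModelForSeq2SeqLM.from_pretrained(BASE), ADAPTER_PATH
).merge_and_unload()
merged.save_pretrained("merged_model")
AutoTokenizer.from_pretrained(BASE).save_pretrained("merged_model")

# Export the merged model to ONNX via Optimum.
onnx_model = ORTModelForSeq2SeqLM.from_pretrained("merged_model", export=True)
onnx_model.save_pretrained("onnx_model")
```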
In the same directory (`t5_Grammarator/app`), run:

```
uvicorn t5app:app --reload
```

After this, the server will be running and the web app ready to use.
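Once the server is running (by default at http://127.0.0.1:8000), it can also be queried over HTTP. The route and payload below are illustrative assumptions; the actual schema is defined in `t5app.py`:

```python
# Hypothetical request to the running service; the actual route and
# payload schema are defined in t5app.py and may differ.
import requests

resp = requests.post(
    "http://127.0.0.1:8000/correct",  # assumed endpoint
    json={"text": "She go to school every day."},
)
print(resp.json())
```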
## Usage

- Use the Full FT model if you want the most accurate results.
- Use the Adapters (LoRA/QLoRA) if you want lightweight and memory-efficient inference.
## Future Work

- Reinforcement Learning-based fine-tuning for GEC with reasoning
- Probing methods to identify how attention heads internalize grammar rules
- Ways to maximize quantization while maintaining accuracy
