Vision-Transformer-From-Scratch

Vision Transformer Implementation From Scratch

A clean, stable implementation of a Vision Transformer (ViT) in PyTorch, targeting the CIFAR-10 dataset.

Features

No nn.TransformerEncoder: Multi-Head Attention and Transformer Blocks are implemented from scratch.
Stable Training: Uses Pre-LayerNorm, truncated normal weight initialization, and gradient clipping.
Lightweight: Configured to train comfortably on a single GPU with 8 GB of VRAM (e.g., an RTX 5060 Laptop GPU).
Expected Performance: Reaches roughly 65-70% validation accuracy on CIFAR-10 after 20 epochs.
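
To make the first two features concrete, here is a minimal sketch of a from-scratch multi-head self-attention layer wrapped in a Pre-LayerNorm block, with truncated normal initialization. It is not copied from this repository: the class names (MultiHeadAttention, PreLNBlock), the helper init_weights, and the sizes used (dim=64, 4 heads, 65 tokens, std=0.02) are assumptions chosen for illustration, and the actual code under model/ may be organized differently.

```python
import torch
import torch.nn as nn


class MultiHeadAttention(nn.Module):
    """Multi-head self-attention built from nn.Linear layers only (illustrative)."""

    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        assert dim % num_heads == 0, "dim must be divisible by num_heads"
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3)  # joint projection for Q, K and V
        self.proj = nn.Linear(dim, dim)     # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape
        # (b, n, 3*d) -> (3, b, heads, n, head_dim), then split into Q, K, V
        qkv = self.qkv(x).reshape(b, n, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4).unbind(0)
        # Scaled dot-product attention, computed independently per head
        attn = (q @ k.transpose(-2, -1)) * self.head_dim ** -0.5
        attn = attn.softmax(dim=-1)
        # Merge the heads back into a single embedding dimension
        out = (attn @ v).transpose(1, 2).reshape(b, n, d)
        return self.proj(out)


class PreLNBlock(nn.Module):
    """Transformer block that applies LayerNorm before each sub-layer."""

    def __init__(self, dim: int, num_heads: int, mlp_ratio: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = MultiHeadAttention(dim, num_heads)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio),
            nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Pre-LN: normalize, apply the sub-layer, then add the residual
        x = x + self.attn(self.norm1(x))
        x = x + self.mlp(self.norm2(x))
        return x


def init_weights(module: nn.Module) -> None:
    """Truncated normal init for linear layers (std=0.02 is an assumed value)."""
    if isinstance(module, nn.Linear):
        nn.init.trunc_normal_(module.weight, std=0.02)
        if module.bias is not None:
            nn.init.zeros_(module.bias)
    elif isinstance(module, nn.LayerNorm):
        nn.init.ones_(module.weight)
        nn.init.zeros_(module.bias)


torch.manual_seed(0)
block = PreLNBlock(dim=64, num_heads=4)
block.apply(init_weights)
tokens = torch.randn(2, 65, 64)  # (batch, tokens, embedding dim), assumed sizes
out = block(tokens)
print(out.shape)
```

The block preserves the input shape, so several of them can be stacked between a patch embedding and a classification head.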

Current version

v2.0 - feat: add predict.py to support interactive input

Setup

Ensure you have PyTorch and Torchvision installed:

pip install torch torchvision

Usage

1. Train the Model

To download the dataset and start training:

python train.py

This will train for 20 epochs and save the weights to vit_cifar10.pth.
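
For readers curious how gradient clipping fits into a training loop, the following is a rough sketch of a single optimization step. It is not the contents of train.py: the function name train_step, the clipping threshold of 1.0, the AdamW optimizer, and the small stand-in model are all assumptions. The weights are written to an in-memory buffer here so the example does not touch vit_cifar10.pth.

```python
import io

import torch
import torch.nn as nn


def train_step(model, batch, optimizer, max_grad_norm=1.0):
    """Run one optimization step with gradient-norm clipping (hypothetical helper)."""
    images, labels = batch
    model.train()
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(images), labels)
    loss.backward()
    # clip_grad_norm_ rescales gradients in place and returns the norm before clipping
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
    return loss.item(), float(grad_norm)


torch.manual_seed(0)
# Stand-in classifier and a fake CIFAR-10-shaped batch, used only for the demo
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
batch = (torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,)))

loss, grad_norm = train_step(model, batch, optimizer)
print(f"loss={loss:.4f} grad_norm_before_clip={grad_norm:.4f}")

# Saving the state_dict is the usual way to produce a .pth weights file
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
```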

2. Option 1: Run Inference in the Terminal

To test the model on a random image from the test set:

python inference.py

Option 2 (Recommended): Predict on Custom Images (Interactive Mode)

To test the model on your own local image files, use the interactive prediction script. It lets you process multiple images without restarting the program:

python predict.py

Example Session:

Type 'exit' or 'quit' to stop.

Input the path (eg. my_cat.jpg): images/test_sample.jpg
------------------------------
Image: images/test_sample.jpg
Predicted Content: DOG
Confidence: 94.21%
------------------------------

Input the path (eg. my_cat.jpg): exit
Terminated
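
The label and confidence shown in a session like the one above can be derived from the model's raw outputs with a softmax. The sketch below illustrates that step only. It is not taken from predict.py: the helper names describe_prediction and is_exit_command are hypothetical, and the logits are made up. The class order is the one torchvision uses for CIFAR-10.

```python
import torch

# CIFAR-10 class names in torchvision's label order
CIFAR10_CLASSES = (
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
)


def describe_prediction(logits: torch.Tensor):
    """Turn a 10-element logit vector into (LABEL, confidence in percent)."""
    probs = torch.softmax(logits, dim=-1)
    confidence, index = probs.max(dim=-1)
    return CIFAR10_CLASSES[int(index)].upper(), float(confidence) * 100.0


def is_exit_command(text: str) -> bool:
    """Return True for the inputs that end the interactive loop."""
    return text.strip().lower() in {"exit", "quit"}


# Made-up logits that favor index 5 ("dog"); a real run would use model output
logits = torch.zeros(10)
logits[5] = 5.0
label, confidence = describe_prediction(logits)
print(f"Predicted Content: {label}")
print(f"Confidence: {confidence:.2f}%")
```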

Historical version

For the history of major version updates, see "version-log.md".

v2.0 - feat: add predict.py to support interactive input
v1.9 - docs: standardize document
v1.0 Architecture

