Related paper: "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" (https://arxiv.org/abs/2010.11929).
The CIFAR-10 data can be downloaded from the official website and placed in the image directory, or fetched automatically by setting download to True when the code loads the dataset.
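Below is a minimal sketch of the automatic-download path, assuming torchvision is used for data loading; the root path, normalization values, and batch size are illustrative assumptions, not values fixed by this repository.

```python
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# Illustrative transform: convert to tensors and normalize with common CIFAR-10 statistics
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])

# download=True fetches CIFAR-10 automatically if it is not already in the root directory
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
test_set = torchvision.datasets.CIFAR10(root="./data", train=False,
                                        download=True, transform=transform)

train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
test_loader = DataLoader(test_set, batch_size=128, shuffle=False)
```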
This project implements the Vision Transformer (ViT) model using the CIFAR-10 dataset. The Vision Transformer is a state-of-the-art architecture for image classification tasks, leveraging the power of self-attention mechanisms.
- Utilizes the latest advancements in deep learning for image classification.
- Integrates the powerful capabilities of transformers into computer vision tasks.
- Provides a robust and efficient solution for handling image data.
- Patch Embedding: Extracts image patches and converts them into token embeddings.
- Attention Mechanism: Captures global dependencies and relationships between tokens.
- MLP Layers: Employs multi-layer perceptrons for non-linear transformations.
- Transformer Blocks: Comprises attention layers followed by feed-forward neural networks.
- Vision Transformer (ViT) Model: Combines these components into a cohesive architecture.
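The sketch below shows how these components fit together in a PyTorch implementation. It is a minimal illustration, not the exact model definition used in this repository; the patch size, embedding dimension, number of heads, and depth are assumed defaults chosen for CIFAR-10's 32x32 images.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into non-overlapping patches and project them to token embeddings."""
    def __init__(self, img_size=32, patch_size=4, in_channels=3, embed_dim=192):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A strided convolution extracts and embeds each patch in a single step
        self.proj = nn.Conv2d(in_channels, embed_dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        x = self.proj(x)                      # (B, embed_dim, H/P, W/P)
        return x.flatten(2).transpose(1, 2)   # (B, num_patches, embed_dim)

class TransformerBlock(nn.Module):
    """Pre-norm self-attention followed by an MLP, each with a residual connection."""
    def __init__(self, embed_dim=192, num_heads=3, mlp_ratio=4.0):
        super().__init__()
        self.norm1 = nn.LayerNorm(embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(embed_dim)
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, int(embed_dim * mlp_ratio)),
            nn.GELU(),
            nn.Linear(int(embed_dim * mlp_ratio), embed_dim),
        )

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # global token-to-token attention
        x = x + self.mlp(self.norm2(x))                    # non-linear per-token transformation
        return x

class ViT(nn.Module):
    """Patch embedding + [CLS] token and positional embeddings + transformer blocks + classifier."""
    def __init__(self, num_classes=10, embed_dim=192, depth=6):
        super().__init__()
        self.patch_embed = PatchEmbedding(embed_dim=embed_dim)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, self.patch_embed.num_patches + 1, embed_dim))
        self.blocks = nn.Sequential(*[TransformerBlock(embed_dim) for _ in range(depth)])
        self.norm = nn.LayerNorm(embed_dim)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        x = self.patch_embed(x)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        x = self.blocks(x)
        return self.head(self.norm(x)[:, 0])  # classify from the [CLS] token

model = ViT()
print(model(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 10])
```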
Contributions are welcome! Feel free to fork the repository and submit pull requests for improvements or bug fixes.
The final test-set accuracy after training was 74%. Better hyperparameter choices and a better model definition would likely make ViT perform better on the CIFAR-10 dataset, but I stopped at this accuracy level because of resource and time constraints. If you reach a higher accuracy, I would welcome your advice. Thank you.