Hugging Face BERT Sentiment Analysis - AWS Trainium

Introduction

In this example, we walk through the steps required to adapt your PyTorch code to train a Machine Learning (ML) model, here a Hugging Face BERT model, on an Amazon EC2 instance that uses the AWS Trainium chip.

This repository provides code examples for:

Training a BERT ML model with PyTorch and Hugging Face on a single Neuron Core
1. Code: single Neuron Core
2. Notebook: notebook single Neuron Core
Distributed training of a BERT ML model with PyTorch and Hugging Face
1. Code: distributed training on Neuron Cores
2. Notebook: notebook distributed training on Neuron Cores

Infrastructure Setup for AWS Trainium

Prerequisites

Instance Image: Deep Learning AMI Neuron PyTorch 1.11
Instance Type: trn1.32xlarge
Git installed on the EC2 instance

git --version

Activate pre-built PyTorch environment

source /opt/aws_neuron_venv_pytorch/bin/activate

Check AWS Neuron SDK installation

neuron-ls

neuron-top
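The same check can be scripted. The following is a minimal sketch, not part of the repository, that only verifies that the two Neuron command-line tools named above can be found on the PATH. The helper name `find_neuron_tools` is hypothetical.

```python
import shutil

# Tools named in the section above.
NEURON_TOOLS = ("neuron-ls", "neuron-top")


def find_neuron_tools(tools=NEURON_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in find_neuron_tools().items():
        status = path if path else "NOT FOUND (is the Neuron SDK installed?)"
        print(f"{tool}: {status}")
```

A `None` entry only means the executable was not found. It does not replace running `neuron-ls` itself to see the devices.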

ML Training on a single Neuron Core

Activate pre-built PyTorch environment

Test the code execution by using the provided notebook

Command-line execution example

cd examples/01-trainium-single-core

python3 train.py
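For orientation, the kind of change a PyTorch training loop typically needs in order to run on a Neuron Core is sketched below. It assumes the general PyTorch/XLA pattern (`torch_xla`) and is not a copy of the repository's `train.py`. The helper names `xla_available` and `train_one_epoch` are hypothetical, and batches are assumed to be simple `(inputs, labels)` pairs.

```python
import importlib.util


def xla_available():
    """Return True if the torch_xla package can be imported in this environment."""
    return importlib.util.find_spec("torch_xla") is not None


def train_one_epoch(model, loader, optimizer, loss_fn):
    """Run one epoch on an XLA device (a Neuron Core on Trainium).

    Assumption: `model` was already moved with model.to(xm.xla_device())
    *before* `optimizer` was created from its parameters.
    """
    # Imported lazily so this file can be loaded on machines without torch_xla.
    import torch_xla.core.xla_model as xm

    device = xm.xla_device()  # takes the place of torch.device("cuda")
    model.train()
    for inputs, labels in loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), labels)
        loss.backward()
        optimizer.step()
        xm.mark_step()  # execute the lazily traced graph for this step


if __name__ == "__main__":
    print("torch_xla available:", xla_available())
```

The two XLA-specific pieces are the device lookup and the `xm.mark_step()` call at the end of each step. The rest is an ordinary PyTorch loop.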

Distributed Training on all available Neuron Cores

Activate pre-built PyTorch environment

Test the code execution by using the provided notebook

Command-line execution example

cd examples/02-trainium-distributed-training

export TOKENIZERS_PARALLELISM=false

torchrun --nproc_per_node=32 train.py
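`torchrun` starts one worker process per `--nproc_per_node` slot and gives each worker its identity through the `RANK`, `LOCAL_RANK` and `WORLD_SIZE` environment variables. As an illustration of how a worker can use them, the sketch below splits dataset indices across workers. The helper names are hypothetical, and the repository's `train.py` may shard data differently (for example with a PyTorch `DistributedSampler`).

```python
import os


def worker_identity(env=os.environ):
    """Read the rank and world size that torchrun exports to each worker.

    Falls back to a single-worker setup when the variables are absent,
    e.g. when the script is started with plain `python3`.
    """
    rank = int(env.get("RANK", 0))
    world_size = int(env.get("WORLD_SIZE", 1))
    return rank, world_size


def shard_indices(num_samples, rank, world_size):
    """Return the sample indices handled by `rank` (round-robin split)."""
    return list(range(rank, num_samples, world_size))


if __name__ == "__main__":
    rank, world_size = worker_identity()
    mine = shard_indices(10, rank, world_size)
    print(f"worker {rank}/{world_size} handles samples {mine}")
```

With the command above, `WORLD_SIZE` is 32 and each worker sees a distinct `RANK` from 0 to 31, so every sample index is assigned to exactly one worker.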

Errors

Flush the Neuron Cores by reloading the Neuron kernel module

sudo rmmod neuron; sudo modprobe neuron
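Before reloading, it can help to confirm that the driver is actually loaded. The sketch below is an illustration rather than part of the repository: it parses the Linux `/proc/modules` listing and looks for the `neuron` module name used in the command above. The helper name `is_module_loaded` is hypothetical.

```python
from pathlib import Path


def is_module_loaded(name, modules_text):
    """Return True if `name` is the first field of any line in /proc/modules text."""
    for line in modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


if __name__ == "__main__":
    proc_modules = Path("/proc/modules")
    if proc_modules.exists():
        loaded = is_module_loaded("neuron", proc_modules.read_text())
        print("neuron kernel module loaded:", loaded)
    else:
        print("/proc/modules not found (not a Linux host?)")
```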