Hugging Face BERT Sentiment Analysis - AWS Trainium

Introduction

In this example, we walk through the steps required to adapt your PyTorch code to train a Machine Learning (ML) model, using Hugging Face with BERT as the model type, on an Amazon EC2 instance powered by the AWS Trainium chip.

This repository provides code examples for:

  1. Training a BERT ML model by using PyTorch and Hugging Face
    1. Code: single Neuron Core
    2. Notebook: notebook single Neuron Core
  2. Distributed training of a BERT ML model by using PyTorch and Hugging Face
    1. Code: distributed training on Neuron Cores
    2. Notebook: notebook distributed training on Neuron Cores

Infrastructure Setup for AWS Trainium

Prerequisites

git --version

Activate pre-built PyTorch environment

source /opt/aws_neuron_venv_pytorch/bin/activate

Check AWS Neuron SDK installation

neuron-ls

neuron-top
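The installation check above can also be scripted before launching a long training job. The following is a minimal sketch, not part of this repository: the helper name `find_neuron_tools` is hypothetical, and it only checks that the two CLI tools named above are on the `PATH`, not that the Neuron devices themselves are healthy.

```python
import shutil


def find_neuron_tools(tools=("neuron-ls", "neuron-top")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


def missing_tools(found):
    """Return the sorted names of tools that could not be located."""
    return sorted(name for name, path in found.items() if path is None)
```

A launcher script could call `missing_tools(find_neuron_tools())` and stop with a clear message if the list is not empty, instead of failing later inside the training code.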

ML Training on single Neuron Core

Activate pre-built PyTorch environment

Test the code execution by using the provided notebook

Command-line execution example

cd examples/01-trainium-single-core

python3 train.py
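For orientation, the adaptation of a PyTorch training loop to a single Neuron Core usually comes down to two changes: placing tensors on the XLA device and marking the end of each step. The sketch below illustrates that pattern and is not the contents of this repository's `train.py`. It assumes the `torch_xla` package shipped with the Neuron PyTorch environment, a loader that yields `(inputs, labels)` pairs (Hugging Face batches are often dictionaries instead), and the hypothetical helper names `resolve_device` and `train_one_epoch`.

```python
import importlib.util


def resolve_device():
    """Return the XLA device if torch_xla is importable, else the string "cpu"."""
    if importlib.util.find_spec("torch_xla") is None:
        return "cpu"
    import torch_xla.core.xla_model as xm  # only present on Neuron/XLA hosts
    return xm.xla_device()


def train_one_epoch(model, loader, optimizer, loss_fn, device):
    """Run one epoch and return the number of optimizer steps taken.

    The model is assumed to be on `device` already, and the optimizer is
    assumed to have been built after the model was moved there.
    """
    on_xla = str(device).startswith("xla")
    if on_xla:
        import torch_xla.core.xla_model as xm
    model.train()
    steps = 0
    for inputs, labels in loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), labels)
        loss.backward()
        optimizer.step()
        if on_xla:
            # XLA traces operations lazily; mark_step() ends the step so the
            # accumulated graph is compiled and executed on the device.
            xm.mark_step()
        steps += 1
    return steps
```

When using this pattern, call `model.to(device)` before constructing the optimizer, so that the optimizer holds references to the parameters that live on the XLA device.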

Distributed Training on all available Neuron Cores

Activate pre-built PyTorch environment

Test the code execution by using the provided notebook

Command-line execution example

cd examples/02-trainium-distributed-training

export TOKENIZERS_PARALLELISM=false

torchrun --nproc_per_node=32 train.py
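With `torchrun`, each of the 32 worker processes runs the same `train.py` and learns its identity from the `RANK` and `WORLD_SIZE` environment variables that `torchrun` sets. The sketch below shows how a worker could use them to pick its own slice of the dataset. It is an illustration only: the helper names `worker_info` and `shard_indices` are hypothetical, and the repository's script may rely on a ready-made sampler such as `torch.utils.data.distributed.DistributedSampler` instead.

```python
import os


def worker_info(env=None):
    """Read (rank, world_size) from torchrun's environment variables.

    Falls back to a single worker (rank 0 of 1) when the variables are unset,
    so the same script also runs outside torchrun.
    """
    env = os.environ if env is None else env
    return int(env.get("RANK", "0")), int(env.get("WORLD_SIZE", "1"))


def shard_indices(num_samples, rank, world_size):
    """Return the sample indices for one worker using strided sharding."""
    if not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    return list(range(rank, num_samples, world_size))
```

Strided sharding keeps the shards disjoint and covers every sample once, but shard sizes can differ by one when the dataset size is not a multiple of the worker count. `DistributedSampler` pads the shards to equal length by default, which matters when every worker must take the same number of steps.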

Errors

  1. Flush Neuron Cores by reloading the Neuron kernel driver
sudo rmmod neuron; sudo modprobe neuron