Betha is a tribute to Alpa
While Alpa is the nuclear weapon for large model training and serving, Betha is a toy example of model-parallel training of a simple GPT-J implementation using manual sharding and Ray Core.
model.py contains the GPT-J model broken up into stages. shard.py is the Ray actor wrapper that passes activations and gradients between nodes.
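The shape of such a wrapper looks roughly like the sketch below. This is not the actual shard.py API, just an illustration of the idea: each actor holds one slice of the model, runs its forward pass on activations received from the previous stage, and hands gradients back upstream during the backward pass. The names ShardActor, forward, and backward are hypothetical.

```python
# Hypothetical sketch of a pipeline-stage actor; not the actual shard.py API.
import ray
import torch

@ray.remote(num_gpus=1)
class ShardActor:
    """Holds one contiguous slice of GPT-J layers and runs its forward/backward pass."""

    def __init__(self, stage: torch.nn.Module, lr: float = 1e-4):
        self.stage = stage.cuda()
        self.optimizer = torch.optim.Adam(self.stage.parameters(), lr=lr)
        self.inputs = None
        self.outputs = None

    def forward(self, activations: torch.Tensor) -> torch.Tensor:
        # Keep the incoming activations so the backward pass can
        # propagate gradients to the upstream shard.
        self.inputs = activations.cuda().requires_grad_(True)
        self.outputs = self.stage(self.inputs)
        return self.outputs.detach().cpu()

    def backward(self, grad_outputs: torch.Tensor) -> torch.Tensor:
        # Backprop the gradient arriving from the next stage, update this
        # shard's weights, and return the gradient w.r.t. our inputs.
        self.outputs.backward(grad_outputs.cuda())
        self.optimizer.step()
        self.optimizer.zero_grad()
        return self.inputs.grad.cpu()

# Driver loop (illustrative): chain the forward passes, then walk backward.
# stages = [ShardActor.remote(block) for block in gptj_stages]
# acts = batch
# for actor in stages:
#     acts = ray.get(actor.forward.remote(acts))
# grad = loss_grad
# for actor in reversed(stages):
#     grad = ray.get(actor.backward.remote(grad))
```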
To run: python train.py --model_dir=<cached HF GPT-J model>