A decade into the attention craze, this repository traces the deep learning era step by step, starting from the paper that started it all for LLMs. Some of the content presented predates Transformers, while other work is fairly recent.
- Transformer Architecture
- Neural Translation
Clearly distinguish Read up on FuseNorm