Resources to Understand Transformer Architecture in NLP

Here are some great resources for understanding the Transformer architecture for NLP, introduced in the paper “Attention Is All You Need” (Vaswani et al., 2017).

This website by Jay Alammar is excellent.
The next best resource is the annotated implementation of the Transformer in PyTorch from Harvard University.
Third, read the article “Attention? Attention!” by Lilian Weng.
For further background on word embeddings, see this post by Jason Brownlee.