Here are some great resources for understanding the Transformer architecture for NLP, introduced in the paper “Attention Is All You Need” (Vaswani et al., 2017).

  1. This website by Jay Alammar is an excellent place to start.
  2. The next best resource is this annotated implementation of the Transformer in PyTorch from Harvard University.
  3. After that, read this article called “Attention? Attention!” by Lilian Weng.
  4. For further background on word embeddings, see this post by Jason Brownlee.