“Attention Is All You Need” is the seminal 2017 paper by Vaswani et al. that introduced the Transformer architecture, now the foundation of most modern large language models, including BERT and the GPT family. The Transformer replaces recurrence with self-attention, so every position in a sequence can be processed in parallel, which significantly improves training efficiency.
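To make the self-attention idea concrete, here is a minimal NumPy sketch of the paper's scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. The single-head, unbatched shapes and the toy input are illustrative assumptions, not the paper's full multi-head implementation.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Scaled dot-product attention (single head, no batch dimension).

    q, k: (seq_len, d_k); v: (seq_len, d_v).
    All positions attend to all others in one matrix multiply, which is
    what lets the computation run in parallel instead of step by step
    as in a recurrent network.
    """
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)                            # (seq_len, seq_len) logits
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # softmax over keys
    return weights @ v                                         # weighted sum of values

# Toy self-attention: the same token embeddings act as queries, keys, and values.
x = np.random.randn(4, 8)                     # 4 tokens, model dimension 8
out = scaled_dot_product_attention(x, x, x)   # shape (4, 8)
```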