
Transformer

Core Technologies
Letter: T

The neural-network architecture built entirely around 'attention' that enabled today's LLM revolution.

Detailed Definition

The Transformer architecture, introduced by Google researchers in the 2017 paper "Attention Is All You Need", replaced recurrent and convolutional networks in NLP by processing all tokens of a sequence in parallel rather than one step at a time. Its self-attention mechanism lets every token attend directly to every other token, capturing long-range dependencies, while that parallelism makes large-scale training on modern hardware feasible. Virtually every modern language, vision, and multimodal model, including GPT, Claude, Gemini, and Llama, builds on the Transformer or its derivatives.
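To make the self-attention idea concrete, here is a minimal NumPy sketch of scaled dot-product attention over a toy sequence. The function name, shapes, and random weights are illustrative assumptions for this example, not part of any specific model or library.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (seq_len, d_k) arrays of query, key, and value vectors.
    d_k = Q.shape[-1]
    # Every token's query is compared against every token's key in parallel.
    scores = Q @ K.T / np.sqrt(d_k)                     # (seq_len, seq_len)
    # Softmax turns each row of scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output is a weighted mix of all value vectors, so distant
    # tokens can influence each other in a single step.
    return weights @ V

# Toy example: 4 tokens with 8-dimensional embeddings (illustrative numbers).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = scaled_dot_product_attention(x @ W_q, x @ W_k, x @ W_v)
print(out.shape)  # (4, 8)
```

Because the score matrix covers every pair of positions at once, the computation is one batch of matrix multiplications, which is what makes Transformer training so amenable to GPUs and TPUs.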