Transformer
The neural-network architecture built entirely on self-attention that enabled today's LLM revolution.
Detailed Definition
The Transformer architecture, introduced by Google researchers in the 2017 paper "Attention Is All You Need", replaced recurrent and convolutional networks in NLP by letting models attend to every token in a sequence in parallel. Its self-attention mechanism captures long-range dependencies, and because attention over all positions is computed at once rather than step by step, large-scale training becomes feasible. Virtually every modern language, vision, and multi-modal model (GPT, Claude, Gemini, Llama) builds on the Transformer or its derivatives.
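A minimal NumPy sketch of the scaled dot-product self-attention at the heart of the architecture may help make this concrete; the variable names, dimensions, and random projections here are illustrative assumptions, not taken from any particular implementation:

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over one sequence.

    x:             (seq_len, d_model) token embeddings
    w_q, w_k, w_v: (d_model, d_k) learned projection matrices
    Returns:       (seq_len, d_k) context vectors.
    """
    q = x @ w_q  # queries: what each token is looking for
    k = x @ w_k  # keys: what each token offers
    v = x @ w_v  # values: the content that gets mixed
    d_k = q.shape[-1]
    # One matrix product scores every token against every other token,
    # which is what makes the Transformer parallel over the sequence.
    scores = q @ k.T / np.sqrt(d_k)              # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v  # attention-weighted mixture of value vectors

# Toy usage: 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)  # (4, 8)
```

Real Transformers run many such attention "heads" in parallel and stack the results through feed-forward layers, but the core operation is this single weighted mixing step.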
Category: Core Technologies
More in this Category
Autoregressive Model
A type of model that predicts the next element in a sequence based on previous elements.
BERT
Bidirectional Encoder Representations from Transformers - a pre-trained language model that encodes text using both left and right context.
Deep Learning
A subset of machine learning using neural networks with multiple layers to learn complex patterns.
Embedding
A numerical representation of data that captures semantic meaning in a high-dimensional vector space.
GPT (Generative Pre-trained Transformer)
A family of language models that generate human-like text using transformer architecture.