Synthetic data
Data & Knowledge
Letter: S
Artificially generated data that mimics the patterns and structure of real-world data.
Detailed Definition
Synthetic data is artificially generated data used to train AI models, especially when real data is scarce, sensitive, or already used up. It is designed to follow the same patterns and structure as real-world data, and in many cases it can be just as effective at helping models learn. For example, an LLM can generate fictional customer support chats, and a diffusion model can create realistic images for training. To a human, synthetic data is often indistinguishable from real data.
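As a rough illustration of "following the same patterns and structure", the sketch below estimates simple statistics from a handful of real records and then samples new records from them. It is a minimal example under stated assumptions, not how LLMs or diffusion models generate data: the record fields (`amount`, `channel`), the sample values, and the helper names `fit` and `sample` are all hypothetical, and each column is modeled independently, so correlations between columns are not preserved.

```python
import random
import statistics

def fit(records):
    """Estimate simple per-column statistics from real records (hypothetical schema)."""
    amounts = [r["amount"] for r in records]
    channels = [r["channel"] for r in records]
    return {
        "amount_mean": statistics.mean(amounts),
        "amount_stdev": statistics.stdev(amounts),
        # Category frequencies, used later as sampling weights.
        "channel_counts": {c: channels.count(c) for c in sorted(set(channels))},
    }

def sample(model, n, seed=0):
    """Draw n synthetic records that mimic the fitted statistics."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    names = list(model["channel_counts"])
    weights = list(model["channel_counts"].values())
    rows = []
    for _ in range(n):
        # Numeric column: Gaussian around the observed mean, clipped at zero.
        amount = max(0.0, rng.gauss(model["amount_mean"], model["amount_stdev"]))
        # Categorical column: drawn in proportion to observed frequencies.
        channel = rng.choices(names, weights=weights, k=1)[0]
        rows.append({"amount": round(amount, 2), "channel": channel})
    return rows

# Illustrative "real" records; the values are made up for this example.
real = [
    {"amount": 20.0, "channel": "email"},
    {"amount": 35.5, "channel": "chat"},
    {"amount": 15.0, "channel": "chat"},
    {"amount": 50.0, "channel": "phone"},
    {"amount": 27.5, "channel": "chat"},
]

synthetic = sample(fit(real), 1000)
```

The synthetic records share the schema and approximate distribution of the originals without copying any single real record, which is the property that makes synthetic data useful when the real data is limited or sensitive.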