
RLHF (reinforcement learning from human feedback)

Learning Methods

A post-training technique to align AI models with human preferences using feedback.

Detailed Definition

RLHF is a post-training technique that goes beyond next-token prediction and supervised fine-tuning by teaching AI models to behave the way humans want them to, making them safer, more helpful, and better aligned with human intentions. The process works in two stages. First, human evaluators compare pairs of model outputs and choose which is better; these comparisons are used to train a "reward model" that learns to predict human preferences. Second, the AI model is optimized with reinforcement learning to generate responses that the reward model predicts humans would prefer.
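
A minimal sketch of the two stages, under simplifying assumptions: "responses" are fixed-length vectors rather than real text, the reward model is a single linear layer, and the policy update is a basic REINFORCE-style step rather than a specific production algorithm. All names and sizes here are illustrative.

```python
# Toy sketch of RLHF's two stages (assumed setup: vectors stand in for encoded text).
import torch
import torch.nn as nn
import torch.nn.functional as F

DIM = 16  # toy feature dimension standing in for an encoded response

# --- Stage 1: train a reward model on human pairwise preferences -------------
reward_model = nn.Linear(DIM, 1)   # scores a response with a scalar reward
rm_opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

chosen = torch.randn(64, DIM)      # responses human evaluators preferred
rejected = torch.randn(64, DIM)    # responses human evaluators rejected

for _ in range(100):
    r_chosen = reward_model(chosen)
    r_rejected = reward_model(rejected)
    # Bradley-Terry style loss: push the preferred response's score above the other's
    rm_loss = -F.logsigmoid(r_chosen - r_rejected).mean()
    rm_opt.zero_grad()
    rm_loss.backward()
    rm_opt.step()

# --- Stage 2: optimize the policy against the frozen reward model ------------
policy = nn.Linear(DIM, DIM)       # toy "policy": maps a prompt to a response
pi_opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
reward_model.requires_grad_(False)

prompts = torch.randn(64, DIM)

for _ in range(100):
    mean = policy(prompts)
    dist = torch.distributions.Normal(mean, 1.0)
    response = dist.sample()                        # sample a response from the policy
    reward = reward_model(response).squeeze(-1)     # reward model's predicted preference
    # REINFORCE-style update: make responses the reward model scores highly more likely
    log_prob = dist.log_prob(response).sum(dim=-1)
    pi_loss = -(log_prob * (reward - reward.mean())).mean()
    pi_opt.zero_grad()
    pi_loss.backward()
    pi_opt.step()
```

In practice, the policy is a full language model and the reinforcement learning step typically also penalizes drifting too far from the original fine-tuned model, so the policy keeps its language ability while chasing higher predicted preference scores.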