Level 4 · Expert · 9 min

RLHF

Reinforcement learning from human feedback. The training stage that turns a base model into a helpful assistant.