Definition: RLHF

(Reinforcement Learning from Human Feedback) A machine learning technique used in developing AI chatbots and agents. In reinforcement learning (RL), models are not given labeled data. Instead, they take an action and automatically receive a reward or penalty based on the outcome. In RL with human feedback (RLHF), AI engineers and annotation specialists use their own judgment to rank the model's outputs, and those rankings supply the reward signal, typically by training a separate reward model that scores new outputs.
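As a rough illustration of how human rankings can become a training signal, the sketch below splits one ranking into preferred/rejected pairs and computes a pairwise (Bradley-Terry style) loss, a formulation commonly used for reward models. The function names and the example scores are hypothetical and chosen only for illustration; a real system would compute the scores with a neural network.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_outputs):
    """Turn a best-to-worst human ranking into (preferred, rejected) pairs."""
    # combinations() keeps input order, so the first item of each pair
    # is always the one the annotator ranked higher.
    return list(combinations(ranked_outputs, 2))


def preference_loss(score_preferred, score_rejected):
    """Pairwise loss: small when the preferred output gets the higher score."""
    margin = score_preferred - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))


# Hypothetical ranking of three model outputs, best first.
pairs = ranking_to_pairs(["A", "B", "C"])

# Hypothetical reward-model scores: agreeing with the human gives a low loss,
# disagreeing gives a high loss.
loss_agrees = preference_loss(2.0, 0.0)
loss_disagrees = preference_loss(0.0, 2.0)
```

Minimizing this loss over many ranked examples pushes the reward model to score outputs the way the annotators would, so it can stand in for them when scoring new outputs.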

After a model is pretrained, both RL and RLHF refine it by feeding it prompts and scoring its answers. RLHF in particular is used to eliminate destructive, insulting and vulgar answers. It is also used to add interactive responses such as "that's a very good question."
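The refinement loop can be sketched with a toy example, assuming three canned answers to a single prompt and hypothetical reward scores that stand in for human judgment. The "model" here is just three numbers (logits), and the update is an exact policy-gradient step on the expected reward. Real systems update the weights of a large neural network, so this shows only the direction of the effect.

```python
import math

# Hypothetical candidate answers to one prompt, with hypothetical rewards
# standing in for human feedback: polite is rewarded, rude is penalized.
CANDIDATES = [
    "That's a very good question. Here is the answer.",
    "Here is the answer.",
    "What a dumb question.",
]
REWARDS = [1.0, 0.0, -1.0]


def softmax(logits):
    """Convert logits into a probability for each candidate answer."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def refine(logits, rewards, steps=50, learning_rate=1.0):
    """Raise the probability of rewarded answers, lower penalized ones."""
    logits = list(logits)
    for _ in range(steps):
        probs = softmax(logits)
        expected = sum(p * r for p, r in zip(probs, rewards))
        # Exact gradient of the expected reward with respect to each logit.
        logits = [
            logit + learning_rate * p * (r - expected)
            for logit, p, r in zip(logits, probs, rewards)
        ]
    return softmax(logits)


# Start with every answer equally likely, then refine.
final_probs = refine([0.0, 0.0, 0.0], REWARDS)
```

After refinement, the polite answer becomes the most likely one and the insulting answer becomes less likely than it was at the start.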

Human fine-tuning is an essential stage in developing models that deliver answers in language people relate to. Pretraining a large model can take months, whereas RL and RLHF typically take weeks. Reinforcement learning is especially important in developing AI agents because agents can take action on their own without being prompted by a person. See AI training vs. inference and AI agent.
