The method used to train a large language model (LLM). An AI model's neural network learns by recognizing patterns in the data and constantly predicting what comes next. In text models, words are turned into tokens, which are numeric representations, and it is the next token that is predicted. For images, audio and video, the predictions are the next group of pixels, time slices or frames, respectively. See AI token.
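Following is a minimal sketch of next-token prediction. The toy vocabulary, corpus and counting "model" are all assumptions for illustration only; a real LLM runs the token IDs through a neural network, but the idea is the same: words become numbers, and the model's only job is to guess which number comes next.

```python
# A toy illustration (hypothetical vocabulary and corpus) of next-token prediction:
# words become numeric token IDs, and this stand-in "model" simply picks the token
# that most often followed the previous one in the data.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat sat on the rug".split()
vocab = {word: i for i, word in enumerate(dict.fromkeys(corpus))}
ids = [vocab[w] for w in corpus]                      # words -> token IDs

follow_counts = defaultdict(Counter)
for prev, nxt in zip(ids, ids[1:]):                   # tally which token follows which
    follow_counts[prev][nxt] += 1

def predict_next(token_id):
    # Return the token ID that most frequently followed this one.
    return follow_counts[token_id].most_common(1)[0][0]

print(vocab)                            # e.g. {'the': 0, 'cat': 1, 'sat': 2, ...}
print(predict_next(vocab["cat"]))       # ID of the token most likely to follow "cat"
```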
Step 1 - Forward and Compare
Using text and its token representations as the example, backpropagation starts with a forward pass through the data, in which the network makes predictions. The difference between the predicted next token (next word) and the actual next token is then computed. This measured error is the "loss," and calculating it is the "loss computation."
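Below is a minimal sketch of the forward-and-compare step using NumPy and made-up numbers. The logits (one score per vocabulary token) stand in for the network's output; softmax turns them into probabilities, and cross-entropy measures how far the prediction is from the actual next token.

```python
# A toy loss computation with assumed values: scores -> probabilities -> loss.
import numpy as np

logits = np.array([1.2, 0.3, 2.5, 0.1, -0.7])   # one score per vocabulary token
actual_next_token = 2                            # index of the real next token

probs = np.exp(logits) / np.sum(np.exp(logits))  # softmax: scores -> probabilities
loss = -np.log(probs[actual_next_token])         # cross-entropy "loss" (the error)

print(f"predicted probabilities: {probs.round(3)}")
print(f"loss for this prediction: {loss:.3f}")   # smaller loss = better prediction
```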
Step 2 - Backward and Compute Gradients
Backpropagation means going from the end of the network back to the beginning and computing a gradient for every weight and bias between the neurons in every layer. Backpropagation discovers how much each value contributed to the error, and the "gradient descent" optimizer function does the updating, increasing or decreasing the weights and biases accordingly. See AI weights and biases.
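The sketch below shrinks the idea down to a single neuron with made-up numbers. The backward pass produces the gradients (how the loss changes with respect to the weight and bias), and a gradient-descent step nudges both values in the direction that shrinks the loss.

```python
# A one-neuron toy example (assumed values) of backward pass + gradient descent.
x, target = 2.0, 10.0        # input and the value the neuron should produce
w, b = 1.0, 0.0              # weight and bias, to be learned
lr = 0.05                    # learning rate (step size)

# Forward pass and loss
y = w * x + b                # prediction (here: 2.0)
loss = (y - target) ** 2     # squared error (here: 64.0)

# Backward pass: gradients of the loss with respect to w and b (chain rule)
grad_w = 2 * (y - target) * x
grad_b = 2 * (y - target)

# Gradient descent: move the weight and bias against their gradients
w -= lr * grad_w
b -= lr * grad_b
print(f"new prediction: {w * x + b:.1f}")   # 6.0, already closer to the target of 10
```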
Quadrillions and Quintillions of Changes
The process is repeated over and over until the model achieves the desired outcome. Large language models trained on trillions of tokens can perform the backpropagation step millions of times, which means the total number of gradient changes to the neural network is an astronomical number. See AI weights and biases and space/time.
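A back-of-the-envelope calculation shows why the total is astronomical. The model size and step count below are illustrative assumptions, not figures for any particular model; every training step adjusts every parameter once.

```python
# Rough arithmetic with assumed figures: updates = parameters x training steps.
parameters = 70_000_000_000      # e.g. a 70-billion-parameter model (assumption)
training_steps = 1_000_000       # number of backpropagation passes (assumption)

total_updates = parameters * training_steps
print(f"{total_updates:.1e} individual weight/bias adjustments")   # 7.0e+16, tens of quadrillions
```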
Adjust the Weights and Biases
The gradients from backpropagation constantly adjust the mathematical values between all the neurons. In a large language model, there can be millions of neurons and billions of weights and biases between them. See neural network and AI weights and biases.
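To see where all those adjustable values live, the sketch below counts them for a small, hypothetical pair of layers: every connection between two layers carries a weight, and every neuron has a bias.

```python
# Parameter counting for assumed layer sizes (illustrative only).
def layer_parameter_count(inputs, neurons):
    weights = inputs * neurons   # one weight per connection between the layers
    biases = neurons             # one bias per neuron
    return weights + biases

# A toy network: 1,000 -> 4,096 -> 1,000 neurons (assumed sizes)
total = layer_parameter_count(1_000, 4_096) + layer_parameter_count(4_096, 1_000)
print(f"{total:,} weights and biases to adjust in this small example")   # ~8.2 million
```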