
Definition: AI quantization


Compacting an AI model so it runs faster. AI quantization is primarily performed on the inference side (the user's device) so that the model runs more quickly on phones and desktop computers. For example, whereas the model's weights (parameters) may be 32-bit floating point numbers in the training stage, they might be reduced to 16-bit floating point or 8-bit integers. Even 4-bit floating point numbers might be used (see FP4).
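The reduction from 32-bit floats to 8-bit integers can be illustrated with a minimal sketch in NumPy. The weight values and the symmetric linear scheme below are illustrative assumptions, not any particular framework's method: each float is mapped onto the integer range [-127, 127] using a single scale factor, then mapped back at inference time.

```python
import numpy as np

# Hypothetical float32 weights as they might come out of training.
weights = np.array([0.12, -0.83, 0.41, 0.96, -0.27], dtype=np.float32)

# Symmetric linear quantization: map [-max|w|, +max|w|] onto [-127, 127].
scale = np.abs(weights).max() / 127.0
q = np.round(weights / scale).astype(np.int8)

# Dequantize at inference time to approximate the original values.
approx = q.astype(np.float32) * scale

print(q)       # int8 values: one quarter the storage of float32
print(np.abs(weights - approx).max())  # rounding error stays below the scale
```

The quantized weights occupy one byte each instead of four, and the round-trip error is bounded by half the scale factor, which is why accuracy usually drops only slightly.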

Pruning and Sparsity
Whereas quantization reduces the precision of the parameter values, pruning actually removes parameters and/or neurons to make the model more compact. Sparsity sets parameters to zero so that computations involving them can be skipped, making the model more efficient. See AI training vs. inference, AI weights and biases and floating point.
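One common way to introduce sparsity is magnitude pruning: weights whose absolute value falls below a cutoff are set to zero. The weights and the threshold below are made up for illustration; this is a sketch of the general idea, not a specific library's pruning routine.

```python
import numpy as np

# Hypothetical trained weights; the small ones contribute little.
weights = np.array([0.9, -0.02, 0.4, 0.01, -0.7, 0.03], dtype=np.float32)

# Magnitude pruning: zero out every weight below the cutoff.
threshold = 0.1
sparse = np.where(np.abs(weights) < threshold, 0.0, weights).astype(np.float32)

sparsity = np.mean(sparse == 0)  # fraction of parameters that are now zero
print(sparse)
print(sparsity)
```

Hardware and libraries that recognize zeroed weights can skip the corresponding multiplications, which is where the efficiency gain comes from.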