Compacting an AI model to run faster. AI quantization is performed primarily on the inference (user) side so that models run more quickly on phones and desktop computers. For example, whereas a model's weights (parameters) may be 32-bit floating point numbers during training, they might be reduced to 16-bit floating point or 8-bit integers. Even 4-bit floating point numbers might be used (see
FP4).
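
The arithmetic can be sketched in a few lines. The following Python snippet is a minimal illustration, not a production method: the weight values are made up, and real quantizers add refinements such as per-channel scales, zero points, and calibration. It maps 32-bit floats to 8-bit integers using a single symmetric scale factor.

  import numpy as np

  # Hypothetical float32 weights, as they might look after training.
  weights_fp32 = np.array([0.82, -1.91, 0.004, 1.25, -0.33], dtype=np.float32)

  # Symmetric 8-bit quantization: map the largest absolute weight to 127.
  scale = np.abs(weights_fp32).max() / 127.0
  weights_int8 = np.round(weights_fp32 / scale).astype(np.int8)

  # At inference time, the integers are rescaled to approximate the originals.
  weights_restored = weights_int8.astype(np.float32) * scale

  print(weights_int8)      # e.g., [  55 -127    0   83  -22]
  print(weights_restored)  # close to, but not exactly, the original values

Each weight now occupies one byte instead of four, which shrinks memory use and lets inference hardware use faster integer arithmetic, at the cost of a small rounding error.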
Pruning and Sparsity
Whereas quantization reduces the precision of the parameter values, pruning removes parameters or entire neurons to make the model more compact. Sparsity sets selected parameters to zero so that computation can skip them, making the model more efficient (a minimal sketch follows the cross-references below). See
AI training vs. inference,
AI weights and biases and
floating point.
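
To make pruning and sparsity concrete, the sketch below uses made-up values and is not an excerpt from any framework. It applies magnitude-based pruning: every weight whose absolute value falls below a threshold is set to zero, and the resulting fraction of zeros is the sparsity that specialized kernels can exploit by skipping the zeroed entries.

  import numpy as np

  # Hypothetical weight matrix; real models hold millions of such values.
  weights = np.array([[ 0.73, -0.02,  1.10],
                      [-0.01,  0.55, -0.88],
                      [ 0.03, -1.40,  0.04]], dtype=np.float32)

  # Magnitude pruning: zero out every weight below the chosen threshold.
  threshold = 0.1
  pruned = np.where(np.abs(weights) < threshold, 0.0, weights)

  # Sparsity is the fraction of zeroed parameters, which sparse-matrix
  # kernels can skip entirely during inference.
  sparsity = np.mean(pruned == 0.0)
  print(pruned)
  print(f"sparsity: {sparsity:.0%}")  # 44% of the weights are now zero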