A numerical representation of text. When text is fed into an AI model, whether for training or as a prompt to get an answer, it is turned into tokens. When text is generated for the user, the model's output tokens are turned back into text.
Text to Tokens and Tokens to Text
The tokenizer turns the text into tokens, which are numerical representations of a word, a part of a word or a phrase. The tokens are the entities that are actually analyzed and processed by the model. When an answer is generated, the detokenizer turns the tokens back into text for human consumption. See AI transformer.
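As a rough sketch, this round trip can be reproduced with OpenAI's open source tiktoken library (used here purely for illustration; the encoding name and the exact token IDs depend on which model's tokenizer is used):

# Text -> tokens -> text round trip using the tiktoken library.
# The "gpt2" encoding is an assumption; other models ship other
# tokenizers, so the integer IDs will differ.
import tiktoken

enc = tiktoken.get_encoding("gpt2")

token_ids = enc.encode("This is a token.")   # tokenizer: text -> token IDs
print(token_ids)                             # a list of integers

text = enc.decode(token_ids)                 # detokenizer: token IDs -> text
print(text)                                  # "This is a token."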
Example: "This is a token."
Using the sentence "This is a token." as an example and a GPT tokenizer, the sentence is first broken into five tokens, each of which is then mapped to a unique token ID, as follows:
["This", " is", " a", "token", ".",]
Token      Token ID
"This"     4280
" is"      318
" a"       257
" token"   13298
"."        13