How Artificial Intelligence Generates Text
An interactive journey from text to tokens to predictions
The Core Idea: Every Token Connects to Every Token
The model learns probability relationships among all ~200,000 tokens in its vocabulary
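Those probability relationships surface at generation time as a probability distribution over the whole vocabulary: the model scores every token, then normalizes the scores with a softmax. A minimal sketch with a toy 5-token vocabulary (all names and scores here are illustrative, not from any real model):

```python
import math

# Toy 5-token vocabulary; a real model scores all ~200,000 tokens at once.
vocab = ["the", "cat", "sat", "on", "mat"]

def softmax(logits):
    """Turn raw scores into a probability distribution over the vocabulary."""
    m = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.1, 0.3, -1.0, 0.5, 1.7]      # illustrative scores for the next token
probs = softmax(logits)
print(max(zip(probs, vocab)))            # highest-probability next token
```

The model then either picks the highest-probability token or samples from this distribution, which is why the same prompt can produce different continuations.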
Tokens are substrings! "tokenization" → "token" + "ization"
Real substring tokenization:
- "tokenization" → {token} + {ization}
- "embeddings" → {emb} + {eddings}
- "transformer" → {transform} + {er}
- "understanding" → {under} + {standing}
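The splits above can be reproduced with a greedy longest-match lookup against a subword vocabulary. This is a simplified stand-in for real byte-pair-encoding tokenizers (the `tokenize` function and the tiny `vocab` set are illustrative, not a real tokenizer's API):

```python
def tokenize(word, vocab):
    """Greedily split a word into the longest subwords found in vocab."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, shrinking until a match.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            tokens.append(word[i])  # fall back to a single character
            i += 1
    return tokens

# Toy vocabulary containing just the subwords from the examples above.
vocab = {"token", "ization", "emb", "eddings",
         "transform", "er", "under", "standing"}

print(tokenize("tokenization", vocab))   # ['token', 'ization']
print(tokenize("understanding", vocab))  # ['under', 'standing']
```

Production tokenizers learn their vocabulary from data and use merge rules rather than pure longest-match, but the core idea is the same: frequent substrings become single tokens.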
7 interactive steps · Real tokenization · Live attention visualization