How Artificial Intelligence Generates Text

An interactive journey from text to tokens to predictions

The Core Idea: Every Token Connects to Every Token

The model learns probability relationships among all ~200,000 tokens in its vocabulary, and uses them to assign a probability to every possible next token
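At the final step, those learned relationships are turned into a probability distribution with a softmax over one score (logit) per vocabulary token. The sketch below is a minimal illustration with a toy four-token vocabulary and made-up logits, not the model's actual numbers:

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)                              # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical vocabulary and logits, for illustration only.
vocab = ["token", "ization", "transform", "er"]
logits = [2.0, 0.5, -1.0, 0.1]

probs = softmax(logits)
best = vocab[probs.index(max(probs))]            # most likely next token: "token"
```

A real model does the same thing, just over the full ~200,000-token vocabulary.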

Tokens are subword pieces! "tokenization" → "token" + "ization"

{token} → [10346]
{ization} → [2860]
{emb} → [2072]
{eddings} → [32861]
{transform} → [19692]
{er} → [259]
{under} → [11093]
{standing} → [12138]

Real substring tokenization:

"tokenization" → {token}+{ization}"embeddings" → {emb}+{eddings}"transformer" → {transform}+{er}"understanding" → {under}+{standing}

7 interactive steps · Real tokenization · Live attention visualization