24. Understanding LLMs by Structure
Why They Make Us Feel Like We “Understand”
tags: [LLM, AI, Design, Mermaid]
🧭 Introduction
LLMs (Large Language Models) look intelligent.
They can explain things and appear to reason.
But once you look inside, you realize something important:
What they are doing is surprisingly simple.
In this article, we use structure (Mermaid diagrams) to visualize:
- 🔍 What an LLM actually is as a system
- 🧠 Why it appears to understand
- ⚠️ Why it should not be used directly for control or decision-making
🎯 What an LLM Actually Does (Conclusion)
An LLM does only one thing, repeatedly:
It selects the most likely next token based on probability.
It does not understand meaning.
It does not perceive the world.
It is a massive transformer that performs:
Context → Probability → Generation
Nothing more.
🏗 Overall Structure (Inside an LLM)
```mermaid
flowchart TD
    A[Input Text] --> B[Tokenization]
    B --> C[Embedding<br/>Numeric Vectorization]
    C --> D[Transformer Layers × N]
    D --> E[Self-Attention]
    E --> F[Feed Forward]
    F --> G[Next-Token Probability Distribution]
    G --> H[Generate One Token]
    H -->|Repeat| D
```
Only three key points matter:
- 📐 Text is converted into numbers
- 🎲 Internally, only probability calculations exist
- 🔁 Tokens are generated one by one
There is no “thinking” step.
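The loop in the diagram above fits in a few lines. Below is a minimal, hypothetical sketch: `model` is a stand-in that returns random scores rather than a real network, and the token IDs are invented. Only the context → probabilities → one token → repeat cycle is the point.

```python
import numpy as np

# Minimal sketch of the generation loop: context -> probabilities -> one token -> repeat.
# "model" is a stand-in returning random logits; a real LLM would run tokenization,
# embeddings, and N transformer layers to produce these scores.

VOCAB_SIZE = 50_000

def model(context_tokens: list[int]) -> np.ndarray:
    """Return unnormalized scores (logits) for every token in the vocabulary."""
    rng = np.random.default_rng(seed=len(context_tokens))  # deterministic toy
    return rng.normal(size=VOCAB_SIZE)

def softmax(logits: np.ndarray) -> np.ndarray:
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def generate(prompt_tokens: list[int], max_new_tokens: int = 5) -> list[int]:
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = softmax(model(tokens))      # next-token probability distribution
        next_token = int(np.argmax(probs))  # greedy pick (real systems often sample)
        tokens.append(next_token)           # feed the output back in and repeat
    return tokens

print(generate([101, 2054, 2003]))  # toy token IDs, purely illustrative
```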
🔎 What Self-Attention Really Is
Why It Looks Like Understanding
```mermaid
flowchart TD
    T1[Token 1] --> SA[Self-Attention]
    T2[Token 2] --> SA
    T3[Token 3] --> SA
    T4[Token 4] --> SA
    SA --> R[Weighted Relationship Representation]
```
Self-Attention:
- 🧩 Computes relationships between all tokens in a sentence
- ⚡ Does so simultaneously
This allows the model to capture:
- Subject–predicate relationships
- Causality
- Context dependency
regardless of distance.
The result is text that
appears to reflect understanding of context.
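To make the "weighted relationship representation" concrete, here is a toy scaled dot-product attention in NumPy. The token vectors are invented numbers and the Q/K/V projections are left out; the only point is that every token is scored against every other token in a single matrix operation.

```python
import numpy as np

# Toy scaled dot-product self-attention over 4 token vectors (made-up numbers).
# Each row of X is one token's embedding; the Q, K, V projections are omitted
# (treated as identity) to keep the sketch small.
X = np.array([
    [1.0, 0.0, 1.0],   # Token 1
    [0.0, 1.0, 0.0],   # Token 2
    [1.0, 1.0, 0.0],   # Token 3
    [0.0, 0.0, 1.0],   # Token 4
])

d_k = X.shape[1]
scores = X @ X.T / np.sqrt(d_k)                # every token scored against every token
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax: attention weights
output = weights @ X                           # weighted mix of all token vectors

print(np.round(weights, 2))  # who attends to whom, all pairs at once
```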
🧪 Visualizing “Lack of Understanding”
Internally, something like this is happening:
```text
Input:
"The mechanism of LLMs is"

Internal probabilities:
--------------------------------
"simple"       : 0.41
"probabilistic": 0.27
"actually"     : 0.12
"complex"      : 0.08
others         : ...
--------------------------------
→ Select the highest probability
```
- ❌ No notion of correctness
- ❌ No evaluation of truth
- ⭕ Only probability ranking
The reason the output feels coherent is that
the training data encodes traces of human thought.
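Numerically, the selection step is nothing more than sorting a distribution. A minimal sketch using the made-up probabilities from the example above; notice that no step asks whether a candidate is true.

```python
# The "decision" is just a ranking of the made-up probabilities from the example above.
probs = {
    "simple": 0.41,
    "probabilistic": 0.27,
    "actually": 0.12,
    "complex": 0.08,
}

ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
chosen = ranked[0][0]  # greedy: take the most likely continuation

print(ranked)   # pure ranking; no notion of whether any candidate is *true*
print(chosen)   # -> "simple"
```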
🧍 The Fundamental Difference Between Humans and LLMs
```mermaid
flowchart LR
    subgraph Human[Human]
        H1[World] --> H2[Perception]
        H2 --> H3[State]
        H3 --> H4[Judgment]
        H4 --> H5[Action]
        H3 -->|Memory| H3
    end
    subgraph LLM[LLM]
        L1[Text] --> L2[Probability Mapping]
        L2 --> L3[Text]
    end
```
An LLM has:
- 🧠 State ❌
- 💾 Memory ❌
- 🎯 Goals ❌
- 🌍 World model ❌
It is simply
a function that maps input text to output text.
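Viewed from the outside, the contract really is that narrow. A hedged sketch of it (`fake_llm` is a stand-in, not any real API):

```python
# Text in, text out, no state between calls.
# "fake_llm" is a stand-in that echoes a canned continuation.

def fake_llm(prompt: str) -> str:
    """Pure function: the output depends only on the prompt passed in."""
    return prompt + " ... (most probable continuation)"

# If you want "memory", the caller has to build it into the prompt itself:
history = "User: What is an LLM?\nAssistant: A next-token predictor.\n"
reply = fake_llm(history + "User: Does it remember this conversation?\nAssistant:")
print(reply)  # the "memory" lives entirely in the text we re-sent, not in the model
```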
🚫 Why LLMs Should Not Be Used for Control or Judgment
The reason is structural:
- Cannot maintain state
- Cannot evaluate correctness
- Cannot preserve temporal continuity
In short:
LLMs lack the components required for control systems.
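For contrast, here is a minimal, hypothetical feedback-control skeleton. Everything it depends on (persistent state, an explicit error to evaluate, fixed timing) is exactly what a text → text mapping does not provide.

```python
import time

# Minimal feedback-control skeleton (hypothetical sensor/actuator functions).
# Note what the loop depends on: persistent state, a correctness criterion
# (the error), and temporal continuity.

setpoint = 42.0
integral = 0.0          # persistent state carried across iterations

def read_sensor() -> float:            # hypothetical: would query real hardware
    return 40.0

def apply_actuator(u: float) -> None:  # hypothetical: would drive real hardware
    print(f"actuator command: {u:.2f}")

for _ in range(3):
    error = setpoint - read_sensor()   # explicit correctness criterion
    integral += error                  # state carried from step to step
    apply_actuator(0.5 * error + 0.1 * integral)
    time.sleep(0.01)                   # fixed timing, step after step
```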
🛠 Where LLMs Are Actually Most Effective
- 📝 Converting vague text into structure
- 📄 Turning logs into candidate causes
- 🧩 Drafting specifications
- 🔍 Explaining differences
They shine in:
- Pre-judgment stages
- Intermediate work that is tedious for humans
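A hedged sketch of that division of labor, where `call_llm` is a hypothetical placeholder rather than any specific vendor's API: the LLM drafts structure, and deterministic code (or a human) makes the actual judgment.

```python
import json

# Hypothetical placeholder: in practice this would be whatever LLM client you use.
def call_llm(prompt: str) -> str:
    # Canned response so the sketch runs on its own.
    return '{"candidate_causes": ["disk full", "connection timeout"], "confidence": "low"}'

log_excerpt = "ERROR 2024-01-01T12:00:00 write failed: no space left on device"

draft = call_llm(
    "Convert this log line into JSON with a 'candidate_causes' list:\n" + log_excerpt
)

# The LLM only produced a draft. Validation and the final decision stay
# deterministic (or human): this is the "pre-judgment" stage.
structured = json.loads(draft)
assert isinstance(structured.get("candidate_causes"), list)
print(structured["candidate_causes"])
```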
✅ Summary
LLM = Intelligence ❌
LLM = Thinking ❌
LLM = Judgment ❌
LLM = Context → Probability → Generation transformer ⭕
It may look like a black box,
but once viewed structurally, its proper placement becomes clear.
LLMs are useful not when you ask
“What can they do?”
but when you understand
“Where should they NOT be placed?”