34.【Voice AI Design】Safely Connecting an LLM

Why It Belongs Only in the Thinking State

tags: ["Voice Generation AI", "LLM", "FSM", "Control Structure", "Generative AI Design"]


🤖 Safely Connecting an LLM

— 🧩 Why It Belongs Only in the Thinking State

Through the previous articles, the voice AI has already been transformed into a stable system:

- the FSM owns every state transition;
- Listening, Speaking, and Interrupted run on fixed, deterministic rules;
- interruption is handled immediately, without waiting on any model.

In other words, the skeleton is complete.

Only now is it safe to
bring the LLM back into the system.


🎯 Purpose of This Article

👉 We are still not building a “chat AI.”
First, we decide on a placement that never causes accidents.


❌ Places Where an LLM Must NOT Be Used (Reconfirmed)

🔊 Speaking

👉 LLMs are too slow: speech output must run in real time, and generation latency would stall it


👂 Listening

👉 Semantic understanding is unnecessary here: Listening only needs to detect that speech occurred, not what it means


✋ Interrupted

👉 There is no room for an LLM: interruption must be handled instantly and deterministically


⭕ The Only Place Where an LLM Is Allowed

🧠 Thinking

The Thinking state:

- is not time-critical (a short pause here reads as natural deliberation);
- produces only content, the text to be spoken, never control decisions;
- begins and ends on the FSM's schedule, not the model's.

👉 A perfect match for LLM characteristics.


🧩 The Correct FSM × LLM Structure

🧩 FSM
 ├─ 👂 Listening
 ├─ 🧠 Thinking ──▶ 🤖 LLM (utterance content generation)
 ├─ 🔊 Speaking
 └─ ✋ Interrupted

Key principles:

- the FSM decides when to listen, think, speak, and yield;
- the LLM is invoked only inside Thinking, and only to generate utterance text;
- no state transition ever depends on the LLM's output.

👉 The LLM is a component, not a controller.
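The structure above can be sketched as a minimal FSM step function in Python. This is a sketch under the article's assumptions, not a reference implementation; `llm_generate` follows the article's pseudocode and is a stand-in, not a real API.

```python
from enum import Enum, auto

class State(Enum):
    LISTENING = auto()
    THINKING = auto()
    SPEAKING = auto()
    INTERRUPTED = auto()

def llm_generate(prompt: str) -> str:
    # Stand-in for a real LLM call: returns utterance text only.
    return f"(reply to: {prompt})"

def step(state: State, heard: str) -> tuple:
    """One FSM transition. The FSM alone selects the next state;
    the LLM is consulted only inside THINKING, and only for text."""
    if state == State.LISTENING:
        return State.THINKING, None      # something was heard: go think
    if state == State.THINKING:
        text = llm_generate(heard)       # LLM generates content, nothing more
        return State.SPEAKING, text      # the transition is the FSM's choice
    if state == State.SPEAKING:
        return State.LISTENING, None     # utterance finished
    return State.LISTENING, None         # INTERRUPTED: yield immediately
```

Note that the LLM appears on exactly one line, inside exactly one branch; every `return` value, i.e. every transition, is hard-coded in the FSM.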


🔌 Information You May — and Must Not — Pass to the LLM

⭕ Safe to Pass

- the user's utterance text (what was said)
- dialogue history and other content context

❌ Never Pass

- the current FSM state
- timers, interruption flags, or any other control signal
- questions about what the system should do next

👉 The moment control information is passed, the system breaks.
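One way to enforce this rule is to build the prompt from a whitelist of content fields, so control information can never leak in. A minimal sketch, with illustrative field names that are assumptions, not part of the article:

```python
# Content fields the LLM may see (illustrative names, not a fixed API):
SAFE_FIELDS = {"user_text", "dialogue_history", "persona"}

def build_prompt(context: dict) -> str:
    """Builds the LLM prompt from content fields only. Control
    information (FSM state, timers, interruption flags) is dropped
    even if it is present in the context dict."""
    safe = {k: v for k, v in context.items() if k in SAFE_FIELDS}
    return "\n".join(f"{k}: {v}" for k, v in sorted(safe.items()))

prompt = build_prompt({
    "user_text": "What's the weather?",
    "state": "Thinking",        # control info: silently excluded
    "interrupt_flag": False,    # control info: silently excluded
})
# prompt == "user_text: What's the weather?"
```

A whitelist is deliberately chosen over a blacklist: new control fields added later are excluded by default instead of leaking in by default.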


🧪 Pseudocode (FSM × LLM)

if state == "Thinking":
    text = llm_generate(prompt)   # LLM produces utterance text only
    state = "Speaking"            # the FSM, not the LLM, picks the next state
    play_tts(text)                # playback is handled outside the LLM

Key points:

- the LLM call happens only inside the Thinking branch;
- the transition to Speaking is written by the FSM, not requested by the LLM;
- the LLM's output is data (the utterance text), nothing more.


🚫 Common and Dangerous Anti-Patterns

❌ Asking the LLM “What should we do next?”

→ State management collapses

❌ Asking the LLM “Should I keep speaking?”

→ Infinite speech

❌ Letting the LLM decide interruptions

→ Immediate failure
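A defensive sketch against all three anti-patterns: treat the LLM's output strictly as utterance text, so that even if the model emits something that looks like a control directive, the FSM ignores it (the directive strings below are hypothetical).

```python
def next_state_after_thinking(llm_output: str) -> str:
    """The transition out of Thinking is fixed in the FSM table.
    The LLM's output is used only as utterance text; it is never
    parsed for control, so it cannot select or delay a transition."""
    _ = llm_output  # content only, deliberately ignored for control
    return "Speaking"
```

Even an output that "asks" for control changes nothing: `next_state_after_thinking("KEEP SPEAKING")` still returns `"Speaking"`.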


🧠 Why This Structure Is Stable

The FSM owns timing and control; the LLM owns content. A slow or strange LLM response can delay an utterance, but it can never corrupt a state transition.

👉 Responsibility separation is complete.


📌 Summary

- An LLM may live only in the Thinking state.
- The LLM is a component that generates utterance text; the FSM remains the sole controller.
- Pass content to the LLM; never pass state or control information.
- Never ask the LLM what to do next, whether to keep speaking, or whether to yield.


🏁 Series Conclusion

In this series, we fixed the structure of voice AI by establishing that:

- control belongs to the FSM, not to the model;
- Listening, Speaking, and Interrupted run on fixed rules;
- the LLM is confined to Thinking, as a content generator.

Even so, many people will still say:

🎭 “I want it to feel more conversational”
🤝 “I want it to feel like talking to a human”

But that is not a design objective; it is a desire for performance, in the theatrical sense.

The minimum requirement for a voice AI to appear intelligent
is not imitation of conversation.

It is simply this:

speak when it should speak,
stop when it should stop,
yield the moment it is interrupted.

That is control integrity.

This series ends by
breaking the illusion of conversational AI.

Anything beyond this
can be explored later — if and when it is truly needed.


(The GitHub-hosted Markdown is the canonical source; the Qiita version is a curated extract.)