33.【Voice AI Design】Interruptions Break Voice AI

What Really Happens When a Human Speaks While the AI Is Speaking

tags: [“Voice Generation AI”, “FSM”, “Interruption”, “Design Philosophy”, “Control Structure”]


✋ Interruptions Break Voice AI

— 🔥 What Happens When a Human Speaks While the AI Is Speaking

This is the moment where voice AI fails most dramatically.

🔊 The AI is speaking
👤 A human starts speaking

Most voice AI demos
deliberately avoid this moment.

The reason is simple:

They were never designed for it.


🎯 Purpose of This Article

👉 We still don’t need conversational polish.
First, the system must not break.


❌ Common Failure Patterns

① 🔊 Keeps Talking

② 🧊 Silent Freeze

③ 🤯 Dual State

👉 All of these are caused by undefined state transitions.


🧠 Interruption Is Not an “Error”

A critical perspective:

Interruption is normal behavior

In human conversation,
interruptions happen constantly.

Designing voice AI under the assumption
that it will never be interrupted
guarantees failure.


🧩 Adding an Interruption State to the FSM

We extend the FSM from article (32)
by adding a dedicated interruption state.

📦 Added States


🔄 FSM with Interruption (Full View)

💤 Idle
  │
  ▼
🔊 Speaking
  │ (Human speech detected)
  ▼
✋ Interrupted
  │
  ├──▶ 👂 Listening   (Listen to the human)
  └──▶ 💤 Idle        (Abort and stop)
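The diagram above can be sketched as a transition table. This is a minimal illustration, not the code from the demo: the event names (`start_speaking`, `human_speech_detected`, `listen`, `abort`) are hypothetical labels, and only the arrows actually drawn in the diagram are encoded. Any other (state, event) pair is rejected, so no transition is left undefined.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    SPEAKING = auto()
    INTERRUPTED = auto()
    LISTENING = auto()


# Only the arrows drawn in the diagram are listed here.
# Event names are hypothetical labels chosen for this sketch.
TRANSITIONS = {
    (State.IDLE, "start_speaking"): State.SPEAKING,
    (State.SPEAKING, "human_speech_detected"): State.INTERRUPTED,
    (State.INTERRUPTED, "listen"): State.LISTENING,
    (State.INTERRUPTED, "abort"): State.IDLE,
}


class VoiceFSM:
    def __init__(self) -> None:
        self.state = State.IDLE

    def fire(self, event: str) -> State:
        """Apply an event, or raise if the transition is not defined."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(
                f"undefined transition: {self.state.name} on {event!r}"
            )
        self.state = TRANSITIONS[key]
        return self.state
```

Raising on an unknown pair is one possible design choice: it turns a silent "dual state" into a loud, traceable error.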

🔑 Key points:


⏱ Do NOT Be “Smart” About Interruption Detection

A common mistake here:

The correct approach:

🎚 Decide immediately using physical signals

Examples:

👉 Meaning comes later.
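As one illustration of a "physical signal", the sketch below decides on a single audio frame using only its RMS energy. The choice of RMS, the threshold of `0.1`, and the function names are assumptions made for this example, not values from the article or the demo. It also assumes the frame holds normalized samples in the range -1.0 to 1.0 and does not contain the AI's own playback.

```python
import math
from typing import Sequence


def frame_rms(samples: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of normalized samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def human_speech_detected(samples: Sequence[float],
                          threshold: float = 0.1) -> bool:
    """Decide from energy alone; no transcription, no meaning.

    The threshold is a hypothetical value and would need tuning
    for a real microphone and room.
    """
    return frame_rms(samples) >= threshold
```

The decision needs no speech recognition and no language model, so it can be made as soon as one frame arrives.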


🔊 What Must Happen When Speaking Is Interrupted

The moment interruption is detected:

  1. 🔇 Immediately stop TTS
  2. 🧩 Transition FSM to Interrupted
  3. 🎤 Separate output from input
  4. 👂 Move to Listening

Changing this order causes failure.
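The four steps can be sketched as a single handler that runs them in a fixed order. This is an illustrative sketch with stand-in fields rather than a real audio stack: `tts_playing`, `pending_output`, and `log` are hypothetical names. Step 3 is interpreted here as discarding queued output audio so it cannot be mistaken for input, and that reading is an assumption.

```python
class InterruptionHandler:
    def __init__(self) -> None:
        self.state = "Speaking"
        self.tts_playing = True
        # Stand-in for audio chunks already queued for playback.
        self.pending_output = ["chunk-1", "chunk-2"]
        self.log: list[str] = []

    def on_interruption(self) -> None:
        # Interruption is only defined while Speaking; ignore it elsewhere.
        if self.state != "Speaking":
            return

        # 1. Immediately stop TTS.
        self.tts_playing = False
        self.log.append("stop_tts")

        # 2. Transition the FSM to Interrupted.
        self.state = "Interrupted"
        self.log.append("state=Interrupted")

        # 3. Separate output from input (assumed: drop queued output).
        self.pending_output.clear()
        self.log.append("separate_io")

        # 4. Move to Listening.
        self.state = "Listening"
        self.log.append("state=Listening")
```

Keeping all four steps inside one function means the order is fixed in one place and cannot drift between callers.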


🚫 What NOT to Do During Interruption Handling

👉 That is performance, not control.


🧠 The Biggest Benefit of an Interruption FSM

🎯 This alone places the system
within the minimum acceptable range for real-world voice AI.


✋ Interruption Demo

A demo is provided to observe
interruption behavior under FSM control.

▶ Demo
https://samizo-aitl.github.io/qiita-articles/demos/audio-fsm-interrupt/

This demo demonstrates:


📌 Summary


🔜 Next Article (34)

Next time,
the LLM finally returns — safely.


(The GitHub-hosted Markdown is the canonical source; the Qiita version is a curated extract.)