32.【Voice AI Design】Connecting Minimal Audio to an FSM

A First Utterance That Doesn’t Break — by Abandoning “Naturalness”

tags: [“Voice Generation AI”, “FSM”, “TTS”, “Design Philosophy”, “Generative AI”]


In the previous article (31), we decomposed voice AI and concluded that, at its core, it is an
FSM (finite state machine).

In this article, we connect the smallest possible audio output to that FSM.

⚠️ The goal here is not to sound natural.
🎯 The goal is to produce sound without breaking, and to always return safely to Idle.


🎯 Goal of This Article


🧩 Overall Minimal Structure

🧩 FSM
 ├─ 🎤 Audio Input (dummy in this article)
 ├─ 🔊 TTS (minimal)
 └─ ⏱ Timer / Completion Event

👉 Audio is subordinate to the FSM.
It must never speak on its own.
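
The structure above lists a "Timer / Completion Event" without saying how the timer is used. One possible reading, shown here purely as an assumption, is a safety timeout: if the completion event never arrives, the FSM itself forces the return to Idle. The class name `WatchdogFsm`, the `tick` event, and the `timeout_s` parameter are hypothetical and do not come from the article.

```python
class WatchdogFsm:
    """Idle/Speaking FSM with a hypothetical safety timeout (an assumption)."""

    def __init__(self, timeout_s: float = 5.0):
        self.state = "Idle"
        self.timeout_s = timeout_s
        self.deadline = None  # set only while Speaking

    def on_event(self, event: str, now: float) -> None:
        if self.state == "Idle" and event == "start":
            self.state = "Speaking"
            self.deadline = now + self.timeout_s
        elif self.state == "Speaking" and event == "tts_finished":
            self._return_to_idle()
        elif self.state == "Speaking" and event == "tick" and now >= self.deadline:
            # Completion never arrived: the FSM forces the safe return itself.
            self._return_to_idle()

    def _return_to_idle(self) -> None:
        self.state = "Idle"
        self.deadline = None


fsm = WatchdogFsm(timeout_s=2.0)
fsm.on_event("start", now=0.0)
fsm.on_event("tick", now=1.0)   # before the deadline: still Speaking
fsm.on_event("tick", now=2.0)   # deadline reached: back to Idle
print(fsm.state)
```

Passing `now` in as an argument keeps the sketch deterministic; a real system would feed it from a monotonic clock.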


📦 FSM Involvement Points (Reconfirmed)

Unless the FSM explicitly allows it,
audio will not start, continue, or replay.
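
As a rough sketch of this rule, the permission check can be written as an allow-list keyed by FSM state. The names `ALLOWED`, `audio_allowed`, and `request_audio` are hypothetical, and the exact set of actions each state permits is an assumption made for illustration.

```python
# Hypothetical allow-list: which audio actions each FSM state permits.
ALLOWED = {
    "Idle": set(),                      # silent: nothing may start
    "Speaking": {"start", "continue"},  # replay is not permitted anywhere yet
}


def audio_allowed(state: str, action: str) -> bool:
    """Return True only if the FSM state explicitly permits the audio action."""
    return action in ALLOWED.get(state, set())


def request_audio(state: str, action: str, log: list) -> bool:
    """Perform the audio action only when allowed; otherwise do nothing."""
    if not audio_allowed(state, action):
        return False
    log.append((state, action))  # stand-in for real audio output
    return True
```

Anything not listed is refused by default, so a new state or action stays silent until it is added to the table explicitly.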


🔊 The Philosophy of Minimal TTS

We completely ignore “quality” at this stage.

Example:

“System has started.”

That is more than enough.


🧪 Pseudocode (FSM × Minimal Audio)

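state = "Idle"  # the FSM is the single owner of this state

def play_tts(text):
    # Placeholder for a real TTS call (an assumption); here it only prints.
    print(f"[TTS] {text}")

def on_event(event):
    global state

    # Audio starts only as part of an explicit FSM transition.
    if state == "Idle" and event == "start":
        state = "Speaking"
        play_tts("System has started.")

    # Only the completion event returns the FSM to Idle.
    elif state == "Speaking" and event == "tts_finished":
        state = "Idle"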
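
The same two transitions can also be written as a lookup table, which makes it easy to see that every unlisted (state, event) pair is simply ignored. This is an alternative sketch rather than the article's own implementation; `TRANSITIONS`, `step`, and `run` are hypothetical names.

```python
# Transition table: (current state, event) -> (next state, text to speak or None).
TRANSITIONS = {
    ("Idle", "start"): ("Speaking", "System has started."),
    ("Speaking", "tts_finished"): ("Idle", None),
}


def step(state, event):
    """Return (next_state, utterance); unlisted pairs leave the state unchanged."""
    return TRANSITIONS.get((state, event), (state, None))


def run(events, state="Idle"):
    """Feed events in order and record each resulting state and utterance."""
    trace = []
    for event in events:
        state, utterance = step(state, event)
        trace.append((event, state, utterance))
    return trace


for row in run(["tts_finished", "start", "start", "tts_finished"]):
    print(row)
```

In this trace, the stray `tts_finished` while Idle and the second `start` while Speaking both leave the state untouched and produce no utterance.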

🔑 Key points:


⏱ Never Decide Speech Completion by “Feeling”

❌ Common mistakes

⭕ Correct approach

👉 The FSM decides the transition.
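
A minimal sketch of event-driven completion, assuming the audio backend can invoke a callback when playback ends. `FakePlayer`, its `finish` method, and `MinimalFsm` are hypothetical stand-ins; a real backend would trigger the callback itself.

```python
class FakePlayer:
    """Hypothetical stand-in for an audio backend that reports completion."""

    def __init__(self):
        self.spoken = []
        self._on_done = None

    def play(self, text, on_done):
        self.spoken.append(text)
        self._on_done = on_done  # a real backend would call this when playback ends

    def finish(self):
        """Simulate the backend signalling that playback has ended."""
        callback, self._on_done = self._on_done, None
        if callback is not None:
            callback()


class MinimalFsm:
    def __init__(self, player):
        self.state = "Idle"
        self.player = player

    def on_event(self, event):
        if self.state == "Idle" and event == "start":
            self.state = "Speaking"
            # The callback only posts an event; the FSM performs the transition.
            self.player.play(
                "System has started.",
                on_done=lambda: self.on_event("tts_finished"),
            )
        elif self.state == "Speaking" and event == "tts_finished":
            self.state = "Idle"


player = FakePlayer()
fsm = MinimalFsm(player)
fsm.on_event("start")
print(fsm.state)   # Speaking: stays here until completion is reported
player.finish()
print(fsm.state)   # Idle
```

The player never changes `state` directly. It reports that playback ended, and the FSM decides what that means.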


✋ Why Interruptions Are NOT Added Yet

Adding interruptions at this stage would:

This article intentionally keeps things simple.

Interruptions come next time (33).


🚫 What You Must NOT Do at This Stage

👉 All of that comes after the FSM is complete.


🧠 What You Gain at This Stage

🎉 With just this,
the system is already more robust than most voice AI demos.


📌 Summary


🔊 Minimal Demo

A minimal demo is provided in which sound is produced
only when the FSM allows it.

▶ Demo
https://samizo-aitl.github.io/qiita-articles/demos/audio-fsm-minimal/

This demo verifies that:


🔜 Next Article (33)

Next comes the chapter from hell.
We handle interruptions.


(The GitHub-hosted Markdown is the canonical source; the Qiita version is a curated extract.)