Agentic Middleware

llmovoice

A practical direction for real-time, voice-native LLM systems that are scalable, natural, and robust under production conditions.

Overview

Voice is becoming a primary interface for LLMs. This paper explores what it takes to make voice interaction work outside of demos, with a focus on long-running calls, expressive response style, and stable behavior under imperfect network conditions.

Core Goals

Scalability

Sustain long voice conversations without exhausting context windows or making per-call costs impractical.
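
One simple way to keep a long call inside a fixed context budget is to send only the most recent turns that fit. The sketch below is illustrative only and is not the method this paper proposes: the names Turn, estimate_tokens, and trim_to_budget are hypothetical, and the word-count token estimate is a stand-in assumption for a real tokenizer.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str  # "user" or "agent"
    text: str


def estimate_tokens(text: str) -> int:
    # Assumption: one token per whitespace-separated word.
    # A real system would use the model's own tokenizer.
    return len(text.split())


def trim_to_budget(turns: list[Turn], max_tokens: int) -> list[Turn]:
    """Return the most recent turns whose combined estimate fits max_tokens."""
    kept: deque[Turn] = deque()
    used = 0
    # Walk backwards from the newest turn and stop at the first one that
    # would push the running total over the budget.
    for turn in reversed(turns):
        cost = estimate_tokens(turn.text)
        if used + cost > max_tokens:
            break
        kept.appendleft(turn)
        used += cost
    return list(kept)


history = [
    Turn("user", "hi there"),
    Turn("agent", "hello how can I help"),
    Turn("user", "book a table for two"),
]
recent = trim_to_budget(history, max_tokens=10)
```

With a budget of 10 estimated tokens, the two newest turns fit and the oldest is dropped. Dropping old turns outright loses information, so a production system would more likely summarize them; the sketch only shows how per-request context can stay bounded as a call grows.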

Naturalness

Adapt pace and speaking style so the agent sounds conversational rather than flat and robotic.

Robustness

Handle noisy real-world audio and network jitter while keeping turn-taking stable.