
Overview

Klaus uses OpenAI’s gpt-4o-mini-tts model for high-quality neural text-to-speech. Responses are split into sentences and streamed for low-latency playback.

TextToSpeech Class

The TextToSpeech class converts text to speech with sentence-level streaming and persistent audio output.

Constructor

from klaus.tts import TextToSpeech
import klaus.config as config

tts = TextToSpeech(settings=None)
  • settings (config.RuntimeSettings | None, default None): Optional runtime settings. If None, reads from config.get_runtime_settings().

Methods

speak(text: str, on_sentence_start: callable = None) -> None

Synthesize and play text. Batches into sentences for low-latency playback.
  • text (str, required): The full text to speak.
  • on_sentence_start (callable, default None): Optional callback (sentence_index: int, sentence_text: str) fired before each sentence begins playback.
Example:
from klaus.tts import TextToSpeech

tts = TextToSpeech()

def on_sentence(idx, text):
    print(f"Sentence {idx}: {text[:50]}...")

tts.speak(
    "Hello. This is Klaus speaking.",
    on_sentence_start=on_sentence
)

speak_streaming(sentence_queue: queue.Queue[str | None]) -> None

Play sentences as they arrive from a queue. Reads sentences from sentence_queue, synthesizes each via the API, and plays them sequentially. None in the queue signals completion.
  • sentence_queue (queue.Queue[str | None], required): Queue of sentences to synthesize and play. Push None to signal end of stream.
Example:
import queue
import threading
from klaus.tts import TextToSpeech

tts = TextToSpeech()
sentence_queue = queue.Queue()

def produce():
    # A real producer (e.g. an LLM response stream) would push
    # sentences as they become available
    sentence_queue.put("First sentence.")
    sentence_queue.put("Second sentence.")
    sentence_queue.put(None)  # Signal completion

threading.Thread(target=produce).start()

tts.speak_streaming(sentence_queue)  # Blocks until None is received

synthesize_to_wav(text: str) -> bytes

Synthesize text to a single WAV buffer without playing it.
  • text (str, required): Text to synthesize.

Returns: WAV-encoded audio bytes.

Example:
from klaus.tts import TextToSpeech

tts = TextToSpeech()
wav_bytes = tts.synthesize_to_wav("Hello, world!")

with open("output.wav", "wb") as f:
    f.write(wav_bytes)

stop() -> None

Immediately stop playback and close the audio stream.

Example:
import threading
import time
from klaus.tts import TextToSpeech

tts = TextToSpeech()

def speak_in_background():
    tts.speak("This is a long sentence that can be interrupted.")

thread = threading.Thread(target=speak_in_background)
thread.start()

time.sleep(1.0)  # Give playback a moment to start
tts.stop()       # Interrupt playback

reload_client(settings: config.RuntimeSettings | None = None) -> None

Recreate the OpenAI client to pick up API key changes from config.reload().
  • settings (config.RuntimeSettings | None, default None): New runtime settings. If None, reads from config.get_runtime_settings().

Configuration

TTS settings are configured in ~/.klaus/config.toml:
[tts]
voice = "cedar"        # alloy, ash, ballad, coral, cedar, sage, shimmer, verse
speed = 1.0            # 0.25 to 4.0
model = "gpt-4o-mini-tts"

Available Voices

  • alloy
  • ash
  • ballad
  • coral
  • cedar (default)
  • sage
  • shimmer
  • verse

Implementation Details

  • Sentence batching: Responses are split on sentence boundaries (., !, ?) and synthesized in chunks up to 4000 characters.
  • Streaming playback: Audio begins playing as soon as the first sentence is synthesized.
  • Persistent output stream: A single sounddevice.OutputStream is reused across chunks to avoid macOS CoreAudio crackling.
  • High latency mode on macOS: Uses latency='high' on macOS for stable playback.
  • Thread-safe: Synthesis and playback run in background threads.

Constants

SENTENCE_SPLIT = re.compile(r'(?<=[.!?])\s+')
MAX_CHUNK_CHARS = 4000
WRITE_BLOCK_FRAMES = 2048
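These constants are enough to sketch the sentence-batching step in pure Python. The helper below is a hypothetical illustration of greedy packing up to MAX_CHUNK_CHARS, not the actual klaus implementation:

```python
import re

SENTENCE_SPLIT = re.compile(r'(?<=[.!?])\s+')
MAX_CHUNK_CHARS = 4000

def batch_sentences(text: str) -> list[str]:
    """Greedily pack whole sentences into chunks of at most
    MAX_CHUNK_CHARS characters (hypothetical helper)."""
    chunks: list[str] = []
    current = ""
    for sentence in SENTENCE_SPLIT.split(text.strip()):
        if current and len(current) + 1 + len(sentence) > MAX_CHUNK_CHARS:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}" if current else sentence
    if current:
        chunks.append(current)
    return chunks

print(batch_sentences("Hello. This is Klaus speaking."))
# → ['Hello. This is Klaus speaking.']
```

Short inputs collapse into a single chunk; only text whose sentences exceed MAX_CHUNK_CHARS in total is split across multiple API calls.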

Source Reference

See klaus/tts.py for the full implementation.
