Audio

PushToTalkRecorder

Records audio while a key is held, producing a WAV buffer on release.

Constructor

PushToTalkRecorder(sample_rate: int = SAMPLE_RATE)

sample_rate

int

default:"16000"

Audio sample rate in Hz

Location: audio.py:36

Methods

start_recording()

Begin capturing audio from the microphone.

def start_recording() -> None:

Location: audio.py:43-63

stop_recording()

Stop recording and return WAV bytes, or None if nothing was captured.

def stop_recording() -> bytes | None:

wav_bytes

bytes | None

WAV-encoded audio data, or None if no audio was captured

Location: audio.py:77-95

Properties

is_recording

bool

Whether the recorder is currently capturing audio

Location: audio.py:97-99
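
The push-to-talk flow above can be sketched as a small helper. This is a minimal illustration that assumes only the documented start_recording() and stop_recording() methods. The names record_while_held, Recorder, and wait_for_release are hypothetical and not part of audio.py; the helper accepts any object with the two documented methods, so it does not depend on how the key press is detected.

```python
from typing import Callable, Optional, Protocol


class Recorder(Protocol):
    """Structural type matching the documented PushToTalkRecorder methods."""

    def start_recording(self) -> None: ...

    def stop_recording(self) -> Optional[bytes]: ...


def record_while_held(
    recorder: Recorder,
    wait_for_release: Callable[[], None],
) -> Optional[bytes]:
    """Record until wait_for_release() returns, then return the WAV bytes.

    wait_for_release is a hypothetical blocking callable, for example one
    that returns when the push-to-talk key is released.
    """
    recorder.start_recording()
    try:
        wait_for_release()
    finally:
        # Always stop, even if waiting raised, so the recorder is not left running.
        wav_bytes = recorder.stop_recording()
    # None means nothing was captured, as documented for stop_recording().
    return wav_bytes
```

Callers should check the result for None before passing it on, since stop_recording() returns None when no audio was captured.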

VoiceActivatedRecorder

Continuously listens and uses webrtcvad to detect speech boundaries.

Constructor

VoiceActivatedRecorder(
    on_speech_start: Callable[[], None],
    on_speech_end: Callable[[bytes], None],
    on_speech_discard: Callable[[str], None] | None = None,
    sample_rate: int = SAMPLE_RATE,
    sensitivity: int = 2,
    silence_timeout: float = 1.5,
    min_voiced_ratio: float = 0.25,
    min_voiced_frames: int = 5,
    min_duration: float = 0.3,
    min_rms_dbfs: float = -45.0,
    min_voiced_run_frames: int = 6,
    device: int | None = None,
)

on_speech_start

Callable[[], None]

required

Callback invoked when speech begins

on_speech_end

Callable[[bytes], None]

required

Callback invoked with WAV bytes when speech ends

on_speech_discard

Callable[[str], None] | None

default:"None"

Callback invoked with a reason string when an utterance is discarded

sample_rate

int

default:"16000"

Audio sample rate in Hz

sensitivity

int

default:"2"

WebRTC VAD aggressiveness mode (0-3; higher values filter out non-speech more aggressively)

silence_timeout

float

default:"1.5"

Seconds of silence before finalizing speech

min_voiced_ratio

float

default:"0.25"

Minimum ratio of voiced frames to total frames

min_voiced_frames

int

default:"5"

Minimum number of voiced frames required

min_duration

float

default:"0.3"

Minimum utterance duration in seconds

min_rms_dbfs

float

default:"-45.0"

Minimum RMS loudness in dBFS

min_voiced_run_frames

int

default:"6"

Minimum length of the longest contiguous voiced run, counted in 30 ms frames

device

int | None

default:"None"

Input device index, or None for system default

Location: audio.py:105-155
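
The min_* parameters above describe filters on a finished utterance, but this page does not say how they are combined. The sketch below assumes an utterance is kept only if it passes every check, and that loudness is measured as RMS relative to int16 full scale. The function names (rms_dbfs, longest_voiced_run, discard_reason), the order of the checks, and the reason strings are all hypothetical and may differ from what audio.py actually does.

```python
import math
from typing import Optional, Sequence

FRAME_DURATION_MS = 30  # matches the documented constant


def rms_dbfs(samples: Sequence[int]) -> float:
    """RMS level of int16 samples in dBFS, where 0 dBFS is full scale (32768)."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 20.0 * math.log10(math.sqrt(mean_square) / 32768.0)


def longest_voiced_run(voiced: Sequence[bool]) -> int:
    """Length of the longest run of consecutive voiced frames."""
    longest = current = 0
    for flag in voiced:
        current = current + 1 if flag else 0
        longest = max(longest, current)
    return longest


def discard_reason(
    voiced: Sequence[bool],
    samples: Sequence[int],
    min_voiced_ratio: float = 0.25,
    min_voiced_frames: int = 5,
    min_duration: float = 0.3,
    min_rms_dbfs: float = -45.0,
    min_voiced_run_frames: int = 6,
) -> Optional[str]:
    """Return a hypothetical discard reason, or None if the utterance is kept.

    voiced holds one VAD decision per 30 ms frame; samples holds the int16 audio.
    """
    total = len(voiced)
    if total == 0:
        return "empty"
    voiced_count = sum(voiced)
    if total * FRAME_DURATION_MS / 1000.0 < min_duration:
        return "too_short"
    if voiced_count < min_voiced_frames:
        return "too_few_voiced_frames"
    if voiced_count / total < min_voiced_ratio:
        return "low_voiced_ratio"
    if longest_voiced_run(voiced) < min_voiced_run_frames:
        return "voiced_run_too_short"
    if rms_dbfs(samples) < min_rms_dbfs:
        return "too_quiet"
    return None
```

Under these assumptions, a returned string is the kind of value on_speech_discard would receive, and a None result corresponds to on_speech_end being called with the WAV bytes.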

Methods

start()

Open the mic stream and begin VAD detection.

def start() -> None:

Location: audio.py:157-202

stop()

Stop the stream and discard any in-progress speech.

def stop() -> None:

Location: audio.py:204-219

pause()

Pause detection (e.g. while Klaus is speaking).

def pause() -> None:

Location: audio.py:221-232

resume()

Resume detection after pause.

def resume() -> None:

Location: audio.py:234-243

suspend_stream()

Stop the physical mic stream. Safe to call from non-callback threads. Use this instead of pause() when you need to free the CoreAudio device, e.g. before TTS playback. Call resume_stream() to reopen it.

def suspend_stream() -> None:

Location: audio.py:245-258

resume_stream()

Reopen the physical mic stream after suspend_stream().

def resume_stream() -> None:

Location: audio.py:260-276
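
Because suspend_stream() and resume_stream() must be called as a pair, one way to use them is through a context manager. The sketch below relies only on those two documented methods; the name mic_released is hypothetical and not part of audio.py.

```python
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def mic_released(recorder) -> Iterator[None]:
    """Suspend the physical mic stream for the duration of the with-block.

    recorder is any object exposing the documented suspend_stream() and
    resume_stream() methods, such as a VoiceActivatedRecorder.
    """
    recorder.suspend_stream()
    try:
        yield
    finally:
        # Reopen the stream even if the body (for example, playback) raised.
        recorder.resume_stream()
```

A caller might then write `with mic_released(vad): player.play_wav_bytes(tts_audio)`, where vad, player, and tts_audio are placeholder names for an existing recorder, an AudioPlayer, and a WAV buffer.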

Properties

is_running

bool

Whether the VAD is actively listening

is_paused

bool

Whether detection is paused

Location: audio.py:443-449

AudioPlayer

Plays raw PCM or WAV audio through the default output device.

Constructor

AudioPlayer(sample_rate: int = 24000)

sample_rate

int

default:"24000"

Audio sample rate in Hz

Location: audio.py:455

Methods

play_wav_bytes()

Play a complete WAV buffer. Blocks until playback finishes or stop() is called.

def play_wav_bytes(wav_data: bytes) -> None:

wav_data

bytes

required

WAV-encoded audio data

Location: audio.py:460-481

stop()

Stop playback immediately.

def stop() -> None:

Location: audio.py:483-485
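
Since play_wav_bytes() blocks until playback finishes or stop() is called, stop() has to come from a different thread than the one playing. The sketch below cuts playback off after a time limit using a timer thread. It assumes stop() is safe to call from another thread, which this page implies but does not state; play_with_timeout is a hypothetical helper, not part of audio.py.

```python
import threading


def play_with_timeout(player, wav_data: bytes, timeout_s: float) -> bool:
    """Play wav_data, calling player.stop() if timeout_s seconds elapse.

    player is any object with the documented play_wav_bytes() and stop()
    methods. Returns True if the timer fired and stop() was called.
    """
    timed_out = threading.Event()

    def _cut_off() -> None:
        timed_out.set()
        player.stop()

    timer = threading.Timer(timeout_s, _cut_off)
    timer.start()
    try:
        # Blocks until playback finishes or the timer thread calls stop().
        player.play_wav_bytes(wav_data)
    finally:
        # If playback ended on its own, prevent a late stop() call.
        timer.cancel()
    return timed_out.is_set()
```

The same pattern applies to interrupting playback on user input: run play_wav_bytes() on one thread and call stop() from the thread that receives the interrupt.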

Utility Functions

to_wav_bytes()

Convert int16 numpy audio to WAV bytes.

def to_wav_bytes(audio: np.ndarray, sample_rate: int = SAMPLE_RATE) -> bytes:

audio

np.ndarray

required

Audio samples as int16 numpy array

sample_rate

int

default:"16000"

Sample rate in Hz

wav_bytes

bytes

WAV-encoded audio data

Location: audio.py:22-30
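
To show what "WAV bytes" means here, the sketch below wraps 16-bit mono PCM samples in a WAV container using only the standard library. It is an illustration of the conversion, not the code in audio.py: sketch_to_wav_bytes is a hypothetical name, and the use of the wave module is an assumption. The function accepts any object with a tobytes() method holding int16 samples, which covers both an int16 numpy array and array("h").

```python
import io
import wave
from array import array

SAMPLE_RATE = 16000  # matches the documented constant
CHANNELS = 1


def sketch_to_wav_bytes(audio, sample_rate: int = SAMPLE_RATE) -> bytes:
    """Wrap int16 mono samples in a WAV container and return the bytes.

    WAV stores samples little-endian; tobytes() uses native byte order,
    so on a big-endian machine the samples would need a byteswap first.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(2)  # int16 = 2 bytes per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(audio.tobytes())
    return buffer.getvalue()


# One 30 ms frame of silence at 16 kHz (480 samples).
wav_bytes = sketch_to_wav_bytes(array("h", [0] * 480))

# Read the header back to confirm the format that was written.
with wave.open(io.BytesIO(wav_bytes), "rb") as wav_file:
    params = wav_file.getparams()
```

The same read-back step works on the bytes returned by stop_recording() or passed to on_speech_end, which can be useful for checking the sample rate before sending audio to another service.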

Constants

SAMPLE_RATE = 16000
CHANNELS = 1
DTYPE = "int16"
FRAME_DURATION_MS = 30
FRAME_SIZE = 480  # samples @ 16 kHz

Location: audio.py:14-19
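
FRAME_SIZE follows from the other constants, as the short calculation below shows. The webrtcvad library only accepts frames of 10, 20, or 30 ms at 8000, 16000, 32000, or 48000 Hz, so changing SAMPLE_RATE means the frame size must change with it. BYTES_PER_SAMPLE, frame_bytes, and silence_frames are names introduced here for illustration; whether audio.py counts silence in whole frames is an assumption.

```python
SAMPLE_RATE = 16000
CHANNELS = 1
FRAME_DURATION_MS = 30
BYTES_PER_SAMPLE = 2  # int16; hypothetical name, not a documented constant

# Samples in one VAD frame: 16000 samples/s * 0.030 s.
frame_size = SAMPLE_RATE * FRAME_DURATION_MS // 1000

# Raw PCM bytes in one mono int16 frame.
frame_bytes = frame_size * CHANNELS * BYTES_PER_SAMPLE

# Frames covered by the default silence_timeout of 1.5 s,
# assuming silence is counted in whole frames.
silence_frames = int(1.5 * 1000 / FRAME_DURATION_MS)

print(frame_size, frame_bytes, silence_frames)  # prints: 480 960 50
```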
