PushToTalkRecorder

Records audio while a key is held, producing a WAV buffer on release.

Constructor

PushToTalkRecorder(sample_rate: int = SAMPLE_RATE)
sample_rate (int, default: 16000): Audio sample rate in Hz.
Location: audio.py:36

Methods

start_recording()

Begin capturing audio from the microphone.
def start_recording() -> None:
Location: audio.py:43-63

stop_recording()

Stop recording and return WAV bytes, or None if nothing was captured.
def stop_recording() -> bytes | None:
Returns wav_bytes (bytes | None): WAV-encoded audio data, or None if no audio was captured.
Location: audio.py:77-95
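
The two methods above are meant to bracket a key press. A minimal sketch of that call order follows, assuming only the documented `start_recording()` and `stop_recording()` signatures. `RecorderLike`, `record_while_held`, `wait_for_release`, and `FakeRecorder` are hypothetical names introduced for illustration; they are not part of audio.py.

```python
from typing import Callable, Optional, Protocol


class RecorderLike(Protocol):
    """The two documented PushToTalkRecorder methods this sketch relies on."""

    def start_recording(self) -> None: ...

    def stop_recording(self) -> Optional[bytes]: ...


def record_while_held(
    recorder: RecorderLike, wait_for_release: Callable[[], None]
) -> Optional[bytes]:
    """Capture audio for as long as wait_for_release() blocks."""
    recorder.start_recording()
    try:
        # Hypothetical hook: blocks until the push-to-talk key is released.
        wait_for_release()
    finally:
        # Always stop, even if waiting raised, so the mic is not left open.
        wav = recorder.stop_recording()
    return wav


class FakeRecorder:
    """Stand-in for demonstration only; it records nothing."""

    def __init__(self, result: Optional[bytes]) -> None:
        self.result = result
        self.calls = []

    def start_recording(self) -> None:
        self.calls.append("start_recording")

    def stop_recording(self) -> Optional[bytes]:
        self.calls.append("stop_recording")
        return self.result


if __name__ == "__main__":
    wav = record_while_held(FakeRecorder(None), lambda: None)
    # stop_recording() may return None, so check before using the buffer.
    print("nothing captured" if wav is None else f"{len(wav)} bytes")
```

Because `stop_recording()` can return None, callers should test the result before passing it on.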

Properties

is_recording (bool): Whether the recorder is currently capturing audio.
Location: audio.py:97-99

VoiceActivatedRecorder

Continuously listens and uses webrtcvad to detect speech boundaries.

Constructor

VoiceActivatedRecorder(
    on_speech_start: Callable[[], None],
    on_speech_end: Callable[[bytes], None],
    on_speech_discard: Callable[[str], None] | None = None,
    sample_rate: int = SAMPLE_RATE,
    sensitivity: int = 2,
    silence_timeout: float = 1.5,
    min_voiced_ratio: float = 0.25,
    min_voiced_frames: int = 5,
    min_duration: float = 0.3,
    min_rms_dbfs: float = -45.0,
    min_voiced_run_frames: int = 6,
    device: int | None = None,
)
on_speech_start (Callable[[], None], required): Callback invoked when speech begins.
on_speech_end (Callable[[bytes], None], required): Callback invoked with WAV bytes when speech ends.
on_speech_discard (Callable[[str], None] | None, default: None): Callback invoked with the reason when an utterance is discarded.
sample_rate (int, default: 16000): Audio sample rate in Hz.
sensitivity (int, default: 2): WebRTC VAD sensitivity, 0-3 (higher values filter more aggressively).
silence_timeout (float, default: 1.5): Seconds of silence before an utterance is finalized.
min_voiced_ratio (float, default: 0.25): Minimum ratio of voiced frames to total frames.
min_voiced_frames (int, default: 5): Minimum number of voiced frames required.
min_duration (float, default: 0.3): Minimum utterance duration in seconds.
min_rms_dbfs (float, default: -45.0): Minimum RMS loudness in dBFS.
min_voiced_run_frames (int, default: 6): Minimum length of a contiguous voiced run, counted in 30 ms frames.
device (int | None, default: None): Input device index, or None for the system default.
Location: audio.py:105-155
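
The `min_*` parameters describe thresholds an utterance must pass before `on_speech_end` fires. The sketch below shows one way such checks could combine. It is an assumption, not the code in audio.py: the order of checks, the reason strings, the use of total frames for duration, and the full-scale reference of 32768 for dBFS are all hypothetical.

```python
import math
from typing import Optional, Sequence

FRAME_DURATION_MS = 30  # value from the Constants section


def rms_dbfs(samples: Sequence[int]) -> float:
    """RMS level of int16 samples relative to full scale; -inf for silence."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    # Assumption: 0 dBFS corresponds to an RMS of 32768.
    return 20.0 * math.log10(math.sqrt(mean_square) / 32768.0)


def longest_voiced_run(flags: Sequence[bool]) -> int:
    """Length of the longest streak of consecutive voiced frames."""
    best = run = 0
    for voiced in flags:
        run = run + 1 if voiced else 0
        best = max(best, run)
    return best


def discard_reason(
    flags: Sequence[bool],
    samples: Sequence[int],
    min_voiced_ratio: float = 0.25,
    min_voiced_frames: int = 5,
    min_duration: float = 0.3,
    min_rms_dbfs: float = -45.0,
    min_voiced_run_frames: int = 6,
) -> Optional[str]:
    """Return a hypothetical reason string, or None to keep the utterance.

    flags holds one VAD decision per 30 ms frame; samples holds the int16 audio.
    """
    total = len(flags)
    voiced = sum(flags)
    if total * FRAME_DURATION_MS / 1000.0 < min_duration:
        return "too_short"
    if voiced < min_voiced_frames:
        return "too_few_voiced_frames"
    if voiced / total < min_voiced_ratio:
        return "low_voiced_ratio"
    if longest_voiced_run(flags) < min_voiced_run_frames:
        return "no_sustained_speech"
    if rms_dbfs(samples) < min_rms_dbfs:
        return "too_quiet"
    return None
```

With the defaults, an utterance needs at least 0.3 s of audio, 5 voiced frames, 25% voiced frames, one run of 6 consecutive voiced frames (180 ms), and an RMS level of at least -45 dBFS.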

Methods

start()

Open the mic stream and begin VAD detection.
def start() -> None:
Location: audio.py:157-202
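
Once `start()` is called, the three constructor callbacks begin firing. The sketch below builds callbacks that only enqueue events, so the heavy work (transcription, logging) can happen on a consumer thread. It assumes the callbacks may run on a thread other than the main one, which this page does not confirm. `make_callbacks` and the event tuples are hypothetical.

```python
import queue
from typing import Callable, Tuple


def make_callbacks(
    events: queue.Queue,
) -> Tuple[Callable[[], None], Callable[[bytes], None], Callable[[str], None]]:
    """Build the three VoiceActivatedRecorder callbacks around a shared queue."""

    def on_speech_start() -> None:
        events.put(("start", None))

    def on_speech_end(wav_bytes: bytes) -> None:
        # wav_bytes is the WAV-encoded utterance described above.
        events.put(("end", wav_bytes))

    def on_speech_discard(reason: str) -> None:
        events.put(("discard", reason))

    return on_speech_start, on_speech_end, on_speech_discard


if __name__ == "__main__":
    events = queue.Queue()
    on_start, on_end, on_discard = make_callbacks(events)
    # With the real class these would be passed to the constructor:
    #   VoiceActivatedRecorder(on_speech_start=on_start,
    #                          on_speech_end=on_end,
    #                          on_speech_discard=on_discard).start()
    on_start()
    on_discard("example reason")
    while not events.empty():
        print(events.get())
```

`queue.Queue` is thread-safe, so the consumer can block on `events.get()` without extra locking.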

stop()

Stop the stream and discard any in-progress speech.
def stop() -> None:
Location: audio.py:204-219

pause()

Pause detection (e.g. while Klaus is speaking).
def pause() -> None:
Location: audio.py:221-232

resume()

Resume detection after pause.
def resume() -> None:
Location: audio.py:234-243

suspend_stream()

Stop the physical mic stream. Safe to call from non-callback threads. Use this instead of pause() when you need to release the CoreAudio device, for example before TTS playback. Call resume_stream() to reopen it.
def suspend_stream() -> None:
Location: audio.py:245-258

resume_stream()

Reopen the physical mic stream after suspend_stream().
def resume_stream() -> None:
Location: audio.py:260-276
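
`suspend_stream()` and `resume_stream()` form a pair, and forgetting the second call would leave the mic closed. One way to keep them paired is a context manager, sketched below. `mic_released` and `FakeVad` are hypothetical helpers for illustration; only the two method names come from this page.

```python
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def mic_released(recorder) -> Iterator[None]:
    """Release the mic for the duration of the with-block, then reopen it."""
    recorder.suspend_stream()
    try:
        yield
    finally:
        # Runs even if playback raises, so the stream is always reopened.
        recorder.resume_stream()


class FakeVad:
    """Stand-in that only logs calls; the real class opens an audio stream."""

    def __init__(self) -> None:
        self.calls = []

    def suspend_stream(self) -> None:
        self.calls.append("suspend_stream")

    def resume_stream(self) -> None:
        self.calls.append("resume_stream")


if __name__ == "__main__":
    vad = FakeVad()
    with mic_released(vad):
        vad.calls.append("play TTS")  # e.g. AudioPlayer.play_wav_bytes(...)
    print(vad.calls)
```

For the lighter case where the device can stay open, the same pattern would apply to `pause()` and `resume()`.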

Properties

is_running (bool): Whether the VAD is actively listening.
is_paused (bool): Whether detection is paused.
Location: audio.py:443-449

AudioPlayer

Plays raw PCM or WAV audio through the default output device.

Constructor

AudioPlayer(sample_rate: int = 24000)
sample_rate (int, default: 24000): Audio sample rate in Hz.
Location: audio.py:455

Methods

play_wav_bytes()

Play a complete WAV buffer. Blocks until playback finishes or stop() is called.
def play_wav_bytes(wav_data: bytes) -> None:
wav_data (bytes, required): WAV-encoded audio data.
Location: audio.py:460-481

stop()

Stop playback immediately.
def stop() -> None:
Location: audio.py:483-485
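
Because `play_wav_bytes()` blocks, `stop()` has to come from a different thread than the one playing. The sketch below runs playback on a worker thread so the caller stays free to interrupt it. `play_in_background` and `FakePlayer` are hypothetical; the fake simulates blocking playback with a `threading.Event` and says nothing about how AudioPlayer is implemented.

```python
import threading


def play_in_background(player, wav_data: bytes) -> threading.Thread:
    """Start play_wav_bytes() on a worker thread and return that thread."""
    worker = threading.Thread(
        target=player.play_wav_bytes, args=(wav_data,), daemon=True
    )
    worker.start()
    return worker


class FakePlayer:
    """Single-use stand-in: 'plays' by waiting until stopped or timed out."""

    def __init__(self, duration_s: float = 5.0) -> None:
        self._stopped = threading.Event()
        self._duration_s = duration_s
        self.interrupted = None

    def play_wav_bytes(self, wav_data: bytes) -> None:
        # wait() returns True if stop() was called before the timeout.
        self.interrupted = self._stopped.wait(timeout=self._duration_s)

    def stop(self) -> None:
        self._stopped.set()


if __name__ == "__main__":
    player = FakePlayer()
    worker = play_in_background(player, b"")
    player.stop()  # e.g. the user started speaking again
    worker.join(timeout=2.0)
    print("interrupted:", player.interrupted)
```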

Utility Functions

to_wav_bytes()

Convert int16 numpy audio to WAV bytes.
def to_wav_bytes(audio: np.ndarray, sample_rate: int = SAMPLE_RATE) -> bytes:
audio (np.ndarray, required): Audio samples as an int16 numpy array.
sample_rate (int, default: 16000): Sample rate in Hz.
Returns wav_bytes (bytes): WAV-encoded audio data.
Location: audio.py:22-30
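
To make the output format concrete, here is a sketch of the same kind of conversion using only the standard library `wave` module. It is not the implementation in audio.py. `pcm16_to_wav_bytes` and `wav_info` are hypothetical names, and the input is assumed to be any mono int16 buffer with a `tobytes()` method (an `array.array("h")` here; an int16 `np.ndarray` offers the same method).

```python
import array
import io
import wave

SAMPLE_RATE = 16000  # value from the Constants section
CHANNELS = 1


def pcm16_to_wav_bytes(samples, sample_rate: int = SAMPLE_RATE) -> bytes:
    """Wrap mono int16 PCM samples in a WAV container held in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(2)  # int16 is 2 bytes per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(samples.tobytes())
    return buffer.getvalue()


def wav_info(wav_bytes: bytes):
    """Return (channels, sample_rate, frame_count) read back from WAV bytes."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav_file:
        return (
            wav_file.getnchannels(),
            wav_file.getframerate(),
            wav_file.getnframes(),
        )


if __name__ == "__main__":
    samples = array.array("h", [0, 1000, -1000, 32767])
    wav = pcm16_to_wav_bytes(samples)
    print(wav[:4], wav_info(wav))
```

Reading the header back with `wave.open` is also a quick way to inspect buffers delivered to `on_speech_end` or returned by `stop_recording()`.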

Constants

SAMPLE_RATE = 16000
CHANNELS = 1
DTYPE = "int16"
FRAME_DURATION_MS = 30
FRAME_SIZE = 480  # samples @ 16 kHz
Location: audio.py:14-19
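
The frame constants are related by simple arithmetic, shown below as a sketch. `BYTES_PER_SAMPLE` and `frames_for` are hypothetical helpers added for illustration. Whether audio.py converts `silence_timeout` or `min_duration` to frame counts this way is an assumption.

```python
SAMPLE_RATE = 16000
FRAME_DURATION_MS = 30
BYTES_PER_SAMPLE = 2  # DTYPE is "int16"

# Samples per VAD frame: 16000 samples/s * 0.030 s.
frame_size = SAMPLE_RATE * FRAME_DURATION_MS // 1000

# Bytes per mono int16 frame.
frame_bytes = frame_size * BYTES_PER_SAMPLE


def frames_for(seconds: float) -> int:
    """Number of 30 ms frames that cover the given duration."""
    return round(seconds * 1000 / FRAME_DURATION_MS)


if __name__ == "__main__":
    print(frame_size, frame_bytes, frames_for(1.5), frames_for(0.3))
```

Note that `FRAME_SIZE` depends on the sample rate: the value 480 holds only at 16 kHz.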
