UnmuteHandler

Overview

The UnmuteHandler class is the core component of the Unmute system that manages the complete voice conversation pipeline. It inherits from AsyncStreamHandler and coordinates audio streaming, speech recognition, language model generation, and text-to-speech synthesis.

Class Definition

class UnmuteHandler(AsyncStreamHandler)

Constructor

def __init__(self) -> None

Initializes the UnmuteHandler with default configuration:

input_sample_rate: 24000 Hz
output_frame_size: 480 samples
output_sample_rate: 24000 Hz

Properties

stt

@property
def stt(self) -> SpeechToText | None

Returns the current SpeechToText instance if available, otherwise None. Returns: SpeechToText | None

tts

@property
def tts(self) -> TextToSpeech | None

Returns the current TextToSpeech instance if available, otherwise None. Returns: TextToSpeech | None

Core Methods

receive

async def receive(self, frame: tuple[int, np.ndarray]) -> None

Processes incoming audio frames from the user’s microphone.

frame

tuple[int, np.ndarray]

required

Audio frame tuple containing sample rate and mono audio array

Behavior:

Sends audio to STT for transcription
Detects pauses in user speech
Handles bot interruptions
Manages conversation state transitions

emit

async def emit(self) -> HandlerOutput | None

Returns output to be sent to the client (audio, events, or additional outputs). Returns: tuple[int, np.ndarray] | AdditionalOutputs | ora.ServerEvent | CloseStream | None

start_up

async def start_up(self)

Initializes the handler by starting up the STT service and preparing for audio processing.

cleanup

async def cleanup()

Cleans up resources, including shutting down the recorder if enabled.

Conversation Management

add_chat_message_delta

async def add_chat_message_delta(
    self,
    delta: str,
    role: Literal["user", "assistant"],
    generating_message_i: int | None = None,
) -> bool

Adds a message delta to the chat history.

delta

str

required

Text to add to the current message

role

Literal['user', 'assistant']

required

Role of the message sender

generating_message_i

int | None

Index of the message being generated (to avoid race conditions)

Returns: bool - True if this created a new message, False if appended to existing

interrupt_bot

async def interrupt_bot()

Interrupts the bot while it’s speaking, clearing queues and stopping TTS/LLM tasks. Raises: RuntimeError if called when conversation state is not “bot_speaking”

update_session

async def update_session(self, session: ora.SessionConfig)

Updates session configuration including instructions, voice, and recording preferences.

session

ora.SessionConfig

required

Session configuration object

Utility Methods

audio_received_sec

def audio_received_sec(self) -> float

Returns how much audio has been received in seconds. Used for timing instead of wall-clock time. Returns: float - Seconds of audio received

get_gradio_update

def get_gradio_update(self) -> AdditionalOutputs

Returns debug information and chat history for UI updates. Returns: AdditionalOutputs containing GradioUpdate with:

chat_history: List of message dictionaries
debug_dict: Debug information dictionary
debug_plot_data: Plot data for visualizations

copy

def copy(self)

Creates a new instance of UnmuteHandler. Returns: UnmuteHandler

Internal Methods

determine_pause

def determine_pause(self) -> bool

Determines if the user has paused speaking based on STT pause prediction. Returns: bool - True if pause detected

detect_long_silence

async def detect_long_silence()

Handles situations where the user doesn’t respond for more than 7 seconds by inserting a silence marker.

check_for_bot_goodbye

async def check_for_bot_goodbye()

Checks if the assistant’s last message ends with “bye!” and closes the stream if so.

Configuration Constants

USER_SILENCE_TIMEOUT = 7.0  # seconds
FIRST_MESSAGE_TEMPERATURE = 0.7
FURTHER_MESSAGES_TEMPERATURE = 0.3
UNINTERRUPTIBLE_BY_VAD_TIME_SEC = 3  # seconds

Example Usage

import asyncio
from unmute.unmute_handler import UnmuteHandler

async def main():
    handler = UnmuteHandler()
    
    async with handler:
        await handler.start_up()
        
        # Handler is now ready to process audio frames
        # Audio frames would be received via the receive() method
        # and outputs retrieved via emit()
        
        await handler.cleanup()

asyncio.run(main())

Notes

The handler uses a quest manager system to coordinate async tasks for STT, TTS, and LLM
Conversation state is managed through the internal Chatbot instance
Audio is processed at 24kHz sample rate with mono channel
The handler supports echo cancellation and VAD-based interruption detection

WebSocket API

Python API

REST API

Overview

Class Definition

Constructor

Properties

stt

tts

Core Methods

receive

emit

start_up

cleanup

Conversation Management

add_chat_message_delta

interrupt_bot

update_session

Utility Methods

audio_received_sec

get_gradio_update

copy

Internal Methods

determine_pause

detect_long_silence

check_for_bot_goodbye

Configuration Constants

Example Usage

Notes

Build docs developers (and LLMs) love

WebSocket API

Python API

REST API

​Overview

​Class Definition

​Constructor

​Properties

​stt

​tts

​Core Methods

​receive

​emit

​start_up

​cleanup

​Conversation Management

​add_chat_message_delta

​interrupt_bot

​update_session

​Utility Methods

​audio_received_sec

​get_gradio_update

​copy

​Internal Methods

​determine_pause

​detect_long_silence

​check_for_bot_goodbye

​Configuration Constants

​Example Usage

​Notes

Build docs developers (and LLMs) love

Overview

Class Definition

Constructor

Properties

stt

tts

Core Methods

receive

emit

start_up

cleanup

Conversation Management

add_chat_message_delta

interrupt_bot

update_session

Utility Methods

audio_received_sec

get_gradio_update

copy

Internal Methods

determine_pause

detect_long_silence

check_for_bot_goodbye

Configuration Constants

Example Usage

Notes