Skip to main content

Overview

The UnmuteHandler class is the core component of the Unmute system that manages the complete voice conversation pipeline. It inherits from AsyncStreamHandler and coordinates audio streaming, speech recognition, language model generation, and text-to-speech synthesis.

Class Definition

class UnmuteHandler(AsyncStreamHandler)

Constructor

def __init__(self) -> None
Initializes the UnmuteHandler with default configuration:
  • input_sample_rate: 24000 Hz
  • output_frame_size: 480 samples
  • output_sample_rate: 24000 Hz

Properties

stt

@property
def stt(self) -> SpeechToText | None
Returns the current SpeechToText instance if available, otherwise None. Returns: SpeechToText | None

tts

@property
def tts(self) -> TextToSpeech | None
Returns the current TextToSpeech instance if available, otherwise None. Returns: TextToSpeech | None

Core Methods

receive

async def receive(self, frame: tuple[int, np.ndarray]) -> None
Processes incoming audio frames from the user’s microphone.
frame
tuple[int, np.ndarray]
required
Audio frame tuple containing sample rate and mono audio array
Behavior:
  • Sends audio to STT for transcription
  • Detects pauses in user speech
  • Handles bot interruptions
  • Manages conversation state transitions

emit

async def emit(self) -> HandlerOutput | None
Returns output to be sent to the client (audio, events, or additional outputs). Returns: tuple[int, np.ndarray] | AdditionalOutputs | ora.ServerEvent | CloseStream | None

start_up

async def start_up(self)
Initializes the handler by starting up the STT service and preparing for audio processing.

cleanup

async def cleanup()
Cleans up resources, including shutting down the recorder if enabled.

Conversation Management

add_chat_message_delta

async def add_chat_message_delta(
    self,
    delta: str,
    role: Literal["user", "assistant"],
    generating_message_i: int | None = None,
) -> bool
Adds a message delta to the chat history.
delta
str
required
Text to add to the current message
role
Literal['user', 'assistant']
required
Role of the message sender
generating_message_i
int | None
Index of the message being generated (to avoid race conditions)
Returns: bool - True if this created a new message, False if appended to existing

interrupt_bot

async def interrupt_bot()
Interrupts the bot while it’s speaking, clearing queues and stopping TTS/LLM tasks. Raises: RuntimeError if called when conversation state is not “bot_speaking”

update_session

async def update_session(self, session: ora.SessionConfig)
Updates session configuration including instructions, voice, and recording preferences.
session
ora.SessionConfig
required
Session configuration object

Utility Methods

audio_received_sec

def audio_received_sec(self) -> float
Returns how much audio has been received in seconds. Used for timing instead of wall-clock time. Returns: float - Seconds of audio received

get_gradio_update

def get_gradio_update(self) -> AdditionalOutputs
Returns debug information and chat history for UI updates. Returns: AdditionalOutputs containing GradioUpdate with:
  • chat_history: List of message dictionaries
  • debug_dict: Debug information dictionary
  • debug_plot_data: Plot data for visualizations

copy

def copy(self)
Creates a new instance of UnmuteHandler. Returns: UnmuteHandler

Internal Methods

determine_pause

def determine_pause(self) -> bool
Determines if the user has paused speaking based on STT pause prediction. Returns: bool - True if pause detected

detect_long_silence

async def detect_long_silence()
Handles situations where the user doesn’t respond for more than 7 seconds by inserting a silence marker.

check_for_bot_goodbye

async def check_for_bot_goodbye()
Checks if the assistant’s last message ends with “bye!” and closes the stream if so.

Configuration Constants

USER_SILENCE_TIMEOUT = 7.0  # seconds
FIRST_MESSAGE_TEMPERATURE = 0.7
FURTHER_MESSAGES_TEMPERATURE = 0.3
UNINTERRUPTIBLE_BY_VAD_TIME_SEC = 3  # seconds

Example Usage

import asyncio
from unmute.unmute_handler import UnmuteHandler

async def main():
    handler = UnmuteHandler()
    
    async with handler:
        await handler.start_up()
        
        # Handler is now ready to process audio frames
        # Audio frames would be received via the receive() method
        # and outputs retrieved via emit()
        
        await handler.cleanup()

asyncio.run(main())

Notes

  • The handler uses a quest manager system to coordinate async tasks for STT, TTS, and LLM
  • Conversation state is managed through the internal Chatbot instance
  • Audio is processed at 24kHz sample rate with mono channel
  • The handler supports echo cancellation and VAD-based interruption detection

Build docs developers (and LLMs) love