Overview
The UnmuteHandler class is the core component of the Unmute system that manages the complete voice conversation pipeline. It inherits from AsyncStreamHandler and coordinates audio streaming, speech recognition, language model generation, and text-to-speech synthesis.
Class Definition
class UnmuteHandler(AsyncStreamHandler)
Constructor
def __init__(self) -> None
Initializes the UnmuteHandler with default configuration:
- input_sample_rate: 24000 Hz
- output_frame_size: 480 samples
- output_sample_rate: 24000 Hz
Properties
stt
@property
def stt(self) -> SpeechToText | None
Returns the current SpeechToText instance if available, otherwise None.
Returns: SpeechToText | None
tts
@property
def tts(self) -> TextToSpeech | None
Returns the current TextToSpeech instance if available, otherwise None.
Returns: TextToSpeech | None
Core Methods
receive
async def receive(self, frame: tuple[int, np.ndarray]) -> None
Processes incoming audio frames from the user’s microphone.
frame
tuple[int, np.ndarray]
required
Audio frame tuple containing sample rate and mono audio array
Behavior:
- Sends audio to STT for transcription
- Detects pauses in user speech
- Handles bot interruptions
- Manages conversation state transitions
emit
async def emit(self) -> HandlerOutput | None
Returns output to be sent to the client (audio, events, or additional outputs).
Returns: tuple[int, np.ndarray] | AdditionalOutputs | ora.ServerEvent | CloseStream | None
start_up
Initializes the handler by starting up the STT service and preparing for audio processing.
cleanup
Cleans up resources, including shutting down the recorder if enabled.
Conversation Management
add_chat_message_delta
async def add_chat_message_delta(
self,
delta: str,
role: Literal["user", "assistant"],
generating_message_i: int | None = None,
) -> bool
Adds a message delta to the chat history.
Text to add to the current message
role
Literal['user', 'assistant']
required
Role of the message sender
Index of the message being generated (to avoid race conditions)
Returns: bool - True if this created a new message, False if appended to existing
interrupt_bot
async def interrupt_bot()
Interrupts the bot while it’s speaking, clearing queues and stopping TTS/LLM tasks.
Raises: RuntimeError if called when conversation state is not “bot_speaking”
update_session
async def update_session(self, session: ora.SessionConfig)
Updates session configuration including instructions, voice, and recording preferences.
session
ora.SessionConfig
required
Session configuration object
Utility Methods
audio_received_sec
def audio_received_sec(self) -> float
Returns how much audio has been received in seconds. Used for timing instead of wall-clock time.
Returns: float - Seconds of audio received
get_gradio_update
def get_gradio_update(self) -> AdditionalOutputs
Returns debug information and chat history for UI updates.
Returns: AdditionalOutputs containing GradioUpdate with:
chat_history: List of message dictionaries
debug_dict: Debug information dictionary
debug_plot_data: Plot data for visualizations
copy
Creates a new instance of UnmuteHandler.
Returns: UnmuteHandler
Internal Methods
determine_pause
def determine_pause(self) -> bool
Determines if the user has paused speaking based on STT pause prediction.
Returns: bool - True if pause detected
detect_long_silence
async def detect_long_silence()
Handles situations where the user doesn’t respond for more than 7 seconds by inserting a silence marker.
check_for_bot_goodbye
async def check_for_bot_goodbye()
Checks if the assistant’s last message ends with “bye!” and closes the stream if so.
Configuration Constants
USER_SILENCE_TIMEOUT = 7.0 # seconds
FIRST_MESSAGE_TEMPERATURE = 0.7
FURTHER_MESSAGES_TEMPERATURE = 0.3
UNINTERRUPTIBLE_BY_VAD_TIME_SEC = 3 # seconds
Example Usage
import asyncio
from unmute.unmute_handler import UnmuteHandler
async def main():
handler = UnmuteHandler()
async with handler:
await handler.start_up()
# Handler is now ready to process audio frames
# Audio frames would be received via the receive() method
# and outputs retrieved via emit()
await handler.cleanup()
asyncio.run(main())
Notes
- The handler uses a quest manager system to coordinate async tasks for STT, TTS, and LLM
- Conversation state is managed through the internal
Chatbot instance
- Audio is processed at 24kHz sample rate with mono channel
- The handler supports echo cancellation and VAD-based interruption detection