Overview
Abilities are modular voice AI plugins that extend what OpenHome Agents can do. They’re triggered by spoken phrases and can do anything — call APIs, play audio, run quizzes, control devices, have multi-turn conversations, and more. Each Ability is just one file: `main.py`, your Python logic. Write your code, zip it, upload it to OpenHome, set your trigger words in the dashboard, and your Agent can do something new.
Abilities are the building blocks of OpenHome’s extensibility. They transform a voice assistant into a programmable AI platform.
How Abilities Work
When a user says a trigger phrase or the Agent’s brain routes to your Ability, your code takes over the conversation:

- Trigger — User says a hotword or the routing LLM invokes your Ability
- Run — Your `main.py` executes, taking control of the conversation
- Interact — Speak to the user, listen for responses, call APIs, play audio
- Exit — Call `resume_normal_flow()` to return control to the Agent
The Basic Structure
Here’s the minimal scaffolding for a working Ability (`main.py`):
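Since the scaffolding code block itself is not reproduced here, the following is a hedged sketch: the class shape, constructor, and `run()` entry point are assumptions, while the `#{{register capability}}` marker and the CapabilityWorker methods (`speak()`, `user_response()`, `resume_normal_flow()`) come from this doc.

```python
# main.py — hedged sketch of an Ability's scaffolding.
# The class name, constructor, and run() hook are assumptions;
# only the register marker and the SDK method names are documented.
import asyncio

#{{register capability}}

class HelloAbility:
    def __init__(self, capability_worker):
        # capability_worker is assumed to expose the CapabilityWorker SDK
        self.worker = capability_worker

    async def run(self):
        # Speak a prompt, listen for the reply, then hand control back.
        await self.worker.speak("Hi! What's your name?")
        name = await self.worker.user_response()
        await self.worker.speak(f"Nice to meet you, {name}.")
        # Critical: always return control to the Agent when done.
        await self.worker.resume_normal_flow()
```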
The `#{{register capability}}` comment is required boilerplate. The platform handles configuration automatically — you never need to create or edit `config.json`.

Key Components
CapabilityWorker
The SDK that provides all I/O functionality:

- Speaking: `speak()`, `text_to_speech()`
- Listening: `user_response()`, `wait_for_complete_transcription()`
- LLM: `text_to_text_response()`
- Audio: `play_audio()`, `play_from_audio_file()`
- Files: `read_file()`, `write_file()`, `delete_file()`
- Flow Control: `resume_normal_flow()`, `send_interrupt_signal()`
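To illustrate how these methods compose, here is a hedged yes/no confirmation helper; the `await`-based signatures and the string return type of `user_response()` are assumptions, and only the method names come from this doc.

```python
# Hedged sketch: confirming before a destructive action by composing
# the CapabilityWorker's speak() and user_response() methods.
# The exact signatures (awaitable, string return) are assumptions.
import asyncio

async def confirm(worker, question: str) -> bool:
    """Ask a yes/no question and interpret the spoken reply."""
    await worker.speak(question)
    reply = await worker.user_response()
    # Treat common affirmatives as consent; anything else as "no".
    return reply.strip().lower().startswith(("yes", "sure", "ok"))
```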
AgentWorker
The Agent’s runtime environment:

- Logging: `editor_logging_handler.info()`, `.error()`, `.warning()`
- Session Management: `session_tasks.create()`, `session_tasks.sleep()`
- User Context: `user_socket.client.host`, timezone info
- Music Mode: `music_mode_event.set()`, `music_mode_event.clear()`
Key Capabilities
Voice Interaction
Speak to users and listen for responses with built-in TTS and STT
API Integration
Call external REST APIs and speak the results naturally
Audio Playback
Stream music, play sound effects, or read custom audio files
LLM Processing
Use language models for natural conversation and intent parsing
Persistent Storage
Save user preferences and data across sessions
Local Execution
Execute commands on connected devices via WebSocket
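As a sketch of persistent storage, the snippet below saves a user preference with the `read_file()`/`write_file()` methods listed under CapabilityWorker; their exact signatures (filename in, string out) and the JSON layout are assumptions.

```python
# Hedged sketch: remembering a user preference across sessions via the
# documented read_file()/write_file() methods. Signatures are assumed.
import json

def load_prefs(worker, filename="prefs.json"):
    try:
        return json.loads(worker.read_file(filename))
    except Exception:
        return {}  # first run, or the file is missing/unreadable

def save_pref(worker, key, value, filename="prefs.json"):
    prefs = load_prefs(worker, filename)
    prefs[key] = value
    worker.write_file(filename, json.dumps(prefs))
    return prefs
```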
The Lifecycle
Every Ability follows this pattern:

Critical Rule: Every Ability MUST call `resume_normal_flow()` when done. Without it, the Agent goes silent and the user has to restart the conversation.

What Makes a Great Ability?
Voice-First Design
- Keep responses to 1-2 sentences
- Confirm before destructive actions
- Speak all errors to the user
- Use natural, conversational language
Robust Error Handling
- Always wrap API calls in try/except
- Log errors for debugging
- Provide helpful fallback messages
- Always exit gracefully with `resume_normal_flow()`
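The checklist above can be sketched as one try/except/finally pattern; the helper names (`run_weather_ability`, `fetch_forecast`) are hypothetical, and only `speak()`, `resume_normal_flow()`, and `editor_logging_handler.error()` come from this doc.

```python
# Hedged sketch of the error-handling checklist: wrap the API call,
# log the failure, speak a fallback, and always exit through
# resume_normal_flow(). Helper names are hypothetical.
import asyncio

async def run_weather_ability(worker, logger, fetch_forecast):
    try:
        forecast = await fetch_forecast()          # external API call
        await worker.speak(f"Today looks {forecast}.")
    except Exception as exc:
        logger.error(f"forecast failed: {exc}")    # log for debugging
        await worker.speak("Sorry, I couldn't reach the weather service.")
    finally:
        await worker.resume_normal_flow()          # never skip this
```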
Clear Purpose
- Do one thing well
- Have clear trigger words
- Set user expectations upfront
- Exit cleanly when done
Next Steps
Ability Types
Learn about Skills, Background Daemons, and Local abilities
Trigger Words
Understand how trigger words activate your abilities
Getting Started
Build your first ability in 5 minutes
SDK Reference
Explore all available SDK methods
