Overview

Abilities are modular voice AI plugins that extend what OpenHome Agents can do. They’re triggered by spoken phrases and can do anything — call APIs, play audio, run quizzes, control devices, have multi-turn conversations, and more. Each Ability is just one file: main.py — your Python logic. Write your code, zip it, upload it to OpenHome, set your trigger words in the dashboard, and your Agent can do something new.
Abilities are the building blocks of OpenHome’s extensibility. They transform a voice assistant into a programmable AI platform.

How Abilities Work

When a user says a trigger phrase or the Agent’s brain routes to your Ability, your code takes over the conversation:
  1. Trigger — User says a hotword or the routing LLM invokes your Ability
  2. Run — Your main.py executes, taking control of the conversation
  3. Interact — Speak to the user, listen for responses, call APIs, play audio
  4. Exit — Call resume_normal_flow() to return control to the Agent

The Basic Structure

Here’s the minimal scaffolding for a working Ability:
main.py
import json
from src.agent.capability import MatchingCapability
from src.main import AgentWorker
from src.agent.capability_worker import CapabilityWorker

class MyFirstCapability(MatchingCapability):
    worker: AgentWorker = None
    capability_worker: CapabilityWorker = None

    # Do not change the following register-capability tag
    #{{register capability}}

    async def run(self):
        # Greet the user
        await self.capability_worker.speak("Hi! Tell me what's on your mind.")
        
        # Listen for input
        user_input = await self.capability_worker.user_response()
        
        # Process with LLM
        response = self.capability_worker.text_to_text_response(
            f"Give a short, helpful response to: {user_input}"
        )
        
        # Speak the result
        await self.capability_worker.speak(response)
        
        # CRITICAL: Always call this when done
        self.capability_worker.resume_normal_flow()

    def call(self, worker: AgentWorker):
        self.worker = worker
        self.capability_worker = CapabilityWorker(self)
        self.worker.session_tasks.create(self.run())

The #{{register capability}} comment is required boilerplate. The platform handles configuration automatically — you never need to create or edit config.json.

Key Components

CapabilityWorker

The SDK that provides all I/O functionality:
  • Speaking: speak(), text_to_speech()
  • Listening: user_response(), wait_for_complete_transcription()
  • LLM: text_to_text_response()
  • Audio: play_audio(), play_from_audio_file()
  • Files: read_file(), write_file(), delete_file()
  • Flow Control: resume_normal_flow(), send_interrupt_signal()
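The core speak-listen-speak loop these methods support can be run outside the platform by standing in a stub for the real SDK. In this sketch, `StubWorker` is a hypothetical stand-in for `CapabilityWorker` (its `speak()`/`user_response()` just record and replay text instead of doing TTS/STT):

```python
import asyncio

class StubWorker:
    """Hypothetical stand-in for CapabilityWorker; illustration only."""
    def __init__(self, scripted_reply):
        self.scripted_reply = scripted_reply
        self.spoken = []                  # everything "said" to the user

    async def speak(self, text):
        self.spoken.append(text)          # the real SDK streams this through TTS

    async def user_response(self):
        return self.scripted_reply        # the real SDK waits on STT here

async def echo_ability(cw):
    # speak -> listen -> speak: the basic CapabilityWorker pattern
    await cw.speak("What should I repeat?")
    heard = await cw.user_response()
    await cw.speak(f"You said: {heard}")

cw = StubWorker("hello there")
asyncio.run(echo_ability(cw))
print(cw.spoken)  # ['What should I repeat?', 'You said: hello there']
```

Inside a real Ability the same `await`-based calls go against `self.capability_worker` instead of the stub.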

AgentWorker

The Agent’s runtime environment:
  • Logging: editor_logging_handler.info(), .error(), .warning()
  • Session Management: session_tasks.create(), session_tasks.sleep()
  • User Context: user_socket.client.host, timezone info
  • Music Mode: music_mode_event.set(), music_mode_event.clear()

Key Capabilities

Voice Interaction

Speak to users and listen for responses with built-in TTS and STT

API Integration

Call external REST APIs and speak the results naturally

Audio Playback

Stream music, play sound effects, or read custom audio files

LLM Processing

Use language models for natural conversation and intent parsing

Persistent Storage

Save user preferences and data across sessions
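As a sketch of how preference storage might work (assuming `read_file()`/`write_file()` exchange plain strings; the helper names and default keys here are hypothetical), JSON round-tripping with a defaults merge keeps the Ability robust against a missing or corrupt file:

```python
import json

DEFAULTS = {"units": "metric", "voice_speed": 1.0}

def encode_prefs(prefs: dict) -> str:
    """Serialize preferences to a string suitable for write_file()."""
    return json.dumps(prefs)

def decode_prefs(raw) -> dict:
    """Merge stored preferences over defaults; tolerate missing/corrupt data."""
    try:
        stored = json.loads(raw) if raw else {}
    except json.JSONDecodeError:
        stored = {}
    return {**DEFAULTS, **stored}

# Round trip: what write_file() would store, read_file() later returns.
saved = encode_prefs({"units": "imperial"})
print(decode_prefs(saved))  # {'units': 'imperial', 'voice_speed': 1.0}
```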

Local Execution

Execute commands on connected devices via WebSocket

The Lifecycle

Every Ability follows this pattern:
async def run(self):
    try:
        # 1. Initialize: greet user, explain what you'll do
        await self.capability_worker.speak("Let me help with that.")
        
        # 2. Gather input: listen, confirm, validate
        user_input = await self.capability_worker.user_response()
        
        # 3. Process: call APIs, use LLM, compute results
        result = self.capability_worker.text_to_text_response(user_input)
        
        # 4. Deliver output: speak results, play audio
        await self.capability_worker.speak(result)
        
    except Exception as e:
        # 5. Handle errors: always speak them to the user
        self.worker.editor_logging_handler.error(f"Error: {e}")
        await self.capability_worker.speak("Something went wrong.")
        
    finally:
        # 6. Exit: ALWAYS return control to the Agent
        self.capability_worker.resume_normal_flow()
Critical Rule: Every Ability MUST call resume_normal_flow() when done. Without it, the Agent goes silent and the user has to restart the conversation.

What Makes a Great Ability?

Voice-First Design

  • Keep responses to 1-2 sentences
  • Confirm before destructive actions
  • Speak all errors to the user
  • Use natural, conversational language
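Confirming before a destructive action usually reduces to classifying a short spoken reply. A minimal, purely illustrative classifier (word lists and function name are not part of the SDK):

```python
import re

YES_WORDS = {"yes", "yeah", "yep", "sure", "ok", "okay", "confirm"}
NO_WORDS = {"no", "nope", "cancel", "stop", "don't"}

def classify_confirmation(reply):
    """True = confirmed, False = declined, None = unclear (ask again)."""
    tokens = set(re.findall(r"[a-z']+", reply.lower()))
    said_yes = bool(tokens & YES_WORDS)
    said_no = bool(tokens & NO_WORDS)
    if said_yes == said_no:   # neither, or contradictory -> re-prompt
        return None
    return said_yes

print(classify_confirmation("Yes, delete it"))  # True
print(classify_confirmation("No, cancel that"))  # False
print(classify_confirmation("the red one"))  # None
```

On `None`, re-prompt the user rather than guessing; for destructive actions, only an unambiguous yes should proceed.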

Robust Error Handling

  • Always wrap API calls in try/except
  • Log errors for debugging
  • Provide helpful fallback messages
  • Always exit gracefully with resume_normal_flow()
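The checklist above can be packaged as one small wrapper. In this sketch, `fetch` is any API call, and the `log` hook stands in for `editor_logging_handler.error()`; the returned fallback string is what you would then pass to `speak()`:

```python
def safe_call(fetch, fallback_message, log=print):
    """Run an API call; on any failure, log it and return a spoken fallback."""
    try:
        return True, fetch()
    except Exception as exc:   # never let an API error escape the Ability
        log(f"API call failed: {exc}")
        return False, fallback_message

# A failing call yields the fallback instead of raising:
ok, result = safe_call(lambda: 1 / 0, "Sorry, the weather service is down.")
print(ok, result)  # False Sorry, the weather service is down.
```

Whatever `safe_call` returns, the Ability still reaches its `finally:` block and calls `resume_normal_flow()`.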

Clear Purpose

  • Do one thing well
  • Have clear trigger words
  • Set user expectations upfront
  • Exit cleanly when done

Next Steps

Ability Types

Learn about Skills, Background Daemons, and Local Abilities

Trigger Words

Understand how trigger words activate your abilities

Getting Started

Build your first ability in 5 minutes

SDK Reference

Explore all available SDK methods
