Overview

The CapabilityWorker is the primary SDK interface for building OpenHome Abilities. It provides all I/O operations including text-to-speech, user input, LLM calls, audio playback, file storage, and flow control.
Access CapabilityWorker via self.capability_worker after initializing it in your Ability’s call() method.

Initialization

Initialize CapabilityWorker in your Ability’s call() method:
from src.agent.capability import MatchingCapability
from src.main import AgentWorker
from src.agent.capability_worker import CapabilityWorker

class MyAbility(MatchingCapability):
    worker: AgentWorker = None
    capability_worker: CapabilityWorker = None

    def call(self, worker: AgentWorker):
        self.worker = worker
        self.capability_worker = CapabilityWorker(self)
        self.worker.session_tasks.create(self.run())

    async def run(self):
        await self.capability_worker.speak("Hello!")
        self.capability_worker.resume_normal_flow()

Architecture

CapabilityWorker acts as the bridge between your Ability code and the OpenHome Agent runtime. It handles:
  • Communication: WebSocket connections to the frontend
  • Audio: TTS generation, audio playback, and streaming
  • User Input: Speech-to-text transcription
  • LLM: Text generation with conversation history
  • Storage: Server-side file persistence
  • Control Flow: Signaling when your Ability is done

Quick Reference

Speaking & Text-to-Speech

  • speak(text): Convert text to speech using the Agent’s voice
  • text_to_speech(text, voice_id): Convert text to speech with a custom voice

Listening & User Input

  • user_response(): Wait for the user’s next input
  • wait_for_complete_transcription(): Wait for a complete utterance
  • run_io_loop(text): Speak a prompt, then listen for the reply (speak + listen combined)
  • run_confirmation_loop(text): Speak a prompt, then listen for a yes/no confirmation
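A sketch of how the two loop helpers above might combine in an Ability. The stub class is an offline stand-in for the real runtime so the snippet runs anywhere; the boolean return of run_confirmation_loop is an assumption based on its description.

```python
import asyncio

class StubCapabilityWorker:
    """Offline stand-in for the real CapabilityWorker, for illustration only."""
    def __init__(self, replies):
        self._replies = iter(replies)

    async def run_io_loop(self, text):
        # Real method: speaks `text`, then returns the user's transcribed reply.
        return next(self._replies)

    async def run_confirmation_loop(self, text):
        # Assumed: the real method resolves yes/no answers to a boolean.
        return next(self._replies) == "yes"

async def ask_for_city(cw):
    """Ask for a city, confirm it, and return it (or None if declined)."""
    city = await cw.run_io_loop("Which city would you like weather for?")
    confirmed = await cw.run_confirmation_loop(f"You said {city}. Is that right?")
    return city if confirmed else None
```

In a real Ability, `cw` would be `self.capability_worker` and `ask_for_city` would be called from `run()`.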

LLM & Text Generation

  • text_to_text_response(prompt, history, system_prompt): Generate text with the LLM

text_to_text_response() is the only synchronous method in CapabilityWorker. Do NOT use await.
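A minimal sketch of the synchronous call pattern. The positional parameters follow the signature listed above; the history shape (a list of prior turns) and the echoing stub are assumptions for illustration.

```python
def one_line_summary(cw, topic: str) -> str:
    """Call the LLM helper synchronously; note there is no `await` here."""
    return cw.text_to_text_response(
        f"Summarize {topic} in one sentence.",  # prompt
        [],                                     # history: assumed list of prior turns
        "You are a concise assistant.",         # system_prompt
    )

class StubCapabilityWorker:
    """Offline stand-in that echoes the prompt instead of calling an LLM."""
    def text_to_text_response(self, prompt, history, system_prompt):
        return f"[llm] {prompt}"
```

Because the method is synchronous, it can be called directly from inside an `async def run()` without scheduling or awaiting anything.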

Audio Playback

  • play_audio(file_content): Play audio from bytes
  • play_from_audio_file(file_name): Play a local audio file
  • stream_init(): Initialize audio streaming
  • send_audio_data_in_stream(data, chunk_size): Send audio chunks
  • stream_end(): End audio streaming
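The three streaming methods above form an init / send / end protocol. A sketch of driving it with a fixed-size chunking loop; the 4096-byte chunk size and the slicing helper are illustrative assumptions, not documented values.

```python
import asyncio

CHUNK_SIZE = 4096  # illustrative choice; the SDK does not document an ideal size

def iter_chunks(data: bytes, chunk_size: int):
    """Yield successive `chunk_size` slices of `data`."""
    for offset in range(0, len(data), chunk_size):
        yield data[offset:offset + chunk_size]

async def stream_audio(cw, audio: bytes, chunk_size: int = CHUNK_SIZE):
    """Drive the init -> send chunks -> end streaming sequence."""
    await cw.stream_init()
    for chunk in iter_chunks(audio, chunk_size):
        await cw.send_audio_data_in_stream(chunk, chunk_size)
    await cw.stream_end()
```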

File Storage

  • check_if_file_exists(filename, temp): Check whether a file exists
  • write_file(filename, content, temp): Write or append to a file
  • read_file(filename, temp): Read a file’s contents
  • delete_file(filename, temp): Delete a file
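A sketch of small persistence helpers over the storage methods above, with a dict-backed stub so the snippet runs offline. The meaning of `temp=False` (durable rather than temporary storage) is an assumption based on the parameter name.

```python
import asyncio

class StubCapabilityWorker:
    """Dict-backed stand-in for the server-side storage methods."""
    def __init__(self):
        self._files = {}

    async def check_if_file_exists(self, filename, temp):
        return filename in self._files

    async def write_file(self, filename, content, temp):
        # Mirrors the table's "Write/append to file" behavior.
        self._files[filename] = self._files.get(filename, "") + content

    async def read_file(self, filename, temp):
        return self._files[filename]

async def append_log_line(cw, line: str):
    """Append one line to a per-Ability log file."""
    await cw.write_file("ability_log.txt", line + "\n", temp=False)

async def read_log(cw) -> str:
    """Return the log contents, or an empty string if nothing was written yet."""
    if await cw.check_if_file_exists("ability_log.txt", temp=False):
        return await cw.read_file("ability_log.txt", temp=False)
    return ""
```

Checking existence before reading avoids errors on the first run of an Ability, when no file has been written yet.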

Flow Control

  • resume_normal_flow(): REQUIRED. Return control to the Agent
  • send_interrupt_signal(): Stop output and return to input
  • exec_local_command(command): Execute a command on the local device
  • send_email(...): Send email via SMTP

User Context

  • get_timezone(): Get the user’s timezone string
  • get_full_message_history(): Get the conversation history

WebSocket

  • send_data_over_websocket(type, data): Send custom events
  • send_devkit_action(action): Send hardware actions
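A sketch of sending a custom event to the frontend. Only the (type, data) signature comes from the table above; the event name "ability_progress" and the payload shape are illustrative assumptions.

```python
import json

def build_progress_event(step: int, total: int):
    """Build a (type, data) pair for send_data_over_websocket."""
    data = {"step": step, "total": total, "percent": round(100 * step / total)}
    json.dumps(data)  # sanity check: keep the payload JSON-serializable
    return "ability_progress", data

async def report_progress(cw, step: int, total: int):
    """Push a progress update over the WebSocket connection."""
    event_type, data = build_progress_event(step, total)
    await cw.send_data_over_websocket(event_type, data)
```

Keeping payload construction in a small pure function makes the event shape easy to test without a live connection.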

Audio Recording

  • start_audio_recording(): Start recording
  • stop_audio_recording(): Stop recording
  • get_audio_recording(): Get the recorded WAV data
  • get_audio_recording_length(): Get the recording duration
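A sketch of a record-then-replay round trip using the four recording methods above. The fixed recording window and replaying the capture via play_audio are illustrative choices, not documented behavior.

```python
import asyncio

async def record_and_replay(cw, seconds: float = 3.0):
    """Record for a fixed window, replay the capture, and return its length."""
    await cw.start_audio_recording()
    await asyncio.sleep(seconds)       # let audio accumulate for the window
    await cw.stop_audio_recording()
    wav_bytes = await cw.get_audio_recording()
    await cw.play_audio(wav_bytes)     # play the captured WAV back to the user
    return await cw.get_audio_recording_length()
```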

Complete Example

main.py
from src.agent.capability import MatchingCapability
from src.main import AgentWorker
from src.agent.capability_worker import CapabilityWorker

class WeatherAbility(MatchingCapability):
    worker: AgentWorker = None
    capability_worker: CapabilityWorker = None

    def call(self, worker: AgentWorker):
        self.worker = worker
        self.capability_worker = CapabilityWorker(self)
        self.worker.session_tasks.create(self.run())

    async def run(self):
        try:
            # Ask for location
            location = await self.capability_worker.run_io_loop(
                "What city would you like weather for?"
            )

            # Log the request
            self.worker.editor_logging_handler.info(f"Weather requested for: {location}")

            # Get weather (using LLM for demo - use real API in production)
            prompt = f"What's the weather like in {location}? Respond in one sentence."
            response = self.capability_worker.text_to_text_response(prompt)

            # Speak the result
            await self.capability_worker.speak(response)

        except Exception as e:
            self.worker.editor_logging_handler.error(f"Error: {e}")
            await self.capability_worker.speak("Sorry, something went wrong.")

        # ALWAYS resume normal flow
        self.capability_worker.resume_normal_flow()

Next Steps

  • Speaking: Text-to-speech with default or custom voices
  • Listening: User input and combined I/O loops
  • LLM: Text generation with conversation history
  • Flow Control: Critical control-flow methods
