The Add Voice Transcription skill enables NanoClaw to automatically transcribe WhatsApp voice notes using OpenAI’s Whisper API. Voice messages are downloaded, transcribed, and delivered to the agent as text.

What It Does

The Add Voice Transcription skill:
  • Detects WhatsApp voice messages
  • Downloads audio files automatically
  • Transcribes using OpenAI Whisper API
  • Delivers transcript to agent as [Voice: <transcript>]
  • Falls back gracefully if the API key is missing or transcription fails

Prerequisites

  • NanoClaw with WhatsApp channel installed
  • OpenAI API key with Whisper access
  • Funded OpenAI account (Whisper requires credits)

How to Apply

1. Invoke the skill

Run /add-voice-transcription in your NanoClaw context.
2. Get OpenAI API key

If you don’t have one:
  1. Go to https://platform.openai.com/api-keys
  2. Click “Create new secret key”
  3. Name it (e.g., “NanoClaw Transcription”)
  4. Copy the key (starts with sk-)
Cost: ~$0.006 per minute of audio (~$0.003 per 30-second voice note)
3. Apply code changes

The skill runs npx tsx scripts/apply-skill.ts .claude/skills/add-voice-transcription, which:
  • Adds src/transcription.ts module
  • Merges voice handling into WhatsApp channel
  • Adds transcription tests
  • Installs openai dependency
4. Configure environment

The skill adds OPENAI_API_KEY to .env and syncs it to the container:
mkdir -p data/env && cp .env data/env/env
5. Build and restart

npm run build
launchctl kickstart -k gui/$(id -u)/com.nanoclaw

What Changes

Files Created

  • src/transcription.ts - Voice transcription module using OpenAI Whisper

Files Modified

  • src/channels/whatsapp.ts - Adds voice message detection and transcription
  • src/channels/whatsapp.test.ts - Adds 3 transcription test cases
  • package.json - Adds openai dependency
  • .env - Adds OPENAI_API_KEY
  • .env.example - Documents OPENAI_API_KEY
  • data/env/env - Synced environment for container
  • .nanoclaw/state.yaml - Records skill application

Dependencies Added

  • openai - OpenAI API client for Whisper transcription

Usage

Send Voice Note

Simply send a voice message in any registered WhatsApp chat:
You: [sends voice note: "Hey can you remind me to call John tomorrow?"]
Andy: [Voice: Hey can you remind me to call John tomorrow?]
Andy: I'll remind you to call John tomorrow. What time would you like me to remind you?

You: [sends voice note in different language]
Andy: [Voice: <transcribed in original language>]
Andy: [responds to the content]

Transcription Format

The agent receives:
[Voice: <transcript>]
The agent can respond naturally to the transcribed content.
Whisper supports 50+ languages and automatically detects the language. No configuration needed for multilingual transcription.
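On the receiving side, distinguishing a transcribed voice note from ordinary text is a simple prefix check. The helper below is a hypothetical illustration of parsing the [Voice: <transcript>] convention, not part of the skill's code:

```typescript
// Extracts the transcript from a [Voice: <transcript>] message,
// or returns null for ordinary text messages.
export function extractVoiceTranscript(message: string): string | null {
  const match = /^\[Voice: ([\s\S]*)\]$/.exec(message);
  return match ? match[1] : null;
}
```

For example, extractVoiceTranscript("[Voice: call John]") yields "call John", while a plain text message yields null.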

Troubleshooting

Voice Notes Show “[Voice Message - transcription unavailable]”

  1. Check OPENAI_API_KEY is set in .env AND synced to data/env/env
  2. Verify key works:
    curl -s https://api.openai.com/v1/models \
      -H "Authorization: Bearer $OPENAI_API_KEY" | head -c 200
    
  3. Check OpenAI billing - Whisper requires a funded account

Voice Notes Show “[Voice Message - transcription failed]”

Check logs for specific error:
tail -f logs/nanoclaw.log | grep -i voice
Common causes:
  • Network timeout (transient, will work on next message)
  • Invalid API key (regenerate at platform.openai.com/api-keys)
  • Rate limiting (wait and retry)
  • Insufficient credits (add funds to OpenAI account)
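For the transient causes (network timeouts, rate limiting), retrying with backoff is one way to harden the transcription call. The wrapper below is an illustration of that pattern, not something the skill installs:

```typescript
// Retries an async operation with exponential backoff, for transient
// failures such as network timeouts or 429 rate-limit responses.
export async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Back off before the next attempt (skip the delay after the last one).
      if (i < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastError;
}
```

Invalid keys and exhausted credits are not transient, so a wrapper like this should still surface those errors to the log rather than retry indefinitely.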

Agent Doesn’t Respond to Voice Notes

Verify:
  1. Chat is registered in database
  2. Agent is running
  3. WhatsApp channel is connected
  4. Transcription succeeded (check logs for “Transcribed voice message”)

High Costs

Whisper pricing is $0.006 per minute. To reduce costs:
  • Transcription runs only for registered chats (limited automatically)
  • Consider local Whisper via the /use-local-whisper skill (free but slower)
  • Monitor usage at platform.openai.com/usage
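As a back-of-the-envelope check, the $0.006-per-minute rate quoted above works out like this (a sketch; the function name is illustrative):

```typescript
// Rough cost estimate for Whisper transcription, assuming the
// $0.006-per-minute rate quoted in this doc.
export function estimateWhisperCostUSD(audioSeconds: number): number {
  const USD_PER_MINUTE = 0.006;
  return (audioSeconds / 60) * USD_PER_MINUTE;
}
```

A 30-second voice note costs about $0.003, and even 100 ten-minute recordings run about $6.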
