The Add Voice Transcription skill enables NanoClaw to automatically transcribe WhatsApp voice notes using OpenAI’s Whisper API. Voice messages are downloaded, transcribed, and delivered to the agent as text.

What It Does

The Add Voice Transcription skill:
  • Detects WhatsApp voice messages
  • Downloads audio files automatically
  • Transcribes using OpenAI Whisper API
  • Delivers transcript to agent as [Voice: <transcript>]
  • Falls back gracefully if the API key is missing or transcription fails

Prerequisites

  • NanoClaw with WhatsApp channel installed
  • OpenAI API key with Whisper access
  • Funded OpenAI account (Whisper requires credits)

How to Apply

1. Invoke the skill

Run /add-voice-transcription in your NanoClaw context.
2. Get OpenAI API key

If you don’t have one:
  1. Go to https://platform.openai.com/api-keys
  2. Click “Create new secret key”
  3. Name it (e.g., “NanoClaw Transcription”)
  4. Copy the key (starts with sk-)
Cost: ~$0.006 per minute of audio (~$0.003 per 30-second voice note)
3. Apply code changes

The skill runs npx tsx scripts/apply-skill.ts .claude/skills/add-voice-transcription, which:
  • Adds src/transcription.ts module
  • Merges voice handling into WhatsApp channel
  • Adds transcription tests
  • Installs openai dependency
4. Configure environment

The skill adds OPENAI_API_KEY to .env and syncs it to the container:
mkdir -p data/env && cp .env data/env/env
5. Build and restart

npm run build
launchctl kickstart -k gui/$(id -u)/com.nanoclaw

What Changes

Files Created

  • src/transcription.ts - Voice transcription module using OpenAI Whisper

Files Modified

  • src/channels/whatsapp.ts - Adds voice message detection and transcription
  • src/channels/whatsapp.test.ts - Adds 3 transcription test cases
  • package.json - Adds openai dependency
  • .env - Adds OPENAI_API_KEY
  • .env.example - Documents OPENAI_API_KEY
  • data/env/env - Synced environment for container
  • .nanoclaw/state.yaml - Records skill application

Dependencies Added

  • openai - OpenAI API client for Whisper transcription

Usage

Send Voice Note

Simply send a voice message in any registered WhatsApp chat:
You: [sends voice note: "Hey can you remind me to call John tomorrow?"]
Andy: [Voice: Hey can you remind me to call John tomorrow?]
Andy: I'll remind you to call John tomorrow. What time would you like me to remind you?

You: [sends voice note in different language]
Andy: [Voice: <transcribed in original language>]
Andy: [responds to the content]

Transcription Format

The agent receives:
[Voice: <transcript>]
The agent can respond naturally to the transcribed content.
Whisper supports 50+ languages and automatically detects the language. No configuration needed for multilingual transcription.
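On the receiving side, distinguishing a transcribed voice note from ordinary text is a simple prefix check. The helper below is a hypothetical illustration of parsing the [Voice: <transcript>] convention, not part of the skill's code:

```typescript
// Extracts the transcript from a [Voice: <transcript>] message,
// or returns null for ordinary text messages.
export function extractVoiceTranscript(message: string): string | null {
  const match = /^\[Voice: ([\s\S]*)\]$/.exec(message);
  return match ? match[1] : null;
}
```

For example, extractVoiceTranscript("[Voice: call John]") yields "call John", while a plain text message yields null.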

Troubleshooting

Voice Notes Show “[Voice Message - transcription unavailable]”

  1. Check OPENAI_API_KEY is set in .env AND synced to data/env/env
  2. Verify key works:
    curl -s https://api.openai.com/v1/models \
      -H "Authorization: Bearer $OPENAI_API_KEY" | head -c 200
    
  3. Check OpenAI billing - Whisper requires a funded account

Voice Notes Show “[Voice Message - transcription failed]”

Check logs for specific error:
tail -f logs/nanoclaw.log | grep -i voice
Common causes:
  • Network timeout (transient, will work on next message)
  • Invalid API key (regenerate at platform.openai.com/api-keys)
  • Rate limiting (wait and retry)
  • Insufficient credits (add funds to OpenAI account)
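For the transient causes (network timeouts, rate limiting), retrying with backoff is one way to harden the transcription call. The wrapper below is an illustration of that pattern, not something the skill installs:

```typescript
// Retries an async operation with exponential backoff, for transient
// failures such as network timeouts or 429 rate-limit responses.
export async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Back off before the next attempt (skip the delay after the last one).
      if (i < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastError;
}
```

Invalid keys and exhausted credits are not transient, so a wrapper like this should still surface those errors to the log rather than retry indefinitely.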

Agent Doesn’t Respond to Voice Notes

Verify:
  1. Chat is registered in database
  2. Agent is running
  3. WhatsApp channel is connected
  4. Transcription succeeded (check logs for “Transcribed voice message”)

High Costs

Whisper pricing is $0.006 per minute. To reduce costs:
  • Transcription runs only for registered chats (limited automatically)
  • Consider local Whisper via the /use-local-whisper skill (free but slower)
  • Monitor usage at platform.openai.com/usage
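As a back-of-the-envelope check, the $0.006-per-minute rate quoted above works out like this (a sketch; the function name is illustrative):

```typescript
// Rough cost estimate for Whisper transcription, assuming the
// $0.006-per-minute rate quoted in this doc.
export function estimateWhisperCostUSD(audioSeconds: number): number {
  const USD_PER_MINUTE = 0.006;
  return (audioSeconds / 60) * USD_PER_MINUTE;
}
```

A 30-second voice note costs about $0.003, and even 100 ten-minute recordings run about $6.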
