Wit.ai is a free speech recognition service owned by Meta (Facebook) that combines speech-to-text with natural language understanding (NLU). It’s ideal for building voice assistants, chatbots, and conversational interfaces.
Method Signature

```python
recognize_wit(
    audio_data: AudioData,
    key: str,
    show_all: bool = False
) -> str | dict
```
Parameters
- audio_data (AudioData): An AudioData instance containing the audio to transcribe.
- key (str): Wit.ai API key (Client Access Token). Required for authentication. See Getting an API Key for instructions.
- show_all (bool, default False): If True, returns the full API response including intents and entities. If False, returns only the transcribed text.
Returns
- Default: str - The transcribed text
- With show_all=True: dict - Full API response with transcription, intents, entities, and confidence
Getting an API Key
Create a Wit.ai Account
Sign up for a free account at wit.ai.
Create an App
- Click “New App”
- Enter an app name
- Choose a language
- Click “Create”
Add an Intent
Before you can see your API key, you must add at least one intent:
- Go to “Understanding” tab
- Click “Create Intent”
- Enter any intent name (e.g., “transcribe”)
- The actual intent settings don’t matter for basic speech recognition
Get API Key
- Go to “Settings” (gear icon)
- Find the section “Make an API request”
- Look for:
Authorization: Bearer XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
- Copy the 32-character key
Note: Wit.ai API keys are 32-character uppercase alphanumeric strings.
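Based on the key format described above, a quick client-side sanity check can catch copy/paste mistakes (truncated keys, stray whitespace, lowercase characters) before any request is made. `looks_like_wit_key` is a hypothetical helper for illustration, not part of any library:

```python
import re

def looks_like_wit_key(key: str) -> bool:
    """Return True if the string matches the documented Wit.ai key
    format: exactly 32 uppercase alphanumeric characters."""
    return re.fullmatch(r"[A-Z0-9]{32}", key) is not None

print(looks_like_wit_key("A" * 32))       # True
print(looks_like_wit_key("abc123"))       # False (wrong case and length)
```

This only validates the shape of the key; the API itself is the final authority on whether a key is active.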
Basic Example
```python
import speech_recognition as sr

WIT_AI_KEY = "YOUR_WIT_AI_API_KEY"

r = sr.Recognizer()
with sr.AudioFile("audio.wav") as source:
    audio = r.record(source)

try:
    text = r.recognize_wit(audio, key=WIT_AI_KEY)
    print(f"Wit.ai thinks you said: {text}")
except sr.UnknownValueError:
    print("Wit.ai could not understand audio")
except sr.RequestError as e:
    print(f"Could not request results from Wit.ai; {e}")
```
Microphone Example
```python
import speech_recognition as sr

WIT_AI_KEY = "YOUR_WIT_AI_API_KEY"

r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)

print("Recognizing...")
try:
    text = r.recognize_wit(audio, key=WIT_AI_KEY)
    print(f"You said: {text}")
except sr.UnknownValueError:
    print("Could not understand audio")
except sr.RequestError as e:
    print(f"Error: {e}")
```
Language Support
The recognition language is configured in your Wit.ai app settings (not in the API call).
Supported languages (120+):
- Arabic
- Bengali
- Chinese (Simplified & Traditional)
- Czech
- Danish
- Dutch
- English
- Finnish
- French
- German
- Greek
- Hebrew
- Hindi
- Hungarian
- Indonesian
- Italian
- Japanese
- Korean
- Norwegian
- Polish
- Portuguese
- Romanian
- Russian
- Spanish
- Swedish
- Thai
- Turkish
- Ukrainian
- Vietnamese
- And many more…
Changing Language
To change the language:
- Go to your Wit.ai app
- Click Settings (gear icon)
- Under “Language”, select your desired language
- Click “Save”
Language is set per app. You’ll need different apps for different languages.
Full Response with Intents
Wit.ai is designed for natural language understanding, not just transcription:
```python
import speech_recognition as sr
import json

WIT_AI_KEY = "YOUR_WIT_AI_API_KEY"

r = sr.Recognizer()
with sr.AudioFile("audio.wav") as source:
    audio = r.record(source)

# Get full response
response = r.recognize_wit(audio, key=WIT_AI_KEY, show_all=True)
print(json.dumps(response, indent=2))

# Access specific fields
if "_text" in response:
    print(f"Transcription: {response['_text']}")

if "intents" in response:
    for intent in response["intents"]:
        print(f"Intent: {intent['name']} (confidence: {intent['confidence']})")

if "entities" in response:
    print(f"Entities: {response['entities']}")
```
Using Environment Variables
```python
import speech_recognition as sr
import os

WIT_AI_KEY = os.environ.get("WIT_AI_KEY")
if not WIT_AI_KEY:
    raise ValueError("WIT_AI_KEY environment variable not set")

r = sr.Recognizer()
with sr.AudioFile("audio.wav") as source:
    audio = r.record(source)

text = r.recognize_wit(audio, key=WIT_AI_KEY)
print(text)
```
Error Handling
```python
import speech_recognition as sr

WIT_AI_KEY = "YOUR_WIT_AI_API_KEY"

r = sr.Recognizer()
with sr.AudioFile("audio.wav") as source:
    audio = r.record(source)

try:
    text = r.recognize_wit(audio, key=WIT_AI_KEY)
    print(f"Transcription: {text}")
except sr.UnknownValueError:
    # Speech was unintelligible
    print("Could not understand the audio")
except sr.RequestError as e:
    # API request failed
    error_msg = str(e).lower()
    if "invalid" in error_msg or "key" in error_msg:
        print("Invalid API key")
    elif "connection" in error_msg:
        print("Network connection error")
    elif "rate" in error_msg or "limit" in error_msg:
        print("Rate limit exceeded")
    else:
        print(f"API error: {e}")
```
Audio Requirements
- Sample Rate: Minimum 8 kHz (automatically converted if lower)
- Sample Width: 16-bit (automatically converted)
- Format: Converted to WAV before sending to API
- Channels: Mono (stereo is automatically converted)
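The library performs these conversions automatically, but if you prepare WAV files yourself it can be useful to confirm a file already meets the constraints and avoid an unnecessary conversion step. The sketch below uses only the standard-library `wave` module; `check_wav_requirements` is a hypothetical helper, not part of any library:

```python
import io
import wave

def check_wav_requirements(wav_bytes: bytes) -> dict:
    """Inspect a WAV payload against the constraints listed above:
    at least 8 kHz sample rate, 16-bit samples, mono."""
    with wave.open(io.BytesIO(wav_bytes)) as wf:
        return {
            "sample_rate_ok": wf.getframerate() >= 8000,
            "sample_width_ok": wf.getsampwidth() == 2,  # 2 bytes = 16-bit
            "mono_ok": wf.getnchannels() == 1,
        }

# Build one second of 16 kHz mono 16-bit silence in memory as a demo
buf = io.BytesIO()
with wave.open(buf, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(16000)
    wf.writeframes(b"\x00\x00" * 16000)

print(check_wav_requirements(buf.getvalue()))
```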
Timeouts
```python
import speech_recognition as sr

WIT_AI_KEY = "YOUR_WIT_AI_API_KEY"

r = sr.Recognizer()
r.operation_timeout = 10  # Wait up to 10 seconds for the API response

with sr.AudioFile("audio.wav") as source:
    audio = r.record(source)

try:
    text = r.recognize_wit(audio, key=WIT_AI_KEY)
    print(text)
except sr.RequestError as e:
    # A timed-out API request surfaces as a RequestError;
    # sr.WaitTimeoutError only applies to microphone listening
    print(f"Request failed or timed out: {e}")
```
Multiple Languages
For multi-language support, create separate Wit.ai apps and use different API keys:
```python
import speech_recognition as sr

# Different apps for different languages
WIT_EN_KEY = "english_app_key"
WIT_ES_KEY = "spanish_app_key"
WIT_FR_KEY = "french_app_key"

r = sr.Recognizer()
with sr.AudioFile("audio.wav") as source:
    audio = r.record(source)

# Detect language first (or let the user choose)
language = "english"  # This would come from language detection

if language == "english":
    text = r.recognize_wit(audio, key=WIT_EN_KEY)
elif language == "spanish":
    text = r.recognize_wit(audio, key=WIT_ES_KEY)
elif language == "french":
    text = r.recognize_wit(audio, key=WIT_FR_KEY)

print(text)
```
Voice Assistant Example
Combine speech recognition with intent understanding:
```python
import speech_recognition as sr

WIT_AI_KEY = "YOUR_WIT_AI_API_KEY"

r = sr.Recognizer()
with sr.Microphone() as source:
    print("Listening...")
    audio = r.listen(source)

try:
    # Get full response with intents
    response = r.recognize_wit(audio, key=WIT_AI_KEY, show_all=True)

    text = response.get("_text", "")
    print(f"You said: {text}")

    # Process intents
    intents = response.get("intents", [])
    if intents:
        primary_intent = intents[0]
        intent_name = primary_intent.get("name")
        confidence = primary_intent.get("confidence", 0)

        if confidence > 0.7:
            if intent_name == "turn_on_light":
                print("Turning on the light...")
            elif intent_name == "play_music":
                print("Playing music...")
            elif intent_name == "weather":
                print("Checking weather...")
        else:
            print("Intent confidence too low")
    else:
        print("No intent detected")
except sr.UnknownValueError:
    print("Could not understand audio")
except sr.RequestError as e:
    print(f"Error: {e}")
```
Rate Limits
Wit.ai has rate limits on API requests:
- Free tier: Generous limits for most applications
- No hard cap: Rates are monitored but rarely enforced for normal use
If you hit rate limits, implement exponential backoff:
```python
import speech_recognition as sr
import time

WIT_AI_KEY = "YOUR_WIT_AI_API_KEY"
r = sr.Recognizer()  # shared recognizer, used both below and in the helper

def recognize_with_retry(audio, max_retries=3):
    for attempt in range(max_retries):
        try:
            return r.recognize_wit(audio, key=WIT_AI_KEY)
        except sr.RequestError as e:
            if "rate" in str(e).lower() and attempt < max_retries - 1:
                wait_time = 2 ** attempt  # Exponential backoff: 1s, 2s, 4s
                print(f"Rate limited. Waiting {wait_time} seconds...")
                time.sleep(wait_time)
            else:
                raise
    raise sr.RequestError("Max retries exceeded")

# Use it
with sr.AudioFile("audio.wav") as source:
    audio = r.record(source)

text = recognize_with_retry(audio)
print(text)
```
Best Practices
For production applications:
- Store API keys in environment variables
- Implement proper error handling and retries
- Use Wit.ai’s NLU features (intents, entities) for richer interactions
- Create separate apps for different languages
- Monitor API usage in the Wit.ai dashboard
- Implement caching for repeated queries
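To illustrate the caching recommendation above: since identical audio bytes always produce the same transcription, a hash of the raw audio makes a natural cache key. This is a minimal in-memory sketch; `cached_recognize` and `recognize_fn` are hypothetical names, and in real code `recognize_fn` would wrap a call such as `r.recognize_wit(...)`:

```python
import hashlib

# In-memory cache keyed by a SHA-256 hash of the raw audio bytes
_cache: dict[str, str] = {}

def cached_recognize(audio_bytes: bytes, recognize_fn) -> str:
    """Call recognize_fn only on a cache miss; return the cached
    transcription for previously seen audio."""
    key = hashlib.sha256(audio_bytes).hexdigest()
    if key not in _cache:
        _cache[key] = recognize_fn(audio_bytes)
    return _cache[key]

# Demo with a stand-in recognizer that records how often it is called
calls = []
def fake_recognize(audio_bytes):
    calls.append(audio_bytes)
    return "hello"

print(cached_recognize(b"audio", fake_recognize))  # API called
print(cached_recognize(b"audio", fake_recognize))  # served from cache
```

For a long-running service, consider bounding the cache size (e.g. with `functools.lru_cache` or an LRU dict) so memory use stays predictable.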
Privacy Considerations:
Audio is sent to Wit.ai (Meta) servers. Ensure compliance with:
- Your privacy policy
- GDPR (European users)
- CCPA (California users)
- Other local regulations
Advantages
- Free: No cost to use (subject to the rate limits above)
- NLU Built-in: Intents and entities for conversational AI
- Many Languages: 120+ supported languages
- Easy Setup: Simple API, quick integration
- Facebook Integration: Works well with Messenger bots
Limitations
- Privacy: Audio sent to Meta servers
- Accuracy: Good but not as high as Google or Azure
- No Streaming: Only supports batch processing
- Language per App: Need separate apps for multiple languages
- Limited Customization: Can’t train custom acoustic models
Use Cases
- Voice assistants
- Chatbots with voice input
- Smart home controls
- Facebook Messenger bots
- Voice commands in mobile apps
- Simple transcription tasks