recognize_ibm()

Overview

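Performs speech recognition on an AudioData instance using the IBM Watson Speech to Text API. The service supports many languages and custom models. This method lets you choose the language, but service features such as custom models are not configurable through it.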

Method Signature

recognize_ibm(
    audio_data: AudioData,
    key: str,
    language: str = "en-US",
    show_all: bool = False
) -> tuple[str, float] | dict

Parameters

audio_data

AudioData

required

The audio data to recognize. Must be an AudioData instance.

key

str

required

IBM Watson Speech to Text API key. See the setup instructions below for how to obtain one.

language

str

default: "en-US"

Recognition language as an RFC 5646 language tag with a region subtag (e.g., "en-US", "es-ES", "zh-CN"). The API documentation lists supported values as model names, such as en-US_BroadbandModel. The language tag is the part before the _BroadbandModel suffix.
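The language tag and the API model name differ only by a suffix. The sketch below shows that mapping. It assumes the broadband naming pattern shown above. model_name_for is a hypothetical helper, not part of speech_recognition.

```python
def model_name_for(language: str) -> str:
    """Map a language tag such as "en-US" to an IBM model name.

    Assumes the broadband naming pattern (e.g. "en-US_BroadbandModel").
    Hypothetical helper for illustration; not part of speech_recognition.
    """
    # Reject empty input and values that already carry a model suffix
    if not language or "_" in language:
        raise ValueError(f"expected a bare language tag, got {language!r}")
    return f"{language}_BroadbandModel"


print(model_name_for("es-ES"))  # es-ES_BroadbandModel
```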

show_all

bool

default: False

If True, returns the raw API response as a dictionary parsed from JSON. If False, returns a (transcript, confidence) tuple.

Returns

result

tuple[str, float]

When show_all=False, returns (transcript, confidence) where:

transcript: The recognized text (may contain multiple utterances separated by newlines)
confidence: Confidence score between 0 and 1. When the audio contains several utterances, this is the score of the last one.

response

dict

When show_all=True, returns the raw API response, which contains:

results: List of recognition results
alternatives: Inside each result, a list of transcription hypotheses. The top alternative normally carries a confidence score.
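The tuple form is a condensed view of the raw response. The sketch below approximates that reduction on a show_all=True response. It assumes the results/alternatives layout described above and takes the first alternative with a transcript from each result. Reporting the last utterance's confidence is an assumption about how multi-utterance audio is scored. summarize_response is a hypothetical helper.

```python
from typing import Optional


def summarize_response(response: dict) -> tuple[str, Optional[float]]:
    """Collapse a raw show_all=True response into (transcript, confidence).

    Hypothetical helper approximating the show_all=False result:
    transcripts are joined with newlines, and the confidence of the
    last utterance is reported (an assumption, not confirmed behavior).
    """
    results = response.get("results") or []
    if not results:
        # show_all=False reports an empty result as UnknownValueError
        raise ValueError("no results in response")
    lines: list[str] = []
    confidence: Optional[float] = None
    for result in results:
        for alternative in result.get("alternatives", []):
            if "transcript" in alternative:
                lines.append(alternative["transcript"])
                confidence = alternative.get("confidence")
                break  # only the first usable alternative per result
    return "\n".join(lines), confidence


example = {
    "results": [
        {"alternatives": [{"transcript": "hello there", "confidence": 0.91}]},
        {"alternatives": [{"transcript": "how are you", "confidence": 0.84}]},
    ]
}
print(summarize_response(example))  # ('hello there\nhow are you', 0.84)
```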

Exceptions

UnknownValueError

Exception

Raised when the speech is unintelligible

RequestError

Exception

Raised when:

The API request fails
The API key is invalid
There is no internet connection
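RequestError covers transient failures, such as a dropped connection, and permanent ones, such as an invalid key. UnknownValueError means the audio itself could not be transcribed. The sketch below is a retry wrapper built around that split. call_with_retries is a hypothetical helper that retries only the exception types you pass, with exponential backoff.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Call fn(), retrying with exponential backoff on the given exceptions.

    Hypothetical helper. Exceptions not listed in retry_on propagate at once.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt)  # 1 s, 2 s, 4 s, ...
    raise AssertionError("unreachable")
```

With speech_recognition you would pass retry_on=(sr.RequestError,), for example `text, confidence = call_with_retries(lambda: r.recognize_ibm(audio, key=IBM_KEY), retry_on=(sr.RequestError,))`. Because an invalid key also raises RequestError and retrying cannot fix it, keep attempts small.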

Example Usage

Basic Recognition

import speech_recognition as sr

# Initialize recognizer
r = sr.Recognizer()

# Your IBM Watson API key
IBM_KEY = "your-ibm-api-key"

# Record audio
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)

# Recognize with IBM Watson
try:
    text, confidence = r.recognize_ibm(audio, key=IBM_KEY)
    print(f"You said: {text}")
    print(f"Confidence: {confidence:.2%}")
except sr.UnknownValueError:
    print("Could not understand audio")
except sr.RequestError as e:
    print(f"API error: {e}")

With Different Languages

import speech_recognition as sr

IBM_KEY = "your-api-key"

r = sr.Recognizer()

# Spanish recognition
with sr.Microphone() as source:
    print("Diga algo...")  # "Say something..."
    audio = r.listen(source)

try:
    text, confidence = r.recognize_ibm(
        audio,
        key=IBM_KEY,
        language="es-ES"
    )
    print(f"Usted dijo: {text}")  # "You said: ..."
    print(f"Confianza: {confidence:.2%}")  # "Confidence: ..."
except sr.UnknownValueError:
    print("No se pudo entender el audio")  # "Could not understand the audio"

Getting Full API Response

import speech_recognition as sr
import json

IBM_KEY = "your-api-key"

r = sr.Recognizer()

with sr.Microphone() as source:
    audio = r.listen(source)

try:
    # Get complete response
    response = r.recognize_ibm(
        audio,
        key=IBM_KEY,
        show_all=True
    )
    
    print(json.dumps(response, indent=2))
    
    # Access multiple alternatives
    for result in response.get('results', []):
        for alternative in result.get('alternatives', []):
            print(f"Transcript: {alternative['transcript']}")
            if 'confidence' in alternative:
                print(f"Confidence: {alternative['confidence']:.2%}")
except sr.UnknownValueError:
    print("Could not understand audio")
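The (transcript, confidence) tuple reports a single score, so per-utterance confidence has to come from a show_all=True response. The sketch below pairs each result's top alternative with its confidence and averages the available scores. It assumes alternatives are ordered best-first. Both helper names are hypothetical.

```python
from statistics import mean
from typing import Optional


def utterance_confidences(response: dict) -> list[tuple[str, Optional[float]]]:
    """Return (transcript, confidence) for the top alternative of each result."""
    pairs = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            top = alternatives[0]  # assumes best-first ordering
            pairs.append((top.get("transcript", ""), top.get("confidence")))
    return pairs


def mean_confidence(response: dict) -> Optional[float]:
    """Average the available top-alternative confidences, or None if there are none."""
    scores = [c for _, c in utterance_confidences(response) if c is not None]
    return mean(scores) if scores else None


example = {
    "results": [
        {"alternatives": [{"transcript": "first part", "confidence": 0.5}]},
        {"alternatives": [{"transcript": "second part", "confidence": 0.75}]},
    ]
}
print(utterance_confidences(example))  # [('first part', 0.5), ('second part', 0.75)]
print(mean_confidence(example))  # 0.625
```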

From Audio File

import speech_recognition as sr

IBM_KEY = "your-api-key"

r = sr.Recognizer()

# Load audio file
with sr.AudioFile("speech.wav") as source:
    audio = r.record(source)

try:
    text, confidence = r.recognize_ibm(audio, key=IBM_KEY)
    print(f"Transcript: {text}")
    print(f"Confidence: {confidence:.2%}")
except sr.UnknownValueError:
    print("Could not understand audio")
except sr.RequestError as e:
    print(f"Error: {e}")

Using Environment Variables

import speech_recognition as sr
import os

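# Read the API key from an environment variable instead of hard-coding it
IBM_KEY = os.getenv("IBM_WATSON_API_KEY")
if not IBM_KEY:
    raise SystemExit("Set the IBM_WATSON_API_KEY environment variable first")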

r = sr.Recognizer()

with sr.Microphone() as source:
    audio = r.listen(source)

try:
    text, confidence = r.recognize_ibm(audio, key=IBM_KEY)
    print(f"Transcript: {text}")
except sr.UnknownValueError:
    print("Could not understand audio")
except sr.RequestError as e:
    print(f"Error: {e}")

Mandarin Chinese Recognition

import speech_recognition as sr

IBM_KEY = "your-api-key"

r = sr.Recognizer()

with sr.Microphone() as source:
    print("请说话...")  # "Please speak..."
    audio = r.listen(source)

try:
    text, confidence = r.recognize_ibm(
        audio,
        key=IBM_KEY,
        language="zh-CN"
    )
    print(f"你说: {text}")  # "You said: ..."
except sr.UnknownValueError:
    print("无法理解音频")  # "Could not understand the audio"

Setup Instructions

1. Create IBM Cloud Account

Go to IBM Cloud
Sign up for a free account (Lite tier available)
Log in to the IBM Cloud Console

2. Create Speech to Text Service

Go to IBM Cloud Catalog
Search for “Speech to Text”
Click on the service
Select a region (e.g., Dallas, Washington DC)
Choose the Lite plan (free tier) or a paid plan
Give your service a name
Click Create

3. Get API Key

After creation, you’ll be taken to the service dashboard
Click Manage in the left sidebar
Under Credentials, you’ll see:
- API Key: Your authentication key
- URL: Service endpoint URL
Copy the API Key
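recognize_ibm takes only the API key. The service URL on the credentials page is not one of its parameters. If you call the REST API yourself, IBM's examples authenticate with HTTP Basic auth, using the literal username apikey and the API key as the password (curl -u "apikey:{apikey}"). A minimal sketch of building that header follows. ibm_basic_auth_header is a hypothetical helper.

```python
import base64


def ibm_basic_auth_header(api_key: str) -> dict[str, str]:
    """Build an HTTP Basic Authorization header for an IBM Cloud API key.

    Username is the literal string "apikey"; the key is the password.
    Hypothetical helper for direct REST calls, not used by recognize_ibm.
    """
    credentials = f"apikey:{api_key}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


print(ibm_basic_auth_header("your-api-key-here"))
```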

4. Use in Code

import speech_recognition as sr

IBM_KEY = "your-api-key-here"

r = sr.Recognizer()

with sr.Microphone() as source:
    audio = r.listen(source)

text, confidence = r.recognize_ibm(audio, key=IBM_KEY)
print(text)

Language Support

IBM Watson Speech to Text supports many languages:

Available Models

en-US - English (United States)
en-GB - English (United Kingdom)
es-ES - Spanish (Spain)
es-LA - Spanish (Latin America)
fr-FR - French (France)
de-DE - German (Germany)
it-IT - Italian (Italy)
ja-JP - Japanese
ko-KR - Korean
pt-BR - Portuguese (Brazil)
zh-CN - Chinese (Mandarin, Simplified)
ar-MS - Arabic (Modern Standard)
nl-NL - Dutch (Netherlands)
fr-CA - French (Canada)

See the full language list in IBM’s documentation.
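A mistyped language tag would presumably surface only as a RequestError from the API. A local check against the tags listed above catches it before any network call. This sketch assumes the list above is the set you intend to support, so a valid IBM tag missing from it is rejected until you add it. LISTED_TAGS and check_language are hypothetical names.

```python
# The tags listed above; IBM's documentation has the authoritative, longer list.
LISTED_TAGS = frozenset({
    "en-US", "en-GB", "es-ES", "es-LA", "fr-FR", "de-DE", "it-IT",
    "ja-JP", "ko-KR", "pt-BR", "zh-CN", "ar-MS", "nl-NL", "fr-CA",
})


def check_language(tag: str) -> str:
    """Return tag unchanged if it is in LISTED_TAGS, otherwise raise ValueError."""
    if tag not in LISTED_TAGS:
        # Suggest listed tags that share the primary language subtag (case-insensitive)
        primary = tag.split("-")[0].lower()
        close = sorted(t for t in LISTED_TAGS if t.split("-")[0].lower() == primary)
        hint = f"; did you mean one of {close}?" if close else ""
        raise ValueError(f"{tag!r} is not among the listed models{hint}")
    return tag
```

Call it just before recognition, e.g. `r.recognize_ibm(audio, key=IBM_KEY, language=check_language("fr-CA"))`.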

Pricing

Lite Plan: 500 minutes per month (free)
Standard Plan: Pay-per-use after Lite tier

Check IBM Watson Speech to Text pricing for current rates.
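To track usage against the 500-minute Lite allowance, you can estimate audio length from raw PCM size: seconds = bytes / (sample_rate × sample_width × channels). The sketch assumes mono audio, which is what speech_recognition's AudioData carries. It does not model IBM's billing granularity, so treat the result as an estimate. The function names are hypothetical.

```python
LITE_MINUTES_PER_MONTH = 500  # Lite allowance stated above


def audio_minutes(num_bytes: int, sample_rate: int, sample_width: int, channels: int = 1) -> float:
    """Duration in minutes of raw PCM audio: bytes / (rate * width * channels) / 60."""
    if min(sample_rate, sample_width, channels) <= 0:
        raise ValueError("sample_rate, sample_width and channels must be positive")
    return num_bytes / (sample_rate * sample_width * channels) / 60


def remaining_lite_minutes(used_minutes: float) -> float:
    """Minutes left in the monthly Lite allowance, never below zero."""
    return max(0.0, LITE_MINUTES_PER_MONTH - used_minutes)


# One minute of 16 kHz, 16-bit mono audio is 16000 * 2 * 60 = 1,920,000 bytes.
print(audio_minutes(1_920_000, 16_000, 2))  # 1.0
print(remaining_lite_minutes(1.0))  # 499.0
```

For an AudioData instance `audio`, the estimate is `audio_minutes(len(audio.frame_data), audio.sample_rate, audio.sample_width)`.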

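Features

These are capabilities of the IBM Watson Speech to Text service. recognize_ibm accepts only the parameters listed above (audio_data, key, language, show_all), so service options such as custom models, speaker labels, smart formatting, profanity filtering, and word timestamps cannot be set through this method.
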
Multiple Languages: Support for 10+ languages
Custom Models: Train custom acoustic and language models
Speaker Labels: Identify different speakers
Smart Formatting: Automatic formatting of dates, times, numbers
Profanity Filtering: Optional filtering of profane words
Word Timestamps: Get timing for each word
Confidence Scores: Returns confidence for transcriptions

Notes

Requires an internet connection
Audio sampled below 16 kHz is upsampled to 16 kHz before upload
Audio with samples narrower than 16 bits is widened to 16-bit samples
Returns both a transcript and a confidence score when show_all=False
Free Lite tier includes 500 minutes per month
Multiple utterances are separated by newlines in the transcript
Audio is sent to the API in FLAC format
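The sample-rate and sample-width handling can be written as a small decision rule. This sketch assumes the rule is to raise anything below 16 kHz or 16-bit to those minimums and leave everything else untouched. flac_conversion_params is a hypothetical helper.

```python
from typing import Optional, Tuple


def flac_conversion_params(sample_rate: int, sample_width: int) -> Tuple[Optional[int], Optional[int]]:
    """Return (convert_rate, convert_width) for upload.

    None means "keep as is": audio at or above 16 kHz / 16-bit is left alone,
    and anything lower is raised to 16 kHz / 2 bytes per sample.
    """
    convert_rate = None if sample_rate >= 16_000 else 16_000
    convert_width = None if sample_width >= 2 else 2
    return convert_rate, convert_width


print(flac_conversion_params(8_000, 1))   # (16000, 2)
print(flac_conversion_params(44_100, 2))  # (None, None)
```

The two values line up with the convert_rate and convert_width keyword arguments of AudioData.get_flac_data. For example, `audio.get_flac_data(*flac_conversion_params(audio.sample_rate, audio.sample_width))` produces FLAC bytes that follow the same rule.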
