Google Speech Recognition

Google Speech Recognition is one of the most popular and accessible speech recognition engines. It offers high accuracy, supports over 100 languages, and can be used free of charge through a built-in default API key, so no account setup is needed to get started.

Method Signature

recognize_google(
    audio_data: AudioData,
    key: str | None = None,
    language: str = "en-US",
    pfilter: int = 0,
    show_all: bool = False,
    with_confidence: bool = False
) -> str | tuple[str, float] | dict

Parameters

audio_data (AudioData, required)

An AudioData instance containing the audio to transcribe.

key (str, default: None)

Google Speech Recognition API key. If None, a built-in default key is used, so the method works out of the box.

The default key is intended for testing and may be revoked by Google at any time. For production use, obtain your own API key.

language (str, default: "en-US")

Recognition language as an RFC 5646 language tag (e.g., "en-US", "fr-FR", "es-ES"). See Language Support below for more examples.

pfilter (int, default: 0)

Profanity filter level:

0: No filter
1: Shows only the first character of each profane word and replaces the rest with asterisks

show_all (bool, default: False)

If True, returns the raw API result parsed into a dictionary (or an empty list if nothing was recognized). If False, returns only the transcription text.

with_confidence (bool, default: False)

If True, returns a tuple of (transcription, confidence), where confidence is a float between 0 and 1.

Returns

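Default: str - The transcribed text
With with_confidence=True: tuple[str, float] - The transcription and its confidence score
With show_all=True: dict - The raw API result with all alternatives, or an empty list if nothing was recognized (with_confidence is ignored in this mode)

In the default and with_confidence modes, sr.UnknownValueError is raised when no speech can be recognized.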

Basic Example

import speech_recognition as sr

# Initialize recognizer
r = sr.Recognizer()

# Transcribe from audio file
with sr.AudioFile("speech.wav") as source:
    audio = r.record(source)

try:
    text = r.recognize_google(audio)
    print(f"Google thinks you said: {text}")
except sr.UnknownValueError:
    print("Google could not understand audio")
except sr.RequestError as e:
    print(f"Could not request results; {e}")

Microphone Example

import speech_recognition as sr

r = sr.Recognizer()

# Capture audio from microphone
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)

# Recognize speech
try:
    text = r.recognize_google(audio)
    print(f"You said: {text}")
except sr.UnknownValueError:
    print("Could not understand audio")
except sr.RequestError as e:
    print(f"Error: {e}")

Using Your Own API Key

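To obtain your own API key (the library's own documentation points to the API Keys page on the Chromium Developers site):

Go to the Google Cloud Console
Create a new project or select an existing one
Enable the API listed as “Speech API” (the separate “Cloud Speech-to-Text API” is the one used by recognize_google_cloud())
Navigate to APIs & Services > Credentials
Create an API key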

import speech_recognition as sr

API_KEY = "YOUR_GOOGLE_API_KEY"

r = sr.Recognizer()

with sr.AudioFile("audio.wav") as source:
    audio = r.record(source)

text = r.recognize_google(audio, key=API_KEY)
print(text)

The Google Speech Recognition API is different from the Google Cloud Speech-to-Text API. For the Cloud API, use recognize_google_cloud() instead.
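Rather than hard-coding the key as in the example above, you may prefer to read it from the environment. The sketch below assumes a variable named GOOGLE_SPEECH_API_KEY (a hypothetical name; the library does not read it itself) and returns None when it is unset or blank, so recognize_google() falls back to its default key.

```python
import os
from typing import Mapping, Optional


def load_api_key(
    environ: Mapping[str, str] = os.environ,
    var_name: str = "GOOGLE_SPEECH_API_KEY",  # hypothetical variable name
) -> Optional[str]:
    """Return the API key from the environment, or None if it is unset or blank."""
    value = environ.get(var_name, "").strip()
    # None tells recognize_google() to use its built-in default key
    return value or None
```

Usage: text = r.recognize_google(audio, key=load_api_key()). Passing the mapping as a parameter keeps the helper easy to test without touching the real environment.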

Language Support

Google Speech Recognition supports over 100 languages. Here are some examples:

Common languages:

# English (US)
r.recognize_google(audio, language="en-US")

# Spanish (Spain)
r.recognize_google(audio, language="es-ES")

# French (France)
r.recognize_google(audio, language="fr-FR")

# German
r.recognize_google(audio, language="de-DE")

# Japanese
r.recognize_google(audio, language="ja-JP")

# Chinese (Mandarin, Simplified)
r.recognize_google(audio, language="zh-CN")

Regional variants:

# English variants
r.recognize_google(audio, language="en-US")  # United States
r.recognize_google(audio, language="en-GB")  # United Kingdom
r.recognize_google(audio, language="en-AU")  # Australia
r.recognize_google(audio, language="en-IN")  # India

# Spanish variants
r.recognize_google(audio, language="es-ES")  # Spain
r.recognize_google(audio, language="es-MX")  # Mexico
r.recognize_google(audio, language="es-AR")  # Argentina

For a complete list of supported language tags, see the StackOverflow answer referenced in the library's documentation.
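If you want to default to the user's system language, a POSIX-style locale name such as "fr_FR.UTF-8" can be turned into a tag like "fr-FR". This is a simplified sketch, not part of the library: it only accepts tags of the form language or language-REGION (e.g. "en", "en-GB") and falls back to "en-US" for anything else, which also covers platforms where locale.getlocale() returns names such as "English_United States".

```python
import locale
import re
from typing import Optional

# Simplified pattern: a 2-3 letter language code with an optional 2-letter region
_TAG_PATTERN = re.compile(r"[a-z]{2,3}(-[A-Z]{2})?")


def to_language_tag(locale_name: Optional[str], fallback: str = "en-US") -> str:
    """Convert a POSIX-style locale name (e.g. "fr_FR.UTF-8") into a tag such as "fr-FR"."""
    if not locale_name:
        return fallback
    # Drop the encoding ("fr_FR.UTF-8") and any modifier ("de_DE@euro")
    base = locale_name.split(".")[0].split("@")[0]
    tag = base.replace("_", "-")
    # Anything that does not look like language[-REGION] (e.g. "C") falls back
    return tag if _TAG_PATTERN.fullmatch(tag) else fallback


if __name__ == "__main__":
    # locale.getlocale() returns (language_code, encoding), e.g. ("en_US", "UTF-8")
    print(to_language_tag(locale.getlocale()[0]))
```

Usage: r.recognize_google(audio, language=to_language_tag(locale.getlocale()[0])). Tags with numeric regions such as "es-419" are rejected by this simplified pattern and fall back to the default.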

Confidence Scores

import speech_recognition as sr

r = sr.Recognizer()

with sr.AudioFile("audio.wav") as source:
    audio = r.record(source)

# Get transcription with confidence score
transcription, confidence = r.recognize_google(
    audio,
    with_confidence=True
)

print(f"Transcription: {transcription}")
print(f"Confidence: {confidence:.2%}")
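To act on the score, for example to discard or re-prompt on low-quality transcriptions, you can compare it against a cutoff. The helper name and the 0.75 threshold below are hypothetical placeholders, not values recommended by Google or the library; tune the cutoff on your own recordings.

```python
from typing import Optional, Tuple

# Hypothetical cutoff; calibrate it against your own audio
DEFAULT_MIN_CONFIDENCE = 0.75


def accept_transcription(
    result: Tuple[str, float],
    min_confidence: float = DEFAULT_MIN_CONFIDENCE,
) -> Optional[str]:
    """Return the transcript if its confidence meets the cutoff, otherwise None."""
    transcription, confidence = result
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence!r}")
    return transcription if confidence >= min_confidence else None
```

Usage: text = accept_transcription(r.recognize_google(audio, with_confidence=True)). A None result signals that the caller should ask the user to repeat or route the audio for review.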

Full Response

Get all alternative transcriptions:

import speech_recognition as sr

r = sr.Recognizer()

with sr.AudioFile("audio.wav") as source:
    audio = r.record(source)

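# Get full response with all alternatives
response = r.recognize_google(audio, show_all=True)

# An empty list (rather than a dict) means nothing was recognized
if not response:
    print("No speech recognized")
else:
    print("Full response:")
    for result in response["alternative"]:
        print(f"  {result['transcript']} (confidence: {result.get('confidence', 'N/A')})")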
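To pick a single transcript from the raw result yourself, you can take the alternative with the highest confidence. This sketch assumes the response shape used in the example above: a dict with an "alternative" list whose entries have a "transcript" and, for some entries only, a "confidence". It also assumes that, as in current versions of the library, an empty list is returned when nothing was recognized.

```python
from typing import Any, Optional, Tuple


def best_alternative(response: Any) -> Optional[Tuple[str, Optional[float]]]:
    """Pick the highest-confidence alternative from a show_all=True response.

    Returns None when nothing usable was recognized.
    """
    if not isinstance(response, dict):
        # e.g. the empty list returned when no speech was recognized
        return None
    alternatives = [a for a in response.get("alternative", []) if "transcript" in a]
    if not alternatives:
        return None
    # Only some alternatives carry "confidence"; a missing value ranks lowest,
    # and max() keeps the first entry when all are tied
    best = max(alternatives, key=lambda a: a.get("confidence", float("-inf")))
    return best["transcript"], best.get("confidence")
```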

Profanity Filtering

import speech_recognition as sr

r = sr.Recognizer()

with sr.AudioFile("audio.wav") as source:
    audio = r.record(source)

# Apply profanity filter
text = r.recognize_google(audio, pfilter=1)
print(text)  # Profanity will be masked: "f***"
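If you need to know whether the filter masked anything (for example, to flag a transcript for review), a small regular expression can find masked words. This is a sketch that assumes the masked form described under pfilter: one leading character followed only by asterisks. The helper name is hypothetical.

```python
import re
from typing import List

# A single word character followed by one or more asterisks, e.g. "f***"
MASKED_WORD = re.compile(r"\b\w\*+")


def masked_words(transcript: str) -> List[str]:
    """Return the words that pfilter=1 appears to have masked."""
    return MASKED_WORD.findall(transcript)
```

Usage: if masked_words(text): mark the transcript for review.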

Error Handling

import speech_recognition as sr

r = sr.Recognizer()

with sr.AudioFile("audio.wav") as source:
    audio = r.record(source)

try:
    text = r.recognize_google(audio)
    print(f"Transcription: {text}")
    
except sr.UnknownValueError:
    # Speech was unintelligible
    print("Could not understand the audio")
    
except sr.RequestError as e:
    # API request failed (network error, invalid key, etc.)
    print(f"API request failed: {e}")
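Because sr.RequestError covers transient network failures, one option is to retry a few times with exponential backoff before giving up. The helper below is a generic, hypothetical sketch (not part of the library); it takes a zero-argument callable and the exception types to retry on.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Call func(), retrying on the given exceptions with exponential backoff and jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ... plus random jitter
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    raise AssertionError("unreachable")
```

Usage: text = call_with_retries(lambda: r.recognize_google(audio), retry_on=(sr.RequestError,)). Do not retry sr.UnknownValueError: the same audio will not become intelligible on a second attempt. A RequestError caused by an invalid key will also keep failing, which is why the number of attempts is capped.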

Audio Requirements

Sample Rate: Minimum 8 kHz (lower rates are automatically upsampled to 8 kHz)
Sample Width: 16-bit (other widths are automatically converted)
Format: Converted to FLAC before being sent to the API
Channels: Mono (stereo is automatically converted)
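Since these conversions happen automatically, checking your files is optional, but it can be useful for logging or for spotting recordings that will be degraded (for example, audio below 8 kHz). The sketch below uses the standard-library wave module to read a WAV header and list the conversions the requirements above imply; both helper names are hypothetical.

```python
import wave
from typing import List


def planned_conversions(channels: int, sample_width: int, frame_rate: int) -> List[str]:
    """List the conversions implied by the audio requirements for these properties."""
    notes = []
    if channels != 1:
        notes.append(f"{channels} channels -> mixed down to mono")
    if sample_width != 2:
        notes.append(f"{8 * sample_width}-bit samples -> converted to 16-bit")
    if frame_rate < 8000:
        notes.append(f"{frame_rate} Hz -> raised to the 8 kHz minimum")
    return notes


def inspect_wav(path: str) -> List[str]:
    """Read a WAV header and report the conversions that will be applied."""
    with wave.open(path, "rb") as wav:
        return planned_conversions(wav.getnchannels(), wav.getsampwidth(), wav.getframerate())
```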

Timeouts

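Control how long to wait for the API response with operation_timeout (the default, None, means no timeout):

import socket
import speech_recognition as sr

r = sr.Recognizer()
r.operation_timeout = 10  # Wait up to 10 seconds for the API request

with sr.AudioFile("audio.wav") as source:
    audio = r.record(source)

try:
    text = r.recognize_google(audio)
    print(text)
except sr.RequestError as e:
    # A timed-out connection may be reported as a RequestError
    print(f"Request failed or timed out: {e}")
except socket.timeout:
    # A timeout while waiting for the response may surface as a raw socket timeout
    print("Request timed out")

sr.WaitTimeoutError is not raised by recognize_google(); it comes from listen() when no speech starts within its timeout.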

Best Practices

For production applications:

Always use your own API key
Implement proper error handling
Monitor API usage and costs
Consider using confidence scores to filter low-quality transcriptions
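For the monitoring point above, a simple starting place is to count requests and the seconds of audio you submit. The sketch below is a hypothetical helper, not part of the library; it relies on AudioData exposing frame_data, sample_rate, and sample_width, and computes duration as bytes divided by (sample rate times sample width), which holds for mono PCM audio.

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Running totals of requests sent and seconds of audio submitted."""

    requests: int = 0
    audio_seconds: float = 0.0

    def record(self, num_bytes: int, sample_rate: int, sample_width: int) -> None:
        # Mono PCM: bytes / (frames per second * bytes per frame) = duration in seconds
        self.requests += 1
        self.audio_seconds += num_bytes / (sample_rate * sample_width)
```

Usage: call tracker.record(len(audio.frame_data), audio.sample_rate, audio.sample_width) just before each recognize_google() call, then export the totals to your logging or metrics system.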

Privacy Considerations: Audio data is sent to Google’s servers for processing. Ensure compliance with your privacy policy and local regulations (GDPR, CCPA, etc.).

Comparison with Google Cloud Speech

Feature	recognize_google	recognize_google_cloud
Setup	Simple, minimal	Requires Google Cloud authentication
Credentials	Optional API key (a default key is built in)	Google Cloud service account credentials
Features	Basic	Advanced (those of the Cloud Speech-to-Text API)
Pricing	Free (default key, no service guarantees)	Pay-as-you-go
Use Case	Simple apps, prototyping	Production, enterprise

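The Cloud Speech-to-Text API offers advanced features such as streaming recognition, speaker diarization, and custom models. recognize_google_cloud() is the entry point to that API, although which of these features it exposes depends on the library version.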
