recognize_google_cloud()

Overview

Performs speech recognition using the Google Cloud Speech-to-Text V1 API. This is the enterprise-grade counterpart to the basic Google Speech Recognition API: it requires Google Cloud credentials and adds model selection, phrase hints, and word-level timestamps.

Method Signature

recognize_google_cloud(
    audio_data: AudioData,
    credentials_json_path: str | None = None,
    language_code: str = "en-US",
    preferred_phrases: list[str] | None = None,
    show_all: bool = False,
    model: str | None = None,
    use_enhanced: bool | None = None
) -> str | RecognizeResponse

Parameters

audio_data

AudioData

required

The audio data to recognize. Must be an AudioData instance.

credentials_json_path

str | None

default:"None"

Path to the JSON file containing Google Cloud API credentials. If not specified, the library tries to find default credentials automatically using Application Default Credentials (ADC).

To create credentials:

Create a Google Cloud Platform project
Enable the Speech-to-Text API
Create a service account
Download the JSON key file
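
The lookup order described for this parameter can be sketched as a small helper. This is an illustration only: `resolve_credentials_path` is a hypothetical name, not part of the library, and it inspects only the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, whereas real ADC also consults other sources.

```python
import os


def resolve_credentials_path(explicit_path=None, environ=None):
    """Return the credentials file path that would be tried first, or None.

    Hypothetical helper (not a library function). An explicit path wins;
    otherwise fall back to GOOGLE_APPLICATION_CREDENTIALS. Other ADC
    sources (gcloud user credentials, metadata server) are not modelled.
    """
    environ = os.environ if environ is None else environ
    if explicit_path:
        return explicit_path
    # Treat an empty variable the same as an unset one.
    return environ.get("GOOGLE_APPLICATION_CREDENTIALS") or None
```

A caller could log the result before recognition to make it obvious which key file, if any, is in play.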

language_code

str

default:"en-US"

Recognition language as a BCP-47 language tag (e.g., "en-US", "es-ES", "ja-JP"). See supported languages for the complete list.
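
A quick shape check can catch typos such as "en_US" before a request is sent. The sketch below is a simplification, not a full BCP-47 parser, and `looks_like_language_tag` is a hypothetical name; passing this check does not mean the API supports the language.

```python
import re

# Simplified shape: 2-3 letter language, optional 4-letter script,
# optional 2-letter or 3-digit region. Real BCP-47 allows more subtags.
_TAG_PATTERN = re.compile(
    r"^[A-Za-z]{2,3}(-[A-Za-z]{4})?(-([A-Za-z]{2}|\d{3}))?$"
)


def looks_like_language_tag(tag):
    """Return True if tag has the rough shape of a BCP-47 language tag."""
    return bool(_TAG_PATTERN.match(tag))
```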

preferred_phrases

list[str] | None

default:"None"

List of phrases that are more likely to be recognized. Useful for:

Domain-specific vocabulary
Proper nouns (names, places, brands)
Keywords or commands

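Note: The API limits how many phrases a request may contain and how long each phrase may be. Check the Speech-to-Text quotas and limits for the current values.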
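
One way to keep a phrase list tidy before passing it as preferred_phrases is sketched below. `clean_phrases` is a hypothetical helper, and its `max_phrases` and `max_chars` defaults are placeholders chosen for illustration, not the API's actual limits.

```python
def clean_phrases(phrases, max_phrases=500, max_chars=100):
    """Normalise whitespace, drop duplicates and over-long entries, cap length.

    max_phrases and max_chars are placeholder values (assumptions), not
    limits taken from the Speech-to-Text documentation.
    """
    seen = set()
    cleaned = []
    for phrase in phrases:
        text = " ".join(phrase.split())  # collapse runs of whitespace
        key = text.casefold()            # case-insensitive duplicate check
        if not text or len(text) > max_chars or key in seen:
            continue
        seen.add(key)
        cleaned.append(text)
    return cleaned[:max_phrases]
```

The result can be passed directly, for example `preferred_phrases=clean_phrases(raw_phrases)`.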

show_all

bool

default:"False"

If True, returns the full RecognizeResponse object with word-level timestamps and confidence scores. If False, returns only the transcript text.

model

str | None

default:"None"

Speech recognition model to use. Options include:

"default" - Standard model
"command_and_search" - Optimized for short queries
"phone_call" - Optimized for phone audio
"video" - Optimized for video audio
"medical_dictation" - Medical terminology
"medical_conversation" - Medical conversations

See RecognitionConfig documentation for details.

use_enhanced

bool | None

default:"None"

Set to True to use an enhanced model for better accuracy. May incur additional costs.
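
Since model and use_enhanced both default to None, it can be convenient to assemble only the options that were actually chosen. The helper below is a hypothetical sketch built on the parameter names in the signature above; it does not validate model names, because the list above is described as non-exhaustive.

```python
def build_recognition_options(language_code="en-US", model=None, use_enhanced=None):
    """Collect keyword arguments, leaving out anything still set to None.

    Hypothetical helper: the keys mirror the documented parameter names,
    so the result can be unpacked into recognize_google_cloud().
    """
    options = {"language_code": language_code}
    if model is not None:
        options["model"] = model
    if use_enhanced is not None:
        options["use_enhanced"] = use_enhanced
    return options
```

Usage would look like `r.recognize_google_cloud(audio, **build_recognition_options(model="phone_call", use_enhanced=True))`.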

Returns

transcript

str

The recognized text when show_all=False

response

RecognizeResponse

The full API response when show_all=True, containing:

results: List of speech recognition results
alternatives: Multiple transcription alternatives with confidence scores
words: Word-level timing and confidence information
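
Because the return type depends on show_all, calling code sometimes needs to accept either shape. The sketch below assumes the response exposes `results`, `alternatives`, and `transcript` as listed above; `to_transcript` is a hypothetical helper, and the `SimpleNamespace` object is only a stand-in for a real RecognizeResponse.

```python
from types import SimpleNamespace


def to_transcript(result):
    """Return plain text from either a str or a response-like object."""
    if isinstance(result, str):
        return result
    # Take the first alternative of each result and join the pieces.
    parts = [
        r.alternatives[0].transcript.strip()
        for r in result.results
        if r.alternatives
    ]
    return " ".join(parts)


# Stand-in object for demonstration; a real call returns RecognizeResponse.
fake_response = SimpleNamespace(
    results=[
        SimpleNamespace(alternatives=[SimpleNamespace(transcript=" hello")]),
        SimpleNamespace(alternatives=[SimpleNamespace(transcript="world ")]),
    ]
)
print(to_transcript(fake_response))
```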

Exceptions

UnknownValueError

Exception

Raised when the speech is unintelligible

RequestError

Exception

Raised when:

The API request fails
Credentials are invalid or missing
The google-cloud-speech module is not installed
There is no internet connection
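
Some RequestError causes, such as a dropped connection, are transient, so a caller may want to retry. The following is a generic sketch rather than library behaviour: `call_with_retries` is a hypothetical helper that takes the exception type as an argument so it can be tried without the library installed.

```python
import time


def call_with_retries(func, retry_on, attempts=3, delay=1.0, sleep=time.sleep):
    """Call func(); on retry_on, wait with exponential backoff and try again.

    Hypothetical helper. The final failure is re-raised unchanged so the
    caller still sees the original exception.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise
            sleep(delay * 2 ** (attempt - 1))  # delay, 2*delay, 4*delay, ...
```

With the library, this could be used as `call_with_retries(lambda: r.recognize_google_cloud(audio), retry_on=sr.RequestError)`. Retrying on UnknownValueError is unlikely to help, since the same audio would be sent again. Missing credentials or a missing module will also fail on every attempt.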

Example Usage

Basic Recognition

import speech_recognition as sr

# Initialize recognizer
r = sr.Recognizer()

# Record audio
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)

# Recognize with Google Cloud
try:
    text = r.recognize_google_cloud(
        audio,
        credentials_json_path="path/to/credentials.json"
    )
    print(f"You said: {text}")
except sr.UnknownValueError:
    print("Could not understand audio")
except sr.RequestError as e:
    print(f"API error: {e}")

Using Application Default Credentials

import speech_recognition as sr
import os

# Set environment variable for credentials
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/credentials.json"

r = sr.Recognizer()

with sr.Microphone() as source:
    audio = r.listen(source)

# No need to specify credentials_json_path
try:
    text = r.recognize_google_cloud(audio)
    print(f"You said: {text}")
except sr.UnknownValueError:
    print("Could not understand audio")
except sr.RequestError as e:
    print(f"Error: {e}")

With Preferred Phrases

import speech_recognition as sr

r = sr.Recognizer()

with sr.Microphone() as source:
    audio = r.listen(source)

# Boost recognition of specific terms
preferred_phrases = [
    "TensorFlow",
    "Kubernetes",
    "microservices",
    "API gateway"
]

try:
    text = r.recognize_google_cloud(
        audio,
        credentials_json_path="credentials.json",
        preferred_phrases=preferred_phrases
    )
    print(f"Transcript: {text}")
except sr.UnknownValueError:
    print("Could not understand audio")
except sr.RequestError as e:
    print(f"Error: {e}")

With Enhanced Model

import speech_recognition as sr

r = sr.Recognizer()

with sr.Microphone() as source:
    audio = r.listen(source)

try:
    text = r.recognize_google_cloud(
        audio,
        credentials_json_path="credentials.json",
        model="phone_call",
        use_enhanced=True
    )
    print(f"Transcript: {text}")
except sr.RequestError as e:
    print(f"Error: {e}")

Getting Full Response with Timestamps

import speech_recognition as sr

r = sr.Recognizer()

with sr.Microphone() as source:
    audio = r.listen(source)

try:
    # Get full response with word-level details
    response = r.recognize_google_cloud(
        audio,
        credentials_json_path="credentials.json",
        show_all=True
    )
    
    # Process results
    for result in response.results:
        alternative = result.alternatives[0]
        print(f"Transcript: {alternative.transcript}")
        print(f"Confidence: {alternative.confidence}")
        
        # Word-level timing
        for word_info in alternative.words:
            word = word_info.word
            start_time = word_info.start_time.total_seconds()
            end_time = word_info.end_time.total_seconds()
            print(f"  {word}: {start_time:.2f}s - {end_time:.2f}s")
except sr.UnknownValueError:
    print("Could not understand audio")
except sr.RequestError as e:
    print(f"API error: {e}")
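
The word-level loop above can be factored into a function that returns plain tuples, which are easier to store or post-process. This sketch assumes, as in the example above, that `start_time` and `end_time` behave like `datetime.timedelta`; `word_timings` is a hypothetical helper and the `SimpleNamespace` objects are stand-ins for a real response.

```python
from datetime import timedelta
from types import SimpleNamespace


def word_timings(response):
    """Return (word, start_seconds, end_seconds) for the top alternatives."""
    rows = []
    for result in response.results:
        if not result.alternatives:
            continue
        for info in result.alternatives[0].words:
            rows.append(
                (
                    info.word,
                    info.start_time.total_seconds(),
                    info.end_time.total_seconds(),
                )
            )
    return rows


def _word(text, start_ms, end_ms):
    # Build a stand-in word entry with timedelta offsets.
    return SimpleNamespace(
        word=text,
        start_time=timedelta(milliseconds=start_ms),
        end_time=timedelta(milliseconds=end_ms),
    )


# Stand-in response for demonstration only.
fake_response = SimpleNamespace(
    results=[
        SimpleNamespace(
            alternatives=[
                SimpleNamespace(words=[_word("hello", 0, 400), _word("world", 400, 900)])
            ]
        )
    ]
)
for word, start, end in word_timings(fake_response):
    print(f"{word}: {start:.2f}s - {end:.2f}s")
```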

Multiple Languages

import speech_recognition as sr

r = sr.Recognizer()

# Spanish recognition
with sr.Microphone() as source:
    print("Habla ahora...")
    audio = r.listen(source)

try:
    text = r.recognize_google_cloud(
        audio,
        credentials_json_path="credentials.json",
        language_code="es-ES"
    )
    print(f"Dijiste: {text}")
except sr.UnknownValueError:
    print("No se pudo entender el audio")
except sr.RequestError as e:
    print(f"Error: {e}")

Setup Instructions

1. Install the Google Cloud Library

pip install google-cloud-speech
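
To confirm the install from Python without calling the API, a check along these lines may help. It is a sketch that assumes the package is importable as `google.cloud.speech`; `cloud_speech_installed` is a hypothetical name.

```python
import importlib.util


def cloud_speech_installed():
    """Return True if the google.cloud.speech module can be located."""
    try:
        return importlib.util.find_spec("google.cloud.speech") is not None
    except (ImportError, ValueError):
        # Raised when a parent package such as "google" is itself missing
        # or in an unusual state.
        return False


if not cloud_speech_installed():
    print("google-cloud-speech not found; run: pip install google-cloud-speech")
```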

2. Create a Google Cloud Project

Go to Google Cloud Console
Create a new project or select an existing one
Enable the Speech-to-Text API
Enable billing for the project

3. Create Service Account Credentials

Go to IAM & Admin > Service Accounts
Click Create Service Account
Give it a name and grant it a Speech-to-Text role (for example, Cloud Speech Client)
Click Create Key and choose JSON
Save the downloaded JSON file securely
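
Before using a downloaded key, a rough sanity check can catch a wrong or truncated file. The sketch below assumes the usual service account key layout with `type`, `client_email`, and `private_key` fields; `looks_like_service_account_key` is a hypothetical helper, and it does not verify that the key is actually valid.

```python
import json

# Fields assumed to be present in a service account key file.
REQUIRED_KEYS = {"type", "client_email", "private_key"}


def looks_like_service_account_key(path):
    """Return True if the file parses as JSON and has the expected fields."""
    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
    except (OSError, ValueError):
        # Missing or unreadable file, or content that is not valid JSON.
        return False
    return (
        isinstance(data, dict)
        and data.get("type") == "service_account"
        and REQUIRED_KEYS <= data.keys()
    )
```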

4. Use the Credentials

Option A: Pass the file path directly

text = r.recognize_google_cloud(
    audio,
    credentials_json_path="/path/to/credentials.json"
)

Option B: Use Application Default Credentials

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/credentials.json"

Then in Python:

text = r.recognize_google_cloud(audio)  # No path needed

Language Support

Google Cloud Speech-to-Text supports more than 125 languages and variants, including:

en-US - English (United States)
en-GB - English (United Kingdom)
es-ES - Spanish (Spain)
fr-FR - French (France)
de-DE - German (Germany)
ja-JP - Japanese (Japan)
zh-CN - Chinese (Simplified)
ko-KR - Korean (South Korea)
pt-BR - Portuguese (Brazil)
hi-IN - Hindi (India)
ar-SA - Arabic (Saudi Arabia)

See the full list of supported languages.

Notes

Requires a Google Cloud Platform account with billing enabled
Audio sample rate must be between 8 kHz and 48 kHz
Audio is automatically converted to 16-bit samples
Pricing is based on audio duration processed
Enhanced models cost more but provide better accuracy
Word-level timestamps require show_all=True
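
The sample-rate constraint in the notes above amounts to clamping a rate into the 8 kHz to 48 kHz range. The function below is a minimal sketch of that arithmetic; `clamp_sample_rate` is a hypothetical name and is not presented as the library's own conversion code.

```python
def clamp_sample_rate(rate_hz, low=8000, high=48000):
    """Return rate_hz limited to the inclusive range [low, high]."""
    return max(low, min(rate_hz, high))


for rate in (4000, 16000, 96000):
    print(rate, "->", clamp_sample_rate(rate))
```

A caller who wants to inspect what will be sent could compare `audio.sample_rate` with the clamped value before recognition.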
