Overview
Performs speech recognition using the Google Cloud Speech-to-Text V1 API. This is the enterprise-grade version with more features and better accuracy than the basic Google Speech Recognition API.
Method Signature
Parameters
The audio data to recognize. Must be an AudioData instance.
Path to the JSON file containing Google Cloud API credentials. If not specified, the library will try to automatically find the default API credentials using Application Default Credentials (ADC).
To create credentials:
- Create a Google Cloud Platform project
- Enable the Speech-to-Text API
- Create a service account
- Download the JSON key file
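The lookup order described above (an explicit key file first, then Application Default Credentials) can be sketched with the standard library alone. This is an illustration, not the library's own code: the helper name `resolve_credentials_path` is hypothetical, and it only checks `GOOGLE_APPLICATION_CREDENTIALS`, the environment variable that ADC reads, before deferring to ADC's other sources.

```python
import os
from pathlib import Path
from typing import Optional

# Environment variable consulted by Application Default Credentials.
ADC_ENV_VAR = "GOOGLE_APPLICATION_CREDENTIALS"


def resolve_credentials_path(explicit_path: Optional[str] = None) -> Optional[Path]:
    """Return the key file to use, or None to defer to other ADC sources.

    Hypothetical helper: an explicit path wins, then the ADC environment
    variable. A path that does not point at a file raises FileNotFoundError.
    """
    candidate = explicit_path or os.environ.get(ADC_ENV_VAR)
    if candidate is None:
        return None
    path = Path(candidate)
    if not path.is_file():
        raise FileNotFoundError(f"Credentials file not found: {path}")
    return path
```

Failing early on a missing file tends to give a clearer message than a failed API request later on.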
Recognition language as a BCP-47 language tag (e.g., "en-US", "es-ES", "ja-JP"). See supported languages for the complete list.
List of phrases that are more likely to be recognized. Useful for:
- Domain-specific vocabulary
- Proper nouns (names, places, brands)
- Keywords or commands
If True, returns the full RecognizeResponse object with word-level timestamps and confidence scores. If False, returns only the transcript text.
Speech recognition model to use. Options include:
- "default": Standard model
- "command_and_search": Optimized for short queries
- "phone_call": Optimized for phone audio
- "video": Optimized for video audio
- "medical_dictation": Medical terminology
- "medical_conversation": Medical conversations
Set to True to use an enhanced model for better accuracy. May incur additional costs.
Returns
The recognized text when show_all=False.
The full API response when show_all=True, containing:
- results: List of speech recognition results
- alternatives: Multiple transcription alternatives with confidence scores
- words: Word-level timing and confidence information
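The nesting of results, alternatives, and words can be walked as in the sketch below. It uses stand-in objects instead of a real API response so that it runs offline. The attribute names `word`, `start_time`, and `end_time` follow the Speech-to-Text V1 word info fields, and the sketch assumes the Python client exposes the times as `datetime.timedelta` values; the helper name `word_timings` is hypothetical.

```python
from datetime import timedelta
from types import SimpleNamespace as NS


def word_timings(response):
    """Collect (word, start_seconds, end_seconds) from each result's top alternative."""
    rows = []
    for result in response.results:
        if not result.alternatives:
            continue  # nothing recognized for this segment
        best = result.alternatives[0]  # alternatives are ordered best-first
        for info in best.words:
            rows.append(
                (info.word, info.start_time.total_seconds(), info.end_time.total_seconds())
            )
    return rows


# Stand-in for a response obtained with show_all=True (not real API output).
fake_response = NS(results=[NS(alternatives=[NS(
    transcript="hello world",
    confidence=0.92,
    words=[
        NS(word="hello", start_time=timedelta(0), end_time=timedelta(milliseconds=400)),
        NS(word="world", start_time=timedelta(milliseconds=400), end_time=timedelta(milliseconds=900)),
    ],
)])])

for word, start, end in word_timings(fake_response):
    print(f"{start:5.2f}s to {end:5.2f}s  {word}")
```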
Exceptions
Raised when the speech is unintelligible
Raised when:
- The API request fails
- Credentials are invalid or missing
- The google-cloud-speech module is not installed
- There is no internet connection
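The two failure modes above usually call for different handling: unintelligible speech is an expected outcome, while a failed request points at setup or connectivity. The sketch below assumes the exceptions are the SpeechRecognition package's `UnknownValueError` and `RequestError` (this section does not name them), and it defines stand-ins so the code still runs where the package is not installed. `transcribe_or_none` and its `recognize` argument are hypothetical; pass whichever recognition callable you use.

```python
try:
    # Assumed exception classes from the SpeechRecognition package.
    from speech_recognition import RequestError, UnknownValueError
except ImportError:
    # Stand-ins so this sketch runs without the package installed.
    class UnknownValueError(Exception):
        pass

    class RequestError(Exception):
        pass


def transcribe_or_none(recognize, audio, **options):
    """Call recognize(audio, **options); return None for unintelligible speech.

    Request failures (bad credentials, missing module, no network) are
    re-raised as RuntimeError with the original error chained.
    """
    try:
        return recognize(audio, **options)
    except UnknownValueError:
        return None
    except RequestError as exc:
        raise RuntimeError(f"Speech request failed: {exc}") from exc
```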
Example Usage
Basic Recognition
Using Application Default Credentials
With Preferred Phrases
With Enhanced Model
Getting Full Response with Timestamps
Multiple Languages
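One way to handle audio whose language is not known in advance is to run recognition once per candidate language and compare the results. The sketch below is written against an injected `recognize` callable, not a specific library method, so it runs offline. The keyword name `language_code` is an assumption, since this section does not name the language parameter, and `transcribe_in_languages` is a hypothetical helper.

```python
def transcribe_in_languages(recognize, audio, language_codes, keyword="language_code"):
    """Run recognize once per BCP-47 tag and return {tag: transcript}.

    `keyword` is the name of the language parameter; "language_code" is an
    assumed default, adjust it to match the method you call.
    """
    transcripts = {}
    for code in language_codes:
        transcripts[code] = recognize(audio, **{keyword: code})
    return transcripts


# Stand-in recognizer that only echoes the requested language (not a real API call).
def fake_recognize(audio, language_code="en-US"):
    return f"<transcript in {language_code}>"


print(transcribe_in_languages(fake_recognize, b"", ["en-US", "es-ES", "ja-JP"]))
```

Each call is billed separately, so keep the candidate list short.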
Setup Instructions
1. Install the Google Cloud Library
2. Create a Google Cloud Project
- Go to Google Cloud Console
- Create a new project or select an existing one
- Enable the Speech-to-Text API
- Enable billing for the project
3. Create Service Account Credentials
- Go to IAM & Admin > Service Accounts
- Click Create Service Account
- Give it a name and grant the Speech-to-Text API User role
- Click Create Key and choose JSON
- Save the downloaded JSON file securely
4. Use the Credentials
Option A: Pass the file path directly
Language Support
Google Cloud Speech-to-Text supports 125+ languages and variants:
- en-US: English (United States)
- en-GB: English (United Kingdom)
- es-ES: Spanish (Spain)
- fr-FR: French (France)
- de-DE: German (Germany)
- ja-JP: Japanese
- zh-CN: Chinese (Simplified)
- ko-KR: Korean
- pt-BR: Portuguese (Brazil)
- hi-IN: Hindi (India)
- ar-SA: Arabic
Notes
- Requires a Google Cloud Platform account with billing enabled
- Audio sample rate must be between 8 kHz and 48 kHz
- Audio is automatically converted to 16-bit samples
- Pricing is based on audio duration processed
- Enhanced models cost more but provide better accuracy
- Word-level timestamps require show_all=True
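The sample-rate limit in the notes above can be checked locally before any audio is sent. A minimal sketch using only the standard library `wave` module; the helper names are hypothetical, and the bounds are the 8 kHz and 48 kHz figures quoted above.

```python
import io
import wave

MIN_RATE_HZ = 8_000   # lower bound quoted in the notes
MAX_RATE_HZ = 48_000  # upper bound quoted in the notes


def is_supported_sample_rate(rate_hz: int) -> bool:
    """True when the rate lies within the documented 8 kHz to 48 kHz range."""
    return MIN_RATE_HZ <= rate_hz <= MAX_RATE_HZ


def wav_sample_rate(source) -> int:
    """Read the sample rate from a WAV file path or binary file object."""
    with wave.open(source, "rb") as reader:
        return reader.getframerate()


# Build a tiny in-memory WAV (mono, 16-bit, 16 kHz) to exercise the helpers.
buf = io.BytesIO()
with wave.open(buf, "wb") as writer:
    writer.setnchannels(1)
    writer.setsampwidth(2)
    writer.setframerate(16_000)
    writer.writeframes(b"\x00\x00")
buf.seek(0)

demo_rate = wav_sample_rate(buf)
print(demo_rate, is_supported_sample_rate(demo_rate))
```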