Overview
Performs speech recognition using the Microsoft Azure Speech API (Cognitive Services). Provides high-quality speech recognition with support for custom profanity filtering and multiple languages.
Method Signature
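The signature itself did not survive on this page. As a rough sketch, assuming this page documents `Recognizer.recognize_azure` from the Python SpeechRecognition package, the call shape would look something like the stub below. Only `show_all` and `location` are named on this page; the names `audio_data`, `key`, `language`, and `profanity`, and every default value, are assumptions inferred from the parameter descriptions.

```python
from typing import Any, Dict, Tuple, Union


def recognize_azure_stub(
    audio_data: Any,            # assumed name; an AudioData instance
    key: str,                   # assumed name; Azure Speech API key
    language: str = "en-US",    # assumed name and default
    profanity: str = "masked",  # assumed name and default
    location: str = "westus",   # name from this page; default is assumed
    show_all: bool = False,     # name from this page; default is assumed
) -> Union[Tuple[str, float], Dict[str, Any]]:
    """Illustrative stub only: mirrors the documented parameters, does no work."""
    raise NotImplementedError("stub for illustration; call the real method instead")
```

Check the installed package (for example with `help()` on the method) before relying on these names or defaults.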
Parameters
The audio data to recognize. Must be an AudioData instance.
Microsoft Azure Speech API key (32-character lowercase hexadecimal string). See the setup instructions below for how to obtain an API key.
Recognition language as a BCP-47 language tag (e.g., "en-US", "fr-FR", "de-DE"). See the supported languages below for the complete list.
Profanity filter mode:
- "masked": Replace profanity with asterisks
- "removed": Remove profanity from results
- "raw": No filtering
Azure region where your Speech resource is deployed (e.g., "westus", "eastus", "westeurope"). Must match the region where you created your Speech resource.
If show_all is True, returns the raw API response as a JSON dictionary. If False, returns a tuple of (transcript, confidence).
Returns
When show_all=False, returns (transcript, confidence) where:
- transcript: The recognized text
- confidence: Confidence score between 0 and 1
When show_all=True, returns the raw API response containing:
- RecognitionStatus: Status of recognition ("Success", "NoMatch", etc.)
- NBest: List of recognition results with confidence scores
- Display: Formatted display text
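To make the relationship between the two return shapes concrete, here is a minimal sketch that reduces a raw response to a (transcript, confidence) pair. It assumes each `NBest` entry carries `Display` and `Confidence` fields; `Confidence` as a field name is an assumption, and the sample dictionary is hand-written for illustration, not captured from the service.

```python
from typing import Any, Dict, Optional, Tuple


def best_result(response: Dict[str, Any]) -> Optional[Tuple[str, float]]:
    """Return (transcript, confidence) from a raw response, or None if no match."""
    if response.get("RecognitionStatus") != "Success":
        return None
    candidates = response.get("NBest") or []
    if not candidates:
        return None
    # Pick the highest-confidence candidate rather than relying on list order.
    top = max(candidates, key=lambda c: c.get("Confidence", 0.0))
    return top.get("Display", ""), float(top.get("Confidence", 0.0))


# Hand-written sample shaped like the fields listed above (values are made up).
sample_response = {
    "RecognitionStatus": "Success",
    "NBest": [
        {"Confidence": 0.93, "Display": "Hello world."},
        {"Confidence": 0.41, "Display": "Hello word."},
    ],
}
print(best_result(sample_response))
```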
Exceptions
Raised when the speech is unintelligible
Raised when:
- The API request fails
- The API key is invalid
- The specified location is incorrect
- There is no internet connection
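The exception class names were not preserved on this page. The sketch below therefore takes them as arguments instead of hard-coding them; in the SpeechRecognition package they are likely `UnknownValueError` and `RequestError`, but that is an assumption. The helper name `transcribe_or_none` is hypothetical.

```python
from typing import Callable, Optional, Tuple, Type


def transcribe_or_none(
    recognize: Callable[[], Tuple[str, float]],
    unintelligible_error: Type[Exception],
    request_error: Type[Exception],
) -> Optional[Tuple[str, float]]:
    """Run a recognition call and turn the two documented failures into None."""
    try:
        return recognize()
    except unintelligible_error:
        # The speech could not be understood.
        print("Azure could not understand the audio")
        return None
    except request_error as exc:
        # Failed request, invalid key, wrong location, or no connection.
        print(f"Azure request failed: {exc}")
        return None
```

A call might then look like `transcribe_or_none(lambda: r.recognize_azure(audio, key=KEY), sr.UnknownValueError, sr.RequestError)`, where `r`, `audio`, and `KEY` are placeholders.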
Example Usage
Basic Recognition
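The original example for this heading is missing. A minimal sketch, assuming the SpeechRecognition package (`speech_recognition`), a working microphone, and that the method accepts the key as a `key` keyword argument; the helper names and the placeholder key are hypothetical.

```python
def recognize_basic(recognizer, audio, key):
    """Transcribe audio with default settings; returns (transcript, confidence)."""
    # The `key` keyword name is an assumption about recognize_azure.
    transcript, confidence = recognizer.recognize_azure(audio, key=key)
    return transcript, confidence


def main():
    # Imported lazily so the helper above can be used without the package.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:  # needs PyAudio and a microphone
        print("Say something...")
        audio = recognizer.listen(source)
    text, confidence = recognize_basic(recognizer, audio, "YOUR_AZURE_SPEECH_KEY")
    print(f"{text} (confidence {confidence:.2f})")
```

Call `main()` to try it interactively.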
With Custom Region
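A short sketch of passing a non-default region, assuming `recognizer` is a SpeechRecognition `Recognizer` and `audio` is an AudioData instance. The `location` name comes from this page; the helper name is hypothetical.

```python
def recognize_in_region(recognizer, audio, key, region):
    """Transcribe audio against the Speech resource deployed in `region`."""
    # The region must match the one where the key's Speech resource was created,
    # for example "westeurope".
    return recognizer.recognize_azure(audio, key=key, location=region)
```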
With Different Languages
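One way to handle several languages is to keep each recording paired with its BCP-47 tag. The sketch below assumes the tag is passed as a `language` keyword (an assumed name) and that `clips` is a hypothetical mapping from tag to AudioData.

```python
def recognize_clips(recognizer, clips, key):
    """Transcribe each clip in its own language.

    `clips` maps a BCP-47 tag such as "fr-FR" to the audio recorded in it.
    """
    results = {}
    for tag, audio in clips.items():
        # One request per clip; the `language` keyword name is an assumption.
        results[tag] = recognizer.recognize_azure(audio, key=key, language=tag)
    return results
```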
With Profanity Filtering
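A sketch that checks the mode against the three values documented above before making the request. The `profanity` keyword name is an assumption; the helper and constant names are hypothetical.

```python
PROFANITY_MODES = ("masked", "removed", "raw")


def recognize_with_profanity(recognizer, audio, key, mode="masked"):
    """Transcribe audio with one of the documented profanity filter modes."""
    if mode not in PROFANITY_MODES:
        # Fail early instead of spending a request on an invalid mode.
        raise ValueError(f"mode must be one of {PROFANITY_MODES}, got {mode!r}")
    return recognizer.recognize_azure(audio, key=key, profanity=mode)
```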
Getting Full API Response
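With show_all=True the raw dictionary is returned, which makes the alternative transcriptions available. A minimal sketch, assuming each `NBest` entry has `Display` and `Confidence` fields (`Confidence` is an assumed field name); the helper name is hypothetical.

```python
def list_alternatives(recognizer, audio, key):
    """Return every candidate transcription as (display_text, confidence)."""
    raw = recognizer.recognize_azure(audio, key=key, show_all=True)
    # A response without NBest (for example a NoMatch status) yields an empty list.
    return [
        (candidate.get("Display", ""), candidate.get("Confidence"))
        for candidate in raw.get("NBest", [])
    ]
```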
From Audio File
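A sketch of transcribing a recording from disk, assuming the SpeechRecognition package, whose `AudioFile` source reads WAV, AIFF, and FLAC files. The helper name and the default region are hypothetical choices.

```python
def recognize_file(path, key, location="westus"):
    """Load an audio file and transcribe it; returns (transcript, confidence)."""
    # Imported lazily so this module can be loaded without the package installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the entire file
    return recognizer.recognize_azure(audio, key=key, location=location)
```

For example, `recognize_file("meeting.wav", "YOUR_AZURE_SPEECH_KEY", "eastus")`, where the file name and key are placeholders.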
Using Environment Variables
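Keeping the key out of source code is usually preferable. The sketch below reads it from the environment; the variable names `AZURE_SPEECH_KEY` and `AZURE_SPEECH_REGION` are hypothetical, not names the library looks for, and the "westus" fallback is an assumption.

```python
import os


def azure_settings_from_env(env=None):
    """Build the key/location keyword arguments from environment variables."""
    env = os.environ if env is None else env
    key = env.get("AZURE_SPEECH_KEY")
    if not key:
        raise RuntimeError("Set AZURE_SPEECH_KEY before running recognition")
    # Fall back to an assumed default region when none is configured.
    return {"key": key, "location": env.get("AZURE_SPEECH_REGION", "westus")}
```

The result can be unpacked into the call, for example `recognizer.recognize_azure(audio, **azure_settings_from_env())`.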
Setup Instructions
1. Create Azure Account
- Sign up for Microsoft Azure
- New accounts may be eligible for free credits
2. Create Speech Resource
- Go to Azure Portal
- Click Create a resource
- Search for “Speech”
- Click Create
- Fill in the form:
  - Subscription: Select your subscription
  - Resource group: Create new or use existing
  - Region: Choose a region (e.g., West US, East US)
  - Name: Give your resource a name
  - Pricing tier: Select a tier (F0 for free tier)
- Click Review + Create, then Create
3. Get API Key and Region
- Go to your Speech resource
- Click Keys and Endpoint in the left menu
- Copy Key 1 or Key 2 (both work)
- Note the Location/Region (e.g., westus, eastus)
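A quick sanity check on the copied key can catch paste errors before any request is made. This sketch relies only on the format stated in the Parameters section (32 lowercase hexadecimal characters); if your resource issues keys in another format, relax the pattern. The function name is hypothetical.

```python
import re

# Format taken from the parameter description: 32 lowercase hex characters.
KEY_PATTERN = re.compile(r"[0-9a-f]{32}")


def looks_like_speech_key(key: str) -> bool:
    """Return True if the key matches the documented 32-character hex format."""
    return KEY_PATTERN.fullmatch(key.strip()) is not None
```

A failed check is only a hint that the key was copied incorrectly; it says nothing about whether a well-formed key is actually valid.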
4. Use in Code
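The original snippet for this step is missing. One possible sketch binds the key and region from the portal once, so later calls only pass audio. It assumes the keyword names `key` and `location`; `make_transcriber` is a hypothetical helper.

```python
from functools import partial


def make_transcriber(recognizer, key, region):
    """Return a callable that takes audio and uses the given key and region."""
    # Keyword names `key` and `location` are assumed to match recognize_azure.
    return partial(recognizer.recognize_azure, key=key, location=region)
```

For example, `transcribe = make_transcriber(sr.Recognizer(), "<Key 1>", "westus")`, followed by `text, confidence = transcribe(audio)`.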
Available Regions
Common Azure regions:
- Americas: westus, westus2, eastus, eastus2, centralus, brazilsouth
- Europe: westeurope, northeurope, uksouth, francecentral
- Asia Pacific: southeastasia, eastasia, japaneast, australiaeast, centralindia
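Portal display names such as "West US" differ from the identifiers used in code. The sketch below normalizes a name and looks it up in the list above. That list covers common regions only, so a `None` result does not prove the region is invalid; the helper name is hypothetical.

```python
COMMON_REGIONS = {
    "Americas": ("westus", "westus2", "eastus", "eastus2", "centralus", "brazilsouth"),
    "Europe": ("westeurope", "northeurope", "uksouth", "francecentral"),
    "Asia Pacific": ("southeastasia", "eastasia", "japaneast", "australiaeast",
                     "centralindia"),
}


def region_group(region):
    """Return the geography for a listed region, or None if it is not listed."""
    # "West US" -> "westus": lowercase and drop spaces.
    normalized = region.strip().lower().replace(" ", "")
    for group, names in COMMON_REGIONS.items():
        if normalized in names:
            return group
    return None
```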
Language Support
Supports 100+ languages including:
- en-US: English (United States)
- en-GB: English (United Kingdom)
- es-ES: Spanish (Spain)
- fr-FR: French (France)
- de-DE: German (Germany)
- it-IT: Italian (Italy)
- ja-JP: Japanese
- zh-CN: Chinese (Simplified)
- ko-KR: Korean
- pt-BR: Portuguese (Brazil)
- ru-RU: Russian
- ar-SA: Arabic
Pricing
- Free tier (F0): 5 audio hours per month
- Standard tier (S0): Pay-per-use pricing
Notes
- Requires internet connection
- Audio is automatically converted to 16 kHz, 16-bit samples
- Access tokens are cached for 10 minutes to reduce overhead
- Returns both transcript and confidence score
- Supports real-time and batch transcription
- The location parameter must match your resource's region
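The note about access tokens being cached for 10 minutes describes a time-to-live cache. The sketch below illustrates that idea only; it is not the library's implementation, and `TokenCache`, `fetch`, and the injectable `clock` are hypothetical names.

```python
import time


class TokenCache:
    """Reuse a token until it is `ttl_seconds` old, then fetch a new one."""

    def __init__(self, fetch, ttl_seconds=600, clock=time.monotonic):
        self._fetch = fetch        # callable that requests a fresh token
        self._ttl = ttl_seconds    # 600 s matches the 10 minutes noted above
        self._clock = clock        # injectable so expiry can be tested
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            # Expired or never fetched: request a new token and restart the timer.
            self._token = self._fetch()
            self._expires_at = now + self._ttl
        return self._token
```

Caching this way avoids one token request per recognition call while the token is still fresh.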