Microsoft Azure Speech

Microsoft Azure Speech is an enterprise-grade cloud speech recognition service offering real-time transcription, custom models, and support for over 100 languages. It’s ideal for production applications requiring high reliability and advanced features.

Method Signature

recognize_azure(
    audio_data: AudioData,
    key: str,
    language: str = "en-US",
    profanity: str = "masked",
    location: str = "westus",
    show_all: bool = False
) -> tuple[str, float] | dict

Parameters

audio_data

AudioData

required

An AudioData instance containing the audio to transcribe.

key

str

required

Azure Speech API subscription key, used for authentication. See Getting an API Key for instructions.

language

str

default:"en-US"

Recognition language as a BCP-47 language tag (e.g., "en-US", "fr-FR", "ja-JP"). See supported languages.

profanity

str

default:"masked"

Profanity filtering mode:

"masked": Replace profanity with asterisks
"removed": Remove profanity from results
"raw": No filtering

location

str

default:"westus"

Azure region where your Speech resource is deployed. Common regions: "eastus", "westus", "westus2", "northeurope", "westeurope", "southeastasia"

show_all

bool

default:False

If True, returns the full API response. If False, returns a tuple of (transcription, confidence).

Returns

Default: tuple[str, float] - Transcription text and confidence score (0.0 to 1.0)
With show_all=True: dict - Full API response with all recognition details

Getting an API Key

Create Azure Account

Create Speech Resource

Go to the Azure Portal
Click “Create a resource”
Search for “Speech” or “Cognitive Services”
Click “Create”
Fill in the form:
- Name: Your resource name
- Subscription: Select your subscription
- Location: Choose a region near you
- Pricing tier: F0 (free) or S0 (paid)
Click “Review + create” then “Create”

Get API Key

Navigate to your Speech resource
Click “Keys and Endpoint” in the left menu
Copy Key 1 or Key 2 (either works)
Note the Location/Region (you’ll need this too)

Azure Speech API keys are 32-character lowercase hexadecimal strings.
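
Given the key format described above, a quick local check can catch copy-and-paste mistakes before any request is sent. The following is a minimal sketch; the helper name looks_like_azure_key is hypothetical and not part of the library, and passing the check only means the string has the expected shape, not that Azure will accept it.

```python
import re

# Pattern taken from the description above: 32 lowercase hexadecimal characters.
_KEY_PATTERN = re.compile(r"[0-9a-f]{32}")


def looks_like_azure_key(key: str) -> bool:
    """Return True if `key` has the 32-character lowercase hex shape."""
    return isinstance(key, str) and _KEY_PATTERN.fullmatch(key) is not None


print(looks_like_azure_key("0123456789abcdef0123456789abcdef"))  # True
print(looks_like_azure_key("your_azure_speech_api_key"))         # False
```

A failed check is a reasonable point to stop and re-copy the key from the portal.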

Basic Example

import speech_recognition as sr

# Your Azure Speech credentials
AZURE_KEY = "your_azure_speech_api_key"
AZURE_LOCATION = "westus"  # Or your resource location

r = sr.Recognizer()

with sr.AudioFile("audio.wav") as source:
    audio = r.record(source)

try:
    text, confidence = r.recognize_azure(
        audio,
        key=AZURE_KEY,
        location=AZURE_LOCATION
    )
    print(f"Transcription: {text}")
    print(f"Confidence: {confidence:.2%}")
    
except sr.UnknownValueError:
    print("Azure could not understand audio")
except sr.RequestError as e:
    print(f"Could not request results; {e}")

Microphone Example

import speech_recognition as sr

AZURE_KEY = "your_azure_speech_api_key"
AZURE_LOCATION = "westus"

r = sr.Recognizer()

with sr.Microphone() as source:
    print("Speak now...")
    audio = r.listen(source)

print("Transcribing...")
text, confidence = r.recognize_azure(
    audio,
    key=AZURE_KEY,
    location=AZURE_LOCATION
)

print(f"You said: {text}")
print(f"Confidence: {confidence:.2%}")

Language Support

Azure Speech supports over 100 languages and dialects.

Common Languages

import speech_recognition as sr

KEY = "your_azure_speech_api_key"

r = sr.Recognizer()

with sr.AudioFile("audio.wav") as source:
    audio = r.record(source)

# English (US)
text, conf = r.recognize_azure(audio, key=KEY, language="en-US")

# Spanish (Spain)
text, conf = r.recognize_azure(audio, key=KEY, language="es-ES")

# French (France)
text, conf = r.recognize_azure(audio, key=KEY, language="fr-FR")

# German (Germany)
text, conf = r.recognize_azure(audio, key=KEY, language="de-DE")

# Japanese
text, conf = r.recognize_azure(audio, key=KEY, language="ja-JP")

# Chinese (Mandarin, Simplified)
text, conf = r.recognize_azure(audio, key=KEY, language="zh-CN")

Regional Variants

# Reuses r, audio, and KEY from the example above
# English variants
r.recognize_azure(audio, key=KEY, language="en-US")  # United States
r.recognize_azure(audio, key=KEY, language="en-GB")  # United Kingdom
r.recognize_azure(audio, key=KEY, language="en-AU")  # Australia
r.recognize_azure(audio, key=KEY, language="en-CA")  # Canada
r.recognize_azure(audio, key=KEY, language="en-IN")  # India

# Spanish variants
r.recognize_azure(audio, key=KEY, language="es-ES")  # Spain
r.recognize_azure(audio, key=KEY, language="es-MX")  # Mexico
r.recognize_azure(audio, key=KEY, language="es-AR")  # Argentina

# Portuguese variants
r.recognize_azure(audio, key=KEY, language="pt-BR")  # Brazil
r.recognize_azure(audio, key=KEY, language="pt-PT")  # Portugal

For a complete list, see Azure’s language support documentation.
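
When the spoken language is not known in advance, one rough heuristic is to try a short list of candidate tags and keep the result with the highest confidence. The sketch below assumes a caller-supplied recognize function that takes a language tag and returns (text, confidence); the helper name best_by_confidence is hypothetical. Confidence scores are not guaranteed to be comparable across languages, and each attempt is a separate request, so treat this as an illustration rather than a recommended detection method.

```python
from typing import Callable, Iterable, Optional, Tuple


def best_by_confidence(
    recognize: Callable[[str], Tuple[str, float]],
    languages: Iterable[str],
) -> Optional[Tuple[str, str, float]]:
    """Try each language tag; return (language, text, confidence) for the best one.

    Languages whose call raises are skipped. Returns None if every call fails.
    """
    best = None
    for language in languages:
        try:
            text, confidence = recognize(language)
        except Exception:
            # In real code, narrow this to sr.UnknownValueError and sr.RequestError.
            continue
        if best is None or confidence > best[2]:
            best = (language, text, confidence)
    return best
```

With the objects from the examples above, the call would look like best_by_confidence(lambda lang: r.recognize_azure(audio, key=KEY, language=lang), ["en-US", "es-ES"]).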

Profanity Filtering

import speech_recognition as sr

AZURE_KEY = "your_key"
r = sr.Recognizer()

with sr.AudioFile("audio.wav") as source:
    audio = r.record(source)

# Masked (default) - replaces profanity with asterisks
text, _ = r.recognize_azure(audio, key=AZURE_KEY, profanity="masked")
print(text)  # "What the ****"

# Removed - removes profanity entirely
text, _ = r.recognize_azure(audio, key=AZURE_KEY, profanity="removed")
print(text)  # "What the"

# Raw - no filtering
text, _ = r.recognize_azure(audio, key=AZURE_KEY, profanity="raw")
print(text)  # "What the hell"

Azure Regions

Choose a region close to your users for lower latency:

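import speech_recognition as sr

KEY = "your_azure_speech_api_key"

r = sr.Recognizer()

with sr.AudioFile("audio.wav") as source:
    audio = r.record(source)

# Use only the call whose location matches your resource's region;
# the same key is not expected to work against other regions.
# US regions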
text, _ = r.recognize_azure(audio, key=KEY, location="eastus")
text, _ = r.recognize_azure(audio, key=KEY, location="westus")
text, _ = r.recognize_azure(audio, key=KEY, location="westus2")

# Europe regions
text, _ = r.recognize_azure(audio, key=KEY, location="northeurope")
text, _ = r.recognize_azure(audio, key=KEY, location="westeurope")

# Asia regions
text, _ = r.recognize_azure(audio, key=KEY, location="southeastasia")
text, _ = r.recognize_azure(audio, key=KEY, location="eastasia")

The location parameter must match the region where you created your Azure Speech resource.
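
The reason the region must match becomes clearer when you look at how a regional endpoint is addressed. The sketch below assumes the hostname pattern from Azure's REST API for short audio, where the region is the first label of the hostname; the helper name speech_to_text_url is hypothetical, and the library may assemble its URL differently.

```python
from urllib.parse import urlencode


def speech_to_text_url(location: str, language: str = "en-US", profanity: str = "masked") -> str:
    """Build a short-audio recognition URL for the given Azure region."""
    query = urlencode({"language": language, "format": "detailed", "profanity": profanity})
    return (
        f"https://{location}.stt.speech.microsoft.com"
        f"/speech/recognition/conversation/cognitiveservices/v1?{query}"
    )


print(speech_to_text_url("westeurope", language="fr-FR"))
```

Because the region is part of the hostname, a request with a mismatched location goes to a different regional service than the one that issued the key.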

Full Response

import speech_recognition as sr
import json

AZURE_KEY = "your_key"
r = sr.Recognizer()

with sr.AudioFile("audio.wav") as source:
    audio = r.record(source)

# Get full response
response = r.recognize_azure(
    audio,
    key=AZURE_KEY,
    location="westus",
    show_all=True
)

print(json.dumps(response, indent=2))

# Access specific fields
for result in response.get("NBest", []):
    print(f"Text: {result['Display']}")
    print(f"Confidence: {result['Confidence']:.2%}")
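
When working with the full response, it can help to check the recognition status and pick the highest-confidence alternative explicitly. The sketch below runs on a hand-written sample dictionary rather than real API output. The NBest, Display, and Confidence fields come from the example above; the RecognitionStatus field and its "Success" value are assumptions based on Azure's detailed response format, and the helper name top_alternative is hypothetical.

```python
from typing import Optional, Tuple

# Illustrative response, hand-written for this example (not real API output).
sample_response = {
    "RecognitionStatus": "Success",
    "NBest": [
        {"Confidence": 0.62, "Display": "Hello word."},
        {"Confidence": 0.91, "Display": "Hello world."},
    ],
}


def top_alternative(response: dict) -> Optional[Tuple[str, float]]:
    """Return (text, confidence) for the highest-confidence entry, or None."""
    if response.get("RecognitionStatus") != "Success":
        return None
    alternatives = response.get("NBest") or []
    if not alternatives:
        return None
    best = max(alternatives, key=lambda alt: alt["Confidence"])
    return best["Display"], best["Confidence"]


print(top_alternative(sample_response))  # ('Hello world.', 0.91)
```

Returning None instead of raising leaves the decision about unintelligible audio to the caller.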

Error Handling

import speech_recognition as sr

AZURE_KEY = "your_key"
AZURE_LOCATION = "westus"

r = sr.Recognizer()

with sr.AudioFile("audio.wav") as source:
    audio = r.record(source)

try:
    text, confidence = r.recognize_azure(
        audio,
        key=AZURE_KEY,
        location=AZURE_LOCATION
    )
    print(f"Transcription: {text}")
    
except sr.UnknownValueError:
    # Speech was unintelligible
    print("Could not understand the audio")
    
except sr.RequestError as e:
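    # API request failed; the message wording depends on the library version,
    # so treat the substring checks below as illustrative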
    if "invalid key" in str(e).lower():
        print("Invalid API key")
    elif "connection" in str(e).lower():
        print("Network connection error")
    else:
        print(f"API error: {e}")

Audio Requirements

Sample Rate: 16 kHz (automatically converted)
Sample Width: 16-bit (automatically converted)
Channels: Mono (stereo is automatically converted)
Format: Converted to WAV with PCM encoding
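
Although conversion is automatic, it is sometimes useful to know what a file contains before sending it, for example to spot unexpectedly large stereo recordings. The sketch below uses only Python's standard wave module to read a PCM WAV header; the names WavInfo, inspect_wav, and matches_target are hypothetical helpers, not part of the library.

```python
import wave
from typing import NamedTuple


class WavInfo(NamedTuple):
    sample_rate: int   # frames per second, e.g. 16000
    sample_width: int  # bytes per sample, e.g. 2 for 16-bit
    channels: int      # 1 for mono, 2 for stereo


def inspect_wav(path: str) -> WavInfo:
    """Read the header of a PCM WAV file and report its basic properties."""
    with wave.open(path, "rb") as wav_file:
        return WavInfo(
            sample_rate=wav_file.getframerate(),
            sample_width=wav_file.getsampwidth(),
            channels=wav_file.getnchannels(),
        )


def matches_target(info: WavInfo) -> bool:
    """True when the file is already 16 kHz, 16-bit, mono."""
    return info == WavInfo(sample_rate=16000, sample_width=2, channels=1)
```

A file that already matches the target needs no resampling, so its transcription quality is not affected by conversion.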

Timeouts

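import socket

import speech_recognition as sr

AZURE_KEY = "your_key"

r = sr.Recognizer()
r.operation_timeout = 15  # Wait up to 15 seconds for the API response

with sr.AudioFile("audio.wav") as source:
    audio = r.record(source)

try:
    text, _ = r.recognize_azure(audio, key=AZURE_KEY)
    print(text)
except (sr.RequestError, socket.timeout) as e:
    # Depending on where the timeout occurs, it may surface as sr.RequestError
    # or as a raw socket timeout. sr.WaitTimeoutError is raised by listen()
    # while waiting for speech, not by recognition calls.
    print(f"Request failed or timed out: {e}")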

Using Environment Variables

import speech_recognition as sr
import os

# Store credentials in environment variables
AZURE_KEY = os.environ.get("AZURE_SPEECH_KEY")
AZURE_LOCATION = os.environ.get("AZURE_SPEECH_LOCATION", "westus")

if not AZURE_KEY:
    raise ValueError("AZURE_SPEECH_KEY environment variable not set")

r = sr.Recognizer()

with sr.AudioFile("audio.wav") as source:
    audio = r.record(source)

text, confidence = r.recognize_azure(
    audio,
    key=AZURE_KEY,
    location=AZURE_LOCATION
)
print(text)

Pricing

Pricing Tiers:

Free (F0): 5 audio hours per month
Standard (S0): $1 per audio hour

Check Azure Speech pricing for current rates.
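
Using the figures listed above, a rough monthly estimate is simple arithmetic. The sketch below assumes cost scales linearly with audio duration and ignores any billing granularity or rate changes; the constants mirror this page, and the function names are hypothetical.

```python
FREE_TIER_HOURS = 5.0         # F0 allowance per month, as listed above
STANDARD_RATE_PER_HOUR = 1.0  # S0 price in USD per audio hour, as listed above


def estimate_standard_cost(audio_seconds: float) -> float:
    """Estimated S0 cost in USD for the given amount of audio."""
    return round(audio_seconds / 3600 * STANDARD_RATE_PER_HOUR, 2)


def fits_free_tier(audio_seconds: float) -> bool:
    """True if the monthly audio total stays within the F0 allowance."""
    return audio_seconds / 3600 <= FREE_TIER_HOURS


print(estimate_standard_cost(90 * 60))  # 1.5
print(fits_free_tier(6 * 3600))         # False
```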

Advanced Features

For advanced features not available in recognize_azure(), consider using the Azure Speech SDK directly:

Streaming recognition: Real-time transcription
Speaker diarization: Identify who said what
Custom models: Train models for domain-specific terminology
Pronunciation assessment: Evaluate pronunciation for language learning
Intent recognition: Combine speech recognition with LUIS

See the Azure Speech SDK documentation for details.

Best Practices

For production applications:

Use environment variables for credentials (never hardcode keys)
Implement retry logic for transient failures
Monitor your API usage in the Azure Portal
Use the region closest to your users
Implement proper error handling
Cache the OAuth token (done automatically by the library)
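
The retry advice above can be sketched as a small wrapper with exponential backoff. This is one possible approach, not something the library provides: the helper name call_with_retry and its parameters are hypothetical, and the wrapper is kept generic so that the caller decides which exception types count as transient.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on the given exception types with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... with the defaults
    raise ValueError("attempts must be at least 1")
```

With the objects from the earlier examples, a call might look like call_with_retry(lambda: r.recognize_azure(audio, key=AZURE_KEY, location=AZURE_LOCATION), retry_on=(sr.RequestError,)). sr.UnknownValueError is deliberately left out of retry_on in that call, since resending unintelligible audio is unlikely to change the result.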

Security:

Never commit API keys to version control
Rotate keys periodically
Use Azure Key Vault for production deployments
Implement rate limiting to prevent abuse

Comparison: Azure vs Other Services

Feature	Azure Speech	Google	Whisper (local)
Accuracy	High	High	Very High
Languages	100+	100+	99
Real-time	Yes (SDK)	Yes (SDK)	No
Custom models	Yes	Yes	No
Privacy	Cloud	Cloud	Local
Pricing	Pay-per-use	Free tier + paid	Free
Setup complexity	Medium	Low	Low
