Microsoft Azure Speech is an enterprise-grade cloud speech recognition service offering real-time transcription, custom models, and support for over 100 languages. It’s ideal for production applications requiring high reliability and advanced features.

Method Signature

recognize_azure(
    audio_data: AudioData,
    key: str,
    language: str = "en-US",
    profanity: str = "masked",
    location: str = "westus",
    show_all: bool = False
) -> tuple[str, float] | dict

Parameters

audio_data
AudioData
required
An AudioData instance containing the audio to transcribe.
key
str
required
Azure Speech API subscription key, used for authentication. See Getting an API Key for instructions.
language
str
default:"en-US"
Recognition language as a BCP-47 language tag (e.g., "en-US", "fr-FR", "ja-JP"). See supported languages.
profanity
str
default:"masked"
Profanity filtering mode:
  • "masked": Replace profanity with asterisks
  • "removed": Remove profanity from results
  • "raw": No filtering
location
str
default:"westus"
Azure region where your Speech resource is deployed. Common regions: "eastus", "westus", "westus2", "northeurope", "westeurope", "southeastasia"
show_all
bool
default:False
If True, returns the full API response. If False, returns a tuple of (transcription, confidence).

Returns

  • Default: tuple[str, float] - Transcription text and confidence score (0.0 to 1.0)
  • With show_all=True: dict - Full API response with all recognition details
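
Because the return shape depends on show_all, calling code may want to reduce both forms to one. The sketch below is a hypothetical helper (normalize_result is not part of the library) and assumes the full response carries an "NBest" list whose entries have "Display" and "Confidence" keys, as in the Full Response example on this page.

```python
from typing import Union


def normalize_result(result: Union[tuple, dict]) -> tuple:
    """Return (text, confidence) for either return shape of recognize_azure().

    Hypothetical helper; assumes the dict form has an "NBest" list whose
    entries carry "Display" and "Confidence" keys.
    """
    if isinstance(result, tuple):
        # Default call (show_all=False): already (text, confidence)
        text, confidence = result
        return text, float(confidence)

    candidates = result.get("NBest", [])
    if not candidates:
        raise ValueError("response contains no recognition candidates")

    # Pick the candidate with the highest confidence score
    best = max(candidates, key=lambda c: c["Confidence"])
    return best["Display"], float(best["Confidence"])
```

Passing the value returned by recognize_azure() straight into normalize_result() then yields the same tuple regardless of the show_all setting.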

Getting an API Key

Step 1: Create Azure Account

Sign up for a Microsoft Azure account if you don’t have one. New accounts get free credits.

Step 2: Create Speech Resource

  1. Go to the Azure Portal
  2. Click “Create a resource”
  3. Search for “Speech” or “Cognitive Services”
  4. Click “Create”
  5. Fill in the form:
    • Name: Your resource name
    • Subscription: Select your subscription
    • Location: Choose a region near you
    • Pricing tier: F0 (free) or S0 (paid)
  6. Click “Review + create” then “Create”

Step 3: Get API Key

  1. Navigate to your Speech resource
  2. Click “Keys and Endpoint” in the left menu
  3. Copy Key 1 or Key 2 (either works)
  4. Note the Location/Region (you’ll need this too)
Azure Speech API keys are 32-character lowercase hexadecimal strings.

Basic Example

import speech_recognition as sr

# Your Azure Speech credentials
AZURE_KEY = "your_azure_speech_api_key"
AZURE_LOCATION = "westus"  # Or your resource location

r = sr.Recognizer()

with sr.AudioFile("audio.wav") as source:
    audio = r.record(source)

try:
    text, confidence = r.recognize_azure(
        audio,
        key=AZURE_KEY,
        location=AZURE_LOCATION
    )
    print(f"Transcription: {text}")
    print(f"Confidence: {confidence:.2%}")
    
except sr.UnknownValueError:
    print("Azure could not understand audio")
except sr.RequestError as e:
    print(f"Could not request results; {e}")

Microphone Example

import speech_recognition as sr

AZURE_KEY = "your_azure_speech_api_key"
AZURE_LOCATION = "westus"

r = sr.Recognizer()

with sr.Microphone() as source:
    print("Speak now...")
    audio = r.listen(source)

print("Transcribing...")
text, confidence = r.recognize_azure(
    audio,
    key=AZURE_KEY,
    location=AZURE_LOCATION
)

print(f"You said: {text}")
print(f"Confidence: {confidence:.2%}")

Language Support

Azure Speech supports over 100 languages and dialects.
import speech_recognition as sr

KEY = "your_azure_speech_api_key"  # Placeholder for your subscription key

r = sr.Recognizer()

with sr.AudioFile("audio.wav") as source:
    audio = r.record(source)

# English (US)
text, conf = r.recognize_azure(audio, key=KEY, language="en-US")

# Spanish (Spain)
text, conf = r.recognize_azure(audio, key=KEY, language="es-ES")

# French (France)
text, conf = r.recognize_azure(audio, key=KEY, language="fr-FR")

# German (Germany)
text, conf = r.recognize_azure(audio, key=KEY, language="de-DE")

# Japanese
text, conf = r.recognize_azure(audio, key=KEY, language="ja-JP")

# Chinese (Mandarin, Simplified)
text, conf = r.recognize_azure(audio, key=KEY, language="zh-CN")
For a complete list, see Azure’s language support documentation.

Profanity Filtering

import speech_recognition as sr

AZURE_KEY = "your_key"
r = sr.Recognizer()

with sr.AudioFile("audio.wav") as source:
    audio = r.record(source)

# Masked (default) - replaces profanity with asterisks
text, _ = r.recognize_azure(audio, key=AZURE_KEY, profanity="masked")
print(text)  # "What the ****"

# Removed - removes profanity entirely
text, _ = r.recognize_azure(audio, key=AZURE_KEY, profanity="removed")
print(text)  # "What the"

# Raw - no filtering
text, _ = r.recognize_azure(audio, key=AZURE_KEY, profanity="raw")
print(text)  # "What the hell"

Azure Regions

Choose a region close to your users for lower latency:
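import speech_recognition as sr

KEY = "your_azure_speech_api_key"  # Placeholder for your subscription key

r = sr.Recognizer()

with sr.AudioFile("audio.wav") as source:
    audio = r.record(source)

# Illustrative only: a key works solely with its own resource's region,
# so keep just the call that matches your resource.
# US regions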
text, _ = r.recognize_azure(audio, key=KEY, location="eastus")
text, _ = r.recognize_azure(audio, key=KEY, location="westus")
text, _ = r.recognize_azure(audio, key=KEY, location="westus2")

# Europe regions
text, _ = r.recognize_azure(audio, key=KEY, location="northeurope")
text, _ = r.recognize_azure(audio, key=KEY, location="westeurope")

# Asia regions
text, _ = r.recognize_azure(audio, key=KEY, location="southeastasia")
text, _ = r.recognize_azure(audio, key=KEY, location="eastasia")
The location parameter must match the region where you created your Azure Speech resource.
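
The region matters because each Speech resource is served from region-specific hosts, so a key presented to another region's host is rejected. The sketch below shows how a location value maps to those hosts. It assumes the URL patterns from Azure's REST documentation for token issuance and short-audio recognition; azure_speech_endpoints is a hypothetical helper, and the library's internal request code may differ.

```python
def azure_speech_endpoints(location: str) -> dict:
    """Build the region-specific URLs for a Speech resource.

    Assumption: the URL patterns follow Azure's documented REST endpoints
    for token issuance and short-audio recognition.
    """
    region = location.strip().lower()
    if not region or not region.isalnum():
        raise ValueError(f"unexpected region name: {location!r}")
    return {
        "token": f"https://{region}.api.cognitive.microsoft.com/sts/v1.0/issueToken",
        "recognition": (
            f"https://{region}.stt.speech.microsoft.com"
            "/speech/recognition/conversation/cognitiveservices/v1"
        ),
    }
```

Printing azure_speech_endpoints("westus") is a quick way to see which hosts a given location value would target when debugging authentication failures.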

Full Response

import speech_recognition as sr
import json

AZURE_KEY = "your_key"
r = sr.Recognizer()

with sr.AudioFile("audio.wav") as source:
    audio = r.record(source)

# Get full response
response = r.recognize_azure(
    audio,
    key=AZURE_KEY,
    location="westus",
    show_all=True
)

print(json.dumps(response, indent=2))

# Access specific fields
for result in response.get("NBest", []):
    print(f"Text: {result['Display']}")
    print(f"Confidence: {result['Confidence']:.2%}")

Error Handling

import speech_recognition as sr

AZURE_KEY = "your_key"
AZURE_LOCATION = "westus"

r = sr.Recognizer()

with sr.AudioFile("audio.wav") as source:
    audio = r.record(source)

try:
    text, confidence = r.recognize_azure(
        audio,
        key=AZURE_KEY,
        location=AZURE_LOCATION
    )
    print(f"Transcription: {text}")
    
except sr.UnknownValueError:
    # Speech was unintelligible
    print("Could not understand the audio")
    
except sr.RequestError as e:
    # API request failed
    if "invalid key" in str(e).lower():
        print("Invalid API key")
    elif "connection" in str(e).lower():
        print("Network connection error")
    else:
        print(f"API error: {e}")

Audio Requirements

  • Sample Rate: 16 kHz (automatically converted)
  • Sample Width: 16-bit (automatically converted)
  • Channels: Mono (stereo is automatically converted)
  • Format: Converted to WAV with PCM encoding
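
Although conversion is automatic, it can be useful to know what a source file contains before sending it. The sketch below uses only the standard library's wave module to read the three properties listed above; describe_wav and matches_azure_target are hypothetical helper names, not library functions.

```python
import io
import wave


def describe_wav(wav_bytes: bytes) -> dict:
    """Report the sample rate, sample width and channel count of WAV data."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return {
            "sample_rate": wav.getframerate(),   # Hz
            "sample_width": wav.getsampwidth(),  # bytes per sample (2 = 16-bit)
            "channels": wav.getnchannels(),      # 1 = mono
        }


def matches_azure_target(info: dict) -> bool:
    """True if the audio is already 16 kHz, 16-bit, mono."""
    return info == {"sample_rate": 16000, "sample_width": 2, "channels": 1}
```

With an AudioData instance, describe_wav(audio.get_wav_data()) reports the format of the recording as captured, before any conversion.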

Timeouts

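import socket
import speech_recognition as sr

AZURE_KEY = "your_key"

r = sr.Recognizer()
r.operation_timeout = 15  # Wait up to 15 seconds for the API response

with sr.AudioFile("audio.wav") as source:
    audio = r.record(source)

try:
    text, _ = r.recognize_azure(audio, key=AZURE_KEY)
    print(text)
except (sr.RequestError, socket.timeout) as e:
    # Depending on where the timeout occurs, it may surface as a
    # RequestError or as a socket timeout. sr.WaitTimeoutError is
    # raised by listen(), not by recognize_azure().
    print(f"Request failed or timed out: {e}")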
Using Environment Variables

import speech_recognition as sr
import os

# Store credentials in environment variables
AZURE_KEY = os.environ.get("AZURE_SPEECH_KEY")
AZURE_LOCATION = os.environ.get("AZURE_SPEECH_LOCATION", "westus")

if not AZURE_KEY:
    raise ValueError("AZURE_SPEECH_KEY environment variable not set")

r = sr.Recognizer()

with sr.AudioFile("audio.wav") as source:
    audio = r.record(source)

text, confidence = r.recognize_azure(
    audio,
    key=AZURE_KEY,
    location=AZURE_LOCATION
)
print(text)

Pricing

Pricing Tiers:
  • Free (F0): 5 audio hours per month
  • Standard (S0): $1 per audio hour
Check Azure Speech pricing for current rates.
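
As a rough budgeting aid, the figures quoted above can be turned into a small calculation. This is a sketch only: it assumes the F0 quota of 5 audio hours and the S0 rate of $1 per audio hour stated here, which may not match current pricing, and estimate_monthly_cost is a hypothetical helper.

```python
def estimate_monthly_cost(audio_hours: float, tier: str = "S0") -> float:
    """Estimate a monthly bill in USD from the figures quoted on this page.

    Assumptions: F0 allows 5 audio hours per month at no charge and S0
    costs $1 per audio hour; actual rates may differ.
    """
    if audio_hours < 0:
        raise ValueError("audio_hours must be non-negative")
    if tier == "F0":
        if audio_hours > 5:
            raise ValueError("F0 quota exceeded: more than 5 audio hours")
        return 0.0
    if tier == "S0":
        return audio_hours * 1.0
    raise ValueError(f"unknown tier: {tier!r}")
```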

Advanced Features

For advanced features not available in recognize_azure(), consider using the Azure Speech SDK directly:
  • Streaming recognition: Real-time transcription
  • Speaker diarization: Identify who said what
  • Custom models: Train models for domain-specific terminology
  • Pronunciation assessment: Evaluate pronunciation for language learning
  • Intent recognition: Combine speech recognition with LUIS
See the Azure Speech SDK documentation for details.

Best Practices

For production applications:
  • Use environment variables for credentials (never hardcode keys)
  • Implement retry logic for transient failures
  • Monitor your API usage in the Azure Portal
  • Use the region closest to your users
  • Implement proper error handling
  • Cache the OAuth token (done automatically by the library)
Security:
  • Never commit API keys to version control
  • Rotate keys periodically
  • Use Azure Key Vault for production deployments
  • Implement rate limiting to prevent abuse
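
The retry recommendation above can be sketched as a small wrapper with exponential backoff. The helper name retry and its parameters are illustrative, not part of the library; the sleep argument exists only so the delay can be replaced in tests.

```python
import time


def retry(call, attempts=3, base_delay=1.0, retriable=(Exception,), sleep=time.sleep):
    """Run call() and retry on the given exception types with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retriable:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error
            # Delay doubles each time: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
```

One way to use it is retry(lambda: r.recognize_azure(audio, key=AZURE_KEY, location=AZURE_LOCATION), retriable=(sr.RequestError,)). Leaving sr.UnknownValueError out of retriable is deliberate, since resending unintelligible audio is unlikely to change the result.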

Comparison: Azure vs Other Services

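Feature          | Azure Speech | Google           | Whisper (local)
-----------------|--------------|------------------|----------------
Accuracy         | High         | High             | Very High
Languages        | 100+         | 100+             | 99
Real-time        | Yes (SDK)    | Yes (SDK)        | No
Custom models    | Yes          | Yes              | No
Privacy          | Cloud        | Cloud            | Local
Pricing          | Pay-per-use  | Free tier + paid | Free
Setup complexity | Medium       | Low              | Low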