recognize_azure()

Overview

Performs speech recognition on an AudioData instance using the Microsoft Azure Speech API (part of Azure Cognitive Services). Supports configurable profanity filtering and multiple recognition languages.

Method Signature

recognize_azure(
    audio_data: AudioData,
    key: str,
    language: str = "en-US",
    profanity: str = "masked",
    location: str = "westus",
    show_all: bool = False
) -> tuple[str, float] | dict

Parameters

audio_data

AudioData

required

The audio data to recognize. Must be an AudioData instance.

key

str

required

Microsoft Azure Speech API key (32-character lowercase hexadecimal string). See setup instructions below for how to obtain an API key.

language

str

default:"en-US"

Recognition language as a BCP-47 language tag (e.g., "en-US", "fr-FR", "de-DE"). See supported languages for the complete list.

profanity

str

default:"masked"

Profanity filter mode:

"masked" - Replace profanity with asterisks
"removed" - Remove profanity from results
"raw" - No filtering

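Since profanity accepts only the three string values listed above, a small check before the call can turn a typo into an immediate, readable error. This is a minimal sketch; `check_profanity_mode` and `PROFANITY_MODES` are hypothetical names used for illustration and are not part of speech_recognition.

```python
# The three modes listed in the parameter description above.
PROFANITY_MODES = ("masked", "removed", "raw")


def check_profanity_mode(mode: str) -> str:
    """Return mode unchanged if it is a documented value, else raise ValueError."""
    if mode not in PROFANITY_MODES:
        raise ValueError(
            f"profanity must be one of {PROFANITY_MODES}, got {mode!r}"
        )
    return mode


print(check_profanity_mode("removed"))
```

The checked value can then be passed straight through, for example profanity=check_profanity_mode(user_choice).
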
location

str

default:"westus"

Azure region where your Speech resource is deployed (e.g., "westus", "eastus", "westeurope"). Must match the region where you created your Speech resource.

show_all

bool

default:False

If True, returns the raw API response as a JSON dictionary. If False, returns a tuple of (transcript, confidence).

Returns

result

tuple[str, float]

When show_all=False, returns (transcript, confidence) where:

transcript: The recognized text
confidence: Confidence score between 0 and 1

response

dict

When show_all=True, returns the raw API response containing:

RecognitionStatus: Status of recognition (“Success”, “NoMatch”, etc.)
NBest: List of recognition results with confidence scores
Display: Formatted display text
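
When working with the raw response, it can help to reduce it to a single best result. The sketch below assumes each NBest entry carries a Display string and a Confidence number, as in the "Getting Full API Response" example on this page; `best_alternative` is a hypothetical helper and the sample dictionary is hand-written, not real API output.

```python
from typing import Optional, Tuple


def best_alternative(response: dict) -> Optional[Tuple[str, float]]:
    """Pick the highest-confidence NBest entry, or None if nothing was recognized."""
    if response.get("RecognitionStatus") != "Success":
        return None
    alternatives = response.get("NBest") or []
    if not alternatives:
        return None
    top = max(alternatives, key=lambda alt: alt.get("Confidence", 0.0))
    return top.get("Display", ""), top.get("Confidence", 0.0)


# Hand-written sample shaped like the fields listed above (assumed layout).
sample = {
    "RecognitionStatus": "Success",
    "NBest": [
        {"Confidence": 0.62, "Display": "Hello word."},
        {"Confidence": 0.93, "Display": "Hello world."},
    ],
}
print(best_alternative(sample))
```

Returning None for anything other than a "Success" status keeps the caller's handling of unrecognized audio explicit.
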

Exceptions

UnknownValueError

Exception

Raised when the speech is unintelligible

RequestError

Exception

Raised when:

The API request fails
The API key is invalid
The specified location is incorrect
There is no internet connection
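
RequestError covers both transient conditions (a dropped connection) and permanent ones (an invalid key), so a bounded retry with backoff is one reasonable way to handle it. The following is a generic sketch, not library behavior: `call_with_retries` is a hypothetical helper, and the exception types to retry are passed in so the block runs without a network connection.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying on the given exception types with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the last error
            sleep(base_delay * (2 ** attempt))  # wait 1x, 2x, 4x, ... base_delay
    raise AssertionError("unreachable")
```

With the recognizer from the examples below, a call might look like call_with_retries(lambda: r.recognize_azure(audio, key=AZURE_KEY), retry_on=(sr.RequestError,)). UnknownValueError is deliberately left out of retry_on, since resubmitting the same unintelligible audio is unlikely to change the outcome.
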

Example Usage

Basic Recognition

import speech_recognition as sr

# Initialize recognizer
r = sr.Recognizer()

# Your Azure Speech API key
AZURE_KEY = "your-32-character-api-key"

# Record audio
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)

# Recognize with Azure
try:
    text, confidence = r.recognize_azure(audio, key=AZURE_KEY)
    print(f"You said: {text}")
    print(f"Confidence: {confidence:.2%}")
except sr.UnknownValueError:
    print("Could not understand audio")
except sr.RequestError as e:
    print(f"API error: {e}")

With Custom Region

import speech_recognition as sr

AZURE_KEY = "your-api-key"

r = sr.Recognizer()

with sr.Microphone() as source:
    audio = r.listen(source)

try:
    # Specify the region where your resource is deployed
    text, confidence = r.recognize_azure(
        audio,
        key=AZURE_KEY,
        location="eastus"  # or "westeurope", "southeastasia", etc.
    )
    print(f"Transcript: {text}")
except sr.RequestError as e:
    print(f"Error: {e}")

With Different Languages

import speech_recognition as sr

AZURE_KEY = "your-api-key"

r = sr.Recognizer()

# German recognition
with sr.Microphone() as source:
    print("Sprechen Sie jetzt...")
    audio = r.listen(source)

try:
    text, confidence = r.recognize_azure(
        audio,
        key=AZURE_KEY,
        language="de-DE"
    )
    print(f"Sie sagten: {text}")
    print(f"Konfidenz: {confidence:.2%}")
except sr.UnknownValueError:
    print("Audio nicht verstanden")

With Profanity Filtering

import speech_recognition as sr

AZURE_KEY = "your-api-key"

r = sr.Recognizer()

with sr.Microphone() as source:
    audio = r.listen(source)

try:
    # Remove profanity completely
    text, confidence = r.recognize_azure(
        audio,
        key=AZURE_KEY,
        profanity="removed"
    )
    print(f"Transcript: {text}")
except sr.RequestError as e:
    print(f"Error: {e}")

# Or get raw text without filtering
try:
    text, confidence = r.recognize_azure(
        audio,
        key=AZURE_KEY,
        profanity="raw"
    )
    print(f"Raw transcript: {text}")
except sr.RequestError as e:
    print(f"Error: {e}")

Getting Full API Response

import speech_recognition as sr
import json

AZURE_KEY = "your-api-key"

r = sr.Recognizer()

with sr.Microphone() as source:
    audio = r.listen(source)

try:
    # Get complete response
    response = r.recognize_azure(
        audio,
        key=AZURE_KEY,
        show_all=True
    )
    
    print(json.dumps(response, indent=2))
    
    # Access multiple alternatives
    if "NBest" in response:
        for i, result in enumerate(response["NBest"]):
            print(f"Alternative {i+1}:")
            print(f"  Text: {result['Display']}")
            print(f"  Confidence: {result['Confidence']:.2%}")
except sr.UnknownValueError:
    print("Could not understand audio")
except sr.RequestError as e:
    # Network, key, and region problems surface here
    print(f"Error: {e}")

From Audio File

import speech_recognition as sr

AZURE_KEY = "your-api-key"

r = sr.Recognizer()

# Load and transcribe audio file
with sr.AudioFile("speech.wav") as source:
    audio = r.record(source)

try:
    text, confidence = r.recognize_azure(audio, key=AZURE_KEY)
    print(f"Transcript: {text}")
    print(f"Confidence: {confidence:.2%}")
except sr.RequestError as e:
    print(f"Error: {e}")

Using Environment Variables

import speech_recognition as sr
import os

# Read the API key and region from environment variables
AZURE_KEY = os.environ["AZURE_SPEECH_KEY"]  # raises KeyError if the variable is unset
AZURE_REGION = os.getenv("AZURE_SPEECH_REGION", "westus")

r = sr.Recognizer()

with sr.Microphone() as source:
    audio = r.listen(source)

try:
    text, confidence = r.recognize_azure(
        audio,
        key=AZURE_KEY,
        location=AZURE_REGION
    )
    print(f"Transcript: {text}")
except sr.RequestError as e:
    print(f"Error: {e}")

Setup Instructions

1. Create Azure Account

Sign up for Microsoft Azure
New accounts may be eligible for free credits

2. Create Speech Resource

Go to Azure Portal
Click Create a resource
Search for “Speech”
Click Create
Fill in the form:
- Subscription: Select your subscription
- Resource group: Create new or use existing
- Region: Choose a region (e.g., West US, East US)
- Name: Give your resource a name
- Pricing tier: Select a tier (F0 for free tier)
Click Review + Create, then Create

3. Get API Key and Region

Go to your Speech resource
Click Keys and Endpoint in the left menu
Copy Key 1 or Key 2 (both work)
Note the Location/Region (e.g., westus, eastus)
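
The portal form shows regions by display name (e.g., West US), while the location parameter takes the short identifier (e.g., westus). The sketch below assumes the identifier is simply the display name lowercased with spaces removed, which holds for the examples on this page but is an assumption rather than a documented rule; `normalize_region` is a hypothetical helper, and the value shown under Keys and Endpoint remains the authoritative one.

```python
def normalize_region(name: str) -> str:
    """Turn a display name such as "West US" into an identifier such as "westus"."""
    # Assumption: identifier == display name with whitespace removed, lowercased.
    return "".join(name.split()).lower()


for display_name in ("West US", "East US 2", "westeurope"):
    print(display_name, "->", normalize_region(display_name))
```
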

4. Use in Code

import speech_recognition as sr

AZURE_KEY = "your-32-character-key-here"
AZURE_REGION = "westus"  # or your region

r = sr.Recognizer()

with sr.Microphone() as source:
    audio = r.listen(source)

text, confidence = r.recognize_azure(
    audio,
    key=AZURE_KEY,
    location=AZURE_REGION
)
print(text)

Available Regions

Common Azure regions:

Americas: westus, westus2, eastus, eastus2, centralus, brazilsouth
Europe: westeurope, northeurope, uksouth, francecentral
Asia Pacific: southeastasia, eastasia, japaneast, australiaeast, centralindia

Language Support

Supports 100+ languages including:

en-US - English (United States)
en-GB - English (United Kingdom)
es-ES - Spanish (Spain)
fr-FR - French (France)
de-DE - German (Germany)
it-IT - Italian (Italy)
ja-JP - Japanese
zh-CN - Chinese (Simplified)
ko-KR - Korean
pt-BR - Portuguese (Brazil)
ru-RU - Russian
ar-SA - Arabic

See full language list.

Pricing

Free tier (F0): 5 audio hours per month
Standard tier (S0): Pay-per-use pricing

Check Azure Speech pricing for current rates.

Notes

Requires internet connection
Audio is automatically converted to 16 kHz, 16-bit samples
Access tokens are cached for 10 minutes to reduce overhead
Returns both transcript and confidence score
Supports real-time and batch transcription
The location parameter must match your resource’s region
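
To illustrate the token caching mentioned in the notes above, here is a minimal sketch of a time-based cache with a 10-minute lifetime. It is not the library's actual implementation: `TokenCache`, its `fetch` callback, and the injectable `clock` are hypothetical names chosen so the idea can be run and tested without contacting Azure.

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Reuse a token until it is older than ttl_seconds, then fetch a new one."""

    def __init__(
        self,
        fetch: Callable[[], str],
        ttl_seconds: float = 600.0,  # 10 minutes, as stated in the notes above
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return the cached token, refreshing it first if it has expired."""
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            self._token = self._fetch()
            self._expires_at = now + self._ttl
        return self._token
```

Repeated calls to get() within the lifetime return the same token, so only the first recognition request in each 10-minute window pays for the extra round trip.
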
