Recognizer

The Recognizer class is the core component of the SpeechRecognition library. It captures audio, detects speech, and provides methods for sending audio to various speech recognition engines.

Creating a Recognizer

import speech_recognition as sr

r = sr.Recognizer()

A Recognizer instance can be reused for multiple recognition operations and with different audio sources.

Key Properties

The Recognizer class provides several configurable properties that control how audio is captured and when speech is detected.

Energy Threshold

energy_threshold

float

default:"300"

The energy level above which audio is treated as speech. Audio below this threshold is considered silence. Higher values make the recognizer less sensitive to quiet sounds.

import speech_recognition as sr

r = sr.Recognizer()

# Make the recognizer less sensitive (useful in noisy environments)
r.energy_threshold = 4000

# Make it more sensitive (useful in quiet environments)
r.energy_threshold = 100
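
To make the idea of an energy level concrete, the sketch below computes the root-mean-square (RMS) amplitude of 16-bit PCM audio and compares it with the threshold. It assumes that energy is measured as the RMS of the raw samples; the helpers rms_energy and make_tone are hypothetical names used only for this illustration and are not part of the library.

```python
import math
import struct

def rms_energy(frame_data: bytes) -> float:
    """Root-mean-square amplitude of 16-bit little-endian mono PCM samples."""
    count = len(frame_data) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame_data[:count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)

def make_tone(amplitude: int, n_samples: int = 1600,
              freq_hz: float = 440.0, sample_rate: int = 16000) -> bytes:
    """Generate a sine tone as 16-bit PCM bytes (test signal, hypothetical helper)."""
    samples = [
        int(amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n_samples)
    ]
    return struct.pack(f"<{n_samples}h", *samples)

ENERGY_THRESHOLD = 300  # the documented default

quiet = make_tone(amplitude=100)
loud = make_tone(amplitude=5000)

for label, buf in [("quiet", quiet), ("loud", loud)]:
    energy = rms_energy(buf)
    # Assumption: audio counts as speech only when its energy exceeds the threshold.
    verdict = "speech" if energy > ENERGY_THRESHOLD else "silence"
    print(f"{label}: energy={energy:.0f} -> {verdict}")
```

Under this assumption, raising the threshold to 4000 would classify both test tones as silence.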

Dynamic Energy Threshold

dynamic_energy_threshold

bool

default:"True"

When enabled, the energy threshold is automatically adjusted based on ambient noise levels. This helps the recognizer adapt to different acoustic environments.

dynamic_energy_adjustment_damping

float

default:"0.15"

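Controls how quickly the dynamic energy threshold adjusts. The value is approximately the fraction of the current threshold that is retained after one second of adjustment, so it should be between 0 and 1. Lower values make the threshold adapt more quickly but make it more likely that quiet phrases are missed; values close to 1 make it adapt very slowly.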

dynamic_energy_ratio

float

default:"1.5"

The minimum factor by which speech is expected to be louder than ambient noise. During dynamic adjustment, the energy threshold moves toward the ambient noise energy multiplied by this value.

import speech_recognition as sr

r = sr.Recognizer()

# Disable dynamic adjustment for consistent behavior
r.dynamic_energy_threshold = False

# Enable it and configure the adjustment speed
r.dynamic_energy_threshold = True
r.dynamic_energy_adjustment_damping = 0.1  # Lower damping: faster adjustment
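
To see how the damping value influences adaptation speed, here is a minimal simulation. It assumes the threshold is updated once per audio buffer by blending the current threshold with the ambient energy multiplied by dynamic_energy_ratio, with the per-second damping scaled to the buffer length. This update rule and the helper names adjust_threshold and simulate are assumptions made for illustration, not an excerpt from the library.

```python
def adjust_threshold(threshold, ambient_energy, seconds_per_buffer,
                     damping=0.15, ratio=1.5):
    """One dynamic adjustment step (assumed update rule, see the text above)."""
    # Scale the per-second damping down to the length of a single buffer.
    buffer_damping = damping ** seconds_per_buffer
    target = ambient_energy * ratio
    return threshold * buffer_damping + target * (1 - buffer_damping)

def simulate(damping, seconds=1.0, seconds_per_buffer=0.05,
             start=300.0, ambient=1000.0):
    """Apply the update once per buffer for `seconds` of constant ambient noise."""
    threshold = start
    for _ in range(round(seconds / seconds_per_buffer)):
        threshold = adjust_threshold(threshold, ambient, seconds_per_buffer, damping)
    return threshold

for damping in (0.05, 0.15, 0.5):
    print(f"damping={damping}: threshold after 1 s = {simulate(damping):.0f}")
```

In this model the gap between the threshold and its target shrinks to roughly the damping fraction after one second, so smaller damping values reach the target sooner.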

Pause Threshold

pause_threshold

float

default:"0.8"

The duration of silence (in seconds) after speech that indicates the end of a phrase. Increasing this value makes the recognizer wait longer before concluding that speech has ended.

import speech_recognition as sr

r = sr.Recognizer()

# Wait longer for pauses (good for slow speakers)
r.pause_threshold = 1.5

# Respond quickly to pauses
r.pause_threshold = 0.5

Phrase Threshold

phrase_threshold

float

default:"0.3"

The minimum duration of speech (in seconds) to be considered a valid phrase. Shorter sounds are ignored. This helps filter out clicks, pops, and brief noises.

import speech_recognition as sr

r = sr.Recognizer()

# Ignore very short sounds
r.phrase_threshold = 0.5
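
The interaction between pause_threshold and phrase_threshold can be pictured with a simplified model that works on a list of per-buffer energy values. The function find_phrase and its buffer-counting logic are a hypothetical sketch for illustration only; the library's actual detection loop may differ in its details.

```python
import math

def find_phrase(energies, seconds_per_buffer, energy_threshold=300,
                pause_threshold=0.8, phrase_threshold=0.3):
    """Return (first, last) buffer indices of the first valid phrase, or None.

    `energies` holds one energy value per fixed-length audio buffer.
    """
    # Convert the second-based settings into whole buffer counts.
    pause_buffers = math.ceil(pause_threshold / seconds_per_buffer)
    phrase_buffers = math.ceil(phrase_threshold / seconds_per_buffer)

    i = 0
    while i < len(energies):
        if energies[i] <= energy_threshold:
            i += 1  # still silence, keep waiting for speech
            continue
        start = last_loud = i
        # Keep going until the silence after the last loud buffer is long enough.
        while i < len(energies) and i - last_loud <= pause_buffers:
            if energies[i] > energy_threshold:
                last_loud = i
            i += 1
        # Accept the phrase only if it spans enough buffers; otherwise keep scanning.
        if last_loud - start + 1 >= phrase_buffers:
            return start, last_loud
    return None

# A one-buffer click, a long silence, then two bursts separated by a short gap.
energies = [0] * 5 + [1000] + [0] * 10 + [1000] * 6 + [0] * 2 + [1000] * 4 + [0] * 20

print(find_phrase(energies, seconds_per_buffer=0.1))
print(find_phrase(energies, seconds_per_buffer=0.1, pause_threshold=0.1))
```

With the default settings the click is discarded as too short and the two bursts are merged into one phrase, because the gap between them is shorter than pause_threshold. With a very small pause_threshold the phrase ends at the first gap.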

Non-Speaking Duration

non_speaking_duration

float

default:"0.5"

The amount of silence (in seconds) to keep on both sides of the recorded phrase. This helps ensure that the beginning and end of speech aren’t cut off.

Operation Timeout

operation_timeout

float | None

default:"None"

The timeout in seconds for API requests to speech recognition services. If None, there is no timeout.

import speech_recognition as sr

r = sr.Recognizer()

# Set a 10-second timeout for API calls
r.operation_timeout = 10

Recording Audio

The record() method captures audio from a source and returns an AudioData object.

Syntax

record(source, duration=None, offset=None)

source

AudioSource

required

An active audio source (Microphone or AudioFile) within a context manager.

duration

float | None

default:"None"

The maximum duration in seconds to record. If None, records until the end of the stream.

offset

float | None

default:"None"

The number of seconds to skip before starting to record.

Examples

import speech_recognition as sr

r = sr.Recognizer()

# Record for exactly 5 seconds
with sr.Microphone() as source:
    print("Recording for 5 seconds...")
    audio = r.record(source, duration=5)

# Record an entire audio file
with sr.AudioFile("speech.wav") as source:
    audio = r.record(source)

# Record up to 10 seconds, starting 2 seconds into the file
with sr.AudioFile("speech.wav") as source:
    audio = r.record(source, offset=2, duration=10)
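
The meaning of offset and duration can be illustrated without the library by slicing a WAV file with the standard wave module. This is only a sketch of the semantics described above (skip offset seconds, then read at most duration seconds), not the library's implementation; read_segment is a hypothetical helper.

```python
import io
import wave

def read_segment(wav_bytes, offset=None, duration=None):
    """Return (frames, sample_rate, sample_width) for the requested slice."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        rate = wav.getframerate()
        # Skip `offset` seconds, but never seek past the end of the file.
        start = int((offset or 0) * rate)
        wav.setpos(min(start, wav.getnframes()))
        remaining = wav.getnframes() - wav.tell()
        # `duration` is a maximum: the end of the stream may come first.
        count = remaining if duration is None else min(remaining, int(duration * rate))
        return wav.readframes(count), rate, wav.getsampwidth()

# Build a 3-second, 8 kHz, 16-bit mono WAV file in memory.
buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(8000)
    out.writeframes(b"\x00\x00" * 8000 * 3)
wav_bytes = buf.getvalue()

frames, rate, width = read_segment(wav_bytes, offset=2, duration=10)
print(f"Got {len(frames) / (rate * width):.1f} s of audio")
```

Because the file is only 3 seconds long, asking for 10 seconds from an offset of 2 seconds yields just the final second.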

Listening for Speech

The listen() method waits for speech to begin, records until it detects a pause, and returns the complete phrase as an AudioData object.

Syntax

listen(source, timeout=None, phrase_time_limit=None, snowboy_configuration=None, stream=False)

source

AudioSource

required

An active audio source within a context manager.

timeout

float | None

default:"None"

The maximum time in seconds to wait for a phrase to start. Raises WaitTimeoutError if exceeded.

phrase_time_limit

float | None

default:"None"

The maximum duration in seconds for a phrase. If exceeded, returns the audio captured up to that point.

snowboy_configuration

tuple | None

default:"None"

Configuration for Snowboy hotword detection. Should be a tuple of (SNOWBOY_LOCATION, LIST_OF_HOT_WORD_FILES).

stream

bool

default:"False"

If True, listen() returns a generator that yields AudioData chunks as they are captured instead of returning a single AudioData object for the whole phrase.

Basic Usage

import speech_recognition as sr

r = sr.Recognizer()

with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)
    
print("Processing...")

With Timeout

import speech_recognition as sr

r = sr.Recognizer()

with sr.Microphone() as source:
    print("You have 5 seconds to start speaking...")
    try:
        audio = r.listen(source, timeout=5)
    except sr.WaitTimeoutError:
        print("No speech detected within timeout")

With Phrase Time Limit

import speech_recognition as sr

r = sr.Recognizer()

with sr.Microphone() as source:
    print("Speak for up to 10 seconds...")
    # Automatically stops after 10 seconds of speech
    audio = r.listen(source, phrase_time_limit=10)

Streaming Mode

Streaming mode yields audio chunks as they’re captured, which is useful for real-time processing:

import speech_recognition as sr

r = sr.Recognizer()

with sr.Microphone() as source:
    print("Start speaking...")
    for chunk in r.listen(source, stream=True):
        # Process each chunk as it arrives
        print(f"Received {len(chunk.frame_data)} bytes")
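
When the chunks need to be reassembled later, a small helper can join them and report the total duration. The sketch below relies only on the frame_data, sample_rate, and sample_width attributes of each chunk and assumes the chunks are mono and do not overlap. summarize_chunks is a hypothetical helper, and SimpleNamespace objects stand in for real chunks so that the example runs without a microphone.

```python
from types import SimpleNamespace

def summarize_chunks(chunks):
    """Join streamed chunks; return (pcm_bytes, duration_in_seconds)."""
    pcm = bytearray()
    sample_rate = sample_width = None
    for chunk in chunks:
        pcm.extend(chunk.frame_data)
        sample_rate, sample_width = chunk.sample_rate, chunk.sample_width
    if not pcm:
        return b"", 0.0
    # Mono audio: bytes / (samples per second * bytes per sample) = seconds.
    return bytes(pcm), len(pcm) / (sample_rate * sample_width)

# Stand-ins for streamed chunks: two half-second pieces of 16 kHz, 16-bit audio.
fake_chunks = [
    SimpleNamespace(frame_data=b"\x00" * 16000, sample_rate=16000, sample_width=2),
    SimpleNamespace(frame_data=b"\x00" * 16000, sample_rate=16000, sample_width=2),
]

pcm, seconds = summarize_chunks(fake_chunks)
print(f"Collected {len(pcm)} bytes ({seconds:.1f} s)")
```

In real use, the iterable passed to summarize_chunks would be the generator returned by r.listen(source, stream=True).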

Adjusting for Ambient Noise

The adjust_for_ambient_noise() method calibrates the energy threshold to account for background noise.

Syntax

adjust_for_ambient_noise(source, duration=1)

source

AudioSource

required

An active audio source within a context manager.

duration

float

default:"1"

The duration in seconds to analyze ambient noise. Should be at least 0.5 seconds for accurate calibration.

Usage

import speech_recognition as sr

r = sr.Recognizer()

with sr.Microphone() as source:
    print("Calibrating for ambient noise... Please wait.")
    r.adjust_for_ambient_noise(source, duration=1)
    print("Calibration complete. You can start speaking.")
    
    audio = r.listen(source)

Call adjust_for_ambient_noise() in a quiet period before speech. If speech is detected during calibration, the method stops early to avoid miscalibration.

Extended Calibration

import speech_recognition as sr

r = sr.Recognizer()

with sr.Microphone() as source:
    # Longer calibration for very noisy environments
    print("Calibrating for 3 seconds...")
    r.adjust_for_ambient_noise(source, duration=3)
    
    audio = r.listen(source)

Background Listening

The listen_in_background() method spawns a background thread that continuously listens for speech and calls a callback function with each detected phrase.

Syntax

listen_in_background(source, callback, phrase_time_limit=None)

source

AudioSource

required

An audio source (typically a Microphone) to listen to.

callback

function

required

A function that accepts two parameters: the recognizer instance and an AudioData instance. Called from a background thread whenever speech is detected.

phrase_time_limit

float | None

default:"None"

The maximum duration of a phrase in seconds, as in listen().

Returns: A function that, when called, stops the background listener.

Example

import speech_recognition as sr
import time

def callback(recognizer, audio):
    # This runs in a background thread
    try:
        text = recognizer.recognize_google(audio)
        print(f"You said: {text}")
    except sr.UnknownValueError:
        print("Could not understand audio")
    except sr.RequestError as e:
        print(f"Error: {e}")

r = sr.Recognizer()
m = sr.Microphone()

# Calibrate once before starting
with m as source:
    r.adjust_for_ambient_noise(source)

# Start listening in the background
stop_listening = r.listen_in_background(m, callback)

# Do other work while listening continues
print("Listening in background... Press Ctrl+C to stop")
try:
    while True:
        time.sleep(0.1)
except KeyboardInterrupt:
    pass

# Stop the background listener
stop_listening(wait_for_stop=True)
print("Stopped listening")

Stopping the Background Listener

The function returned by listen_in_background() accepts one parameter:

wait_for_stop

bool

default:"True"

If True, blocks until the background thread has stopped. If False, returns immediately (the thread may still run briefly while cleaning up).

# Stop and wait for cleanup
stop_listening(wait_for_stop=True)

# Stop without waiting
stop_listening(wait_for_stop=False)

The background listener runs in a daemon thread, so it will not keep the program running on its own. Keep the main thread alive for as long as you want to keep listening.
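
Because the callback runs on the listener thread, one common pattern is to keep the callback short and hand each phrase to the main thread through a queue. The sketch below shows that hand-off using only the standard library. The drain helper and the fake_listener stand-in are hypothetical and exist only so the example runs without a microphone; in real use, callback would be passed to listen_in_background().

```python
import queue
import threading

phrases = queue.Queue()

def callback(recognizer, audio):
    # Runs on the listener thread: do as little as possible here.
    phrases.put(audio)

def drain(q, handle, stop_event, poll_seconds=0.1):
    """Handle queued items on the calling thread until stopped and empty."""
    while not (stop_event.is_set() and q.empty()):
        try:
            item = q.get(timeout=poll_seconds)
        except queue.Empty:
            continue
        handle(item)

def fake_listener(stop_event):
    # Stand-in for the background listener: delivers three "phrases", then stops.
    for i in range(3):
        callback(None, f"phrase-{i}")
    stop_event.set()

stop = threading.Event()
handled = []
worker = threading.Thread(target=fake_listener, args=(stop,), daemon=True)
worker.start()
drain(phrases, handled.append, stop)  # the main thread does the slow work
worker.join()
print(handled)
```

Slow work such as calling a recognition API then happens in the main thread, so it does not delay the capture of the next phrase.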

Complete Example

The following example combines property configuration, ambient noise calibration, listening with a timeout, and recognition:

import speech_recognition as sr

# Create a recognizer and configure it
r = sr.Recognizer()
r.pause_threshold = 1.0  # Wait longer for pauses
r.phrase_threshold = 0.3  # Filter out short noises
r.non_speaking_duration = 0.5

with sr.Microphone() as source:
    # Calibrate for ambient noise
    print("Adjusting for ambient noise... Please be quiet.")
    r.adjust_for_ambient_noise(source, duration=2)
    print(f"Energy threshold set to {r.energy_threshold}")
    
    # Listen for speech with timeout
    print("Listening... (5 second timeout)")
    try:
        audio = r.listen(source, timeout=5, phrase_time_limit=10)
        print("Processing your speech...")
        
        # Recognize the speech
        text = r.recognize_google(audio)
        print(f"You said: {text}")
        
    except sr.WaitTimeoutError:
        print("Listening timed out")
    except sr.UnknownValueError:
        print("Could not understand the audio")
    except sr.RequestError as e:
        print(f"API error: {e}")
