This quickstart guide will help you build your first speech recognition application using the SpeechRecognition library.

Prerequisites

Before you begin, make sure you have:
  • Python 3.9 or later installed
  • SpeechRecognition library installed (pip install SpeechRecognition)
  • PyAudio installed for microphone support (pip install SpeechRecognition[audio])
If you haven’t installed the library yet, see the Installation Guide.

Your First Recognition

Let’s start with the simplest possible example: recognizing speech from your microphone using Google’s free speech recognition service.
1. Create a new Python file

Create a file called speech_demo.py and import the library:
speech_demo.py
import speech_recognition as sr

# Create a recognizer instance
r = sr.Recognizer()
2. Capture audio from microphone

Use the Microphone context manager to capture audio:
speech_demo.py
# Use the default microphone as audio source
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)
3. Recognize the speech

Send the audio to Google Speech Recognition:
speech_demo.py
# Recognize speech using Google Speech Recognition
try:
    text = r.recognize_google(audio)
    print(f"You said: {text}")
except sr.UnknownValueError:
    print("Could not understand audio")
except sr.RequestError as e:
    print(f"Error: {e}")
4. Run your program

Execute your script:
python speech_demo.py
Speak into your microphone when prompted, and you should see your speech transcribed!

Complete Example

Here’s the complete working code:
speech_demo.py
import speech_recognition as sr

# Create recognizer instance
r = sr.Recognizer()

# Use microphone as audio source
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)

# Recognize speech using Google Speech Recognition
try:
    text = r.recognize_google(audio)
    print(f"You said: {text}")
except sr.UnknownValueError:
    print("Could not understand audio")
except sr.RequestError as e:
    print(f"Error: {e}")

Handling Ambient Noise

For better accuracy, calibrate the recognizer to ambient noise levels before listening:
import speech_recognition as sr

r = sr.Recognizer()

with sr.Microphone() as source:
    # Adjust for ambient noise - listens for 1 second
    print("Adjusting for ambient noise... Please wait")
    r.adjust_for_ambient_noise(source, duration=1)
    
    print("Say something!")
    audio = r.listen(source)

try:
    text = r.recognize_google(audio)
    print(f"You said: {text}")
except sr.UnknownValueError:
    print("Could not understand audio")
except sr.RequestError as e:
    print(f"Error: {e}")
Call adjust_for_ambient_noise() while nobody is speaking, before the user starts talking. It samples the background sound and sets the recognizer's energy threshold to match the current noise level, so calibrating over speech would set the threshold too high.
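To make the idea of an energy threshold more concrete, here is a simplified sketch that uses only the standard library. It is an illustration of the concept, not the library's implementation: the helper names `rms` and `is_speech`, the sample values, and the 1.5 margin are all assumptions chosen for the example.

```python
import math

def rms(samples):
    """Root-mean-square energy of a sequence of signed PCM sample values."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def is_speech(samples, energy_threshold):
    """Treat a chunk as speech only if its energy exceeds the threshold."""
    return rms(samples) > energy_threshold

# Hypothetical chunks: quiet background hum vs. a louder spoken chunk.
background = [40, -35, 50, -45, 30, -40]
spoken = [900, -1200, 1500, -1100, 800, -1300]

# Calibrating on the background places the threshold above the ambient level.
threshold = rms(background) * 1.5  # 1.5 is an arbitrary illustrative margin

print(is_speech(background, threshold))  # False
print(is_speech(spoken, threshold))      # True
```

If the calibration sample had contained the spoken chunk, the threshold would land far above the background level and quieter speech would be ignored, which is why calibration belongs in a stretch of audio without speech.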

Recognizing Audio Files

You can also recognize speech from audio files (WAV, AIFF, or FLAC):
import speech_recognition as sr

r = sr.Recognizer()

# Load audio file
with sr.AudioFile('audio.wav') as source:
    audio = r.record(source)  # Read the entire file

# Recognize speech
try:
    text = r.recognize_google(audio)
    print(f"Transcription: {text}")
except sr.UnknownValueError:
    print("Could not understand audio")
except sr.RequestError as e:
    print(f"Error: {e}")
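
Before handing a file to sr.AudioFile, it can help to confirm that it really is a readable PCM WAV file and to see how long it is. The sketch below uses only the standard-library `wave` module; the helper names `write_test_tone` and `describe_wav` and the file name `tone.wav` are hypothetical. The generated tone contains no speech, so it is only useful for checking the file-loading path.

```python
import math
import os
import struct
import tempfile
import wave

def write_test_tone(path, seconds=1.0, rate=16000, freq=440.0):
    """Write a mono 16-bit PCM WAV file containing a sine tone."""
    n_frames = int(seconds * rate)
    frames = b"".join(
        struct.pack("<h", int(12000 * math.sin(2 * math.pi * freq * i / rate)))
        for i in range(n_frames)
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(frames)

def describe_wav(path):
    """Return (channels, sample_rate, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return wav.getnchannels(), rate, wav.getnframes() / rate

path = os.path.join(tempfile.gettempdir(), "tone.wav")
write_test_tone(path, seconds=0.5)
print(describe_wav(path))  # (1, 16000, 0.5)
```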

Using Different Recognition Engines

The library supports multiple recognition engines, each exposed as a recognize_* method on the Recognizer. The Google Web Speech API is the easiest to start with:
# No API key required!
text = r.recognize_google(audio)
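recognize_google() works without your own API key because the library falls back to a default key intended for personal and testing use, which makes it convenient for getting started. That default key can be revoked at any time, so for production applications consider using an engine with your own API credentials.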
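If the engine should be configurable, one option is to look up the recognize_* method by name. The following is a minimal sketch, assuming a hypothetical helper `recognize_with`; `FakeRecognizer` is a stand-in so the example runs without the library or a microphone. Which engines exist depends on the installed library version and its optional dependencies.

```python
def recognize_with(recognizer, engine, audio, **kwargs):
    """Call recognizer.recognize_<engine>(audio, **kwargs) if it exists."""
    method = getattr(recognizer, f"recognize_{engine}", None)
    if not callable(method):
        raise ValueError(f"Unknown recognition engine: {engine!r}")
    return method(audio, **kwargs)

class FakeRecognizer:
    """Stand-in for sr.Recognizer, used only to make this sketch runnable."""

    def recognize_google(self, audio, language="en-US"):
        return f"[google/{language}] {audio}"

r = FakeRecognizer()
print(recognize_with(r, "google", "hello", language="fr-FR"))
# [google/fr-FR] hello
```

With the real library, `r` would be an sr.Recognizer() and `audio` an AudioData object returned by listen() or record().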

Specifying Languages

Most recognition engines support multiple languages. Specify the language using BCP-47 language tags:
# English (US)
text = r.recognize_google(audio, language="en-US")

# French (France)
text = r.recognize_google(audio, language="fr-FR")

# Spanish (Spain)
text = r.recognize_google(audio, language="es-ES")

# German (Germany)
text = r.recognize_google(audio, language="de-DE")

# Japanese
text = r.recognize_google(audio, language="ja-JP")
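
Applications often know the user's language only as a POSIX-style locale name such as fr_FR.UTF-8. The sketch below shows one simple way to turn that into a language tag of the form used above. The helper name `locale_to_language_tag` is hypothetical, the conversion handles only the common language and region pattern rather than all of BCP-47, and whether a given engine accepts the resulting tag still needs to be checked against that engine's documentation.

```python
def locale_to_language_tag(locale_name, default="en-US"):
    """Convert a POSIX-style locale such as 'fr_FR.UTF-8' to 'fr-FR'."""
    # Drop the encoding ('.UTF-8') and modifier ('@euro') suffixes.
    base = locale_name.split(".", 1)[0].split("@", 1)[0]
    parts = base.replace("-", "_").split("_")
    language = parts[0].lower()
    if not language.isalpha() or len(language) not in (2, 3):
        return default  # e.g. 'C' or 'POSIX' carry no language information
    if len(parts) > 1 and parts[1]:
        return f"{language}-{parts[1].upper()}"
    return language

print(locale_to_language_tag("fr_FR.UTF-8"))  # fr-FR
print(locale_to_language_tag("de_DE@euro"))   # de-DE
print(locale_to_language_tag("C"))            # en-US
```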

Background Listening

For continuous speech recognition, use listen_in_background():
import time

import speech_recognition as sr

r = sr.Recognizer()
m = sr.Microphone()

def callback(recognizer, audio):
    """This is called from a background thread"""
    try:
        text = recognizer.recognize_google(audio)
        print(f"You said: {text}")
    except sr.UnknownValueError:
        print("Could not understand audio")
    except sr.RequestError as e:
        print(f"Error: {e}")

# Start listening in the background
stop_listening = r.listen_in_background(m, callback)

# Keep the program running; sleep instead of spinning so the loop doesn't burn CPU
print("Listening... Press Ctrl+C to stop")
try:
    while True:
        time.sleep(0.1)
except KeyboardInterrupt:
    stop_listening(wait_for_stop=False)
    print("Stopped listening")
The callback function runs in a background thread. Make sure your callback is thread-safe if it modifies shared data.
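One common way to keep the callback thread-safe is to do as little as possible in it and hand the audio to the main thread through a `queue.Queue`. The sketch below shows that pattern using only the standard library. The `drain` helper and the simulated chunks are assumptions for illustration; only the `callback(recognizer, audio)` signature comes from the example above.

```python
import queue
import threading

audio_queue = queue.Queue()  # thread-safe FIFO shared by both threads

def callback(recognizer, audio):
    """Runs on the background thread: hand the audio off and return quickly."""
    audio_queue.put(audio)

def drain(handle, timeout=0.1):
    """Runs on the main thread: process every chunk currently queued."""
    handled = 0
    while True:
        try:
            audio = audio_queue.get(timeout=timeout)
        except queue.Empty:
            return handled
        handle(audio)
        handled += 1

def simulate_listener():
    """Stand-in for the library's listener thread delivering two chunks."""
    for chunk in ("chunk-1", "chunk-2"):
        callback(None, chunk)

worker = threading.Thread(target=simulate_listener)
worker.start()
worker.join()

received = []
print(drain(received.append), received)  # 2 ['chunk-1', 'chunk-2']
```

In a real program, `callback` would be passed to listen_in_background() and `handle` would call recognize_google() with the same try/except as in the example above, so all shared state is touched only from the main thread.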

Error Handling

Always handle both exception types when performing recognition:
import speech_recognition as sr

r = sr.Recognizer()

with sr.Microphone() as source:
    audio = r.listen(source)

try:
    text = r.recognize_google(audio)
    print(f"Result: {text}")
    
except sr.UnknownValueError:
    # Speech was unintelligible
    print("Sorry, I couldn't understand that")
    
except sr.RequestError as e:
    # API request failed
    print(f"Could not connect to the service: {e}")
  • UnknownValueError: Raised when the recognizer can’t understand the speech
  • RequestError: Raised when there’s a network error or API issue
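
Because a RequestError usually reflects a temporary network or service problem, it is a candidate for retrying, whereas retrying after UnknownValueError with the same audio is unlikely to help. Here is a minimal sketch of retrying with exponential backoff. The `retry` helper is hypothetical, and `FakeRequestError` stands in for sr.RequestError so the example runs without the library.

```python
import time

def retry(call, retryable, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Run call(); on a retryable exception wait and try again.

    The delay doubles after each failure (base_delay, 2 * base_delay, ...).
    The final failure is re-raised so the caller can still handle it.
    """
    for attempt in range(attempts):
        try:
            return call()
        except retryable:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)

class FakeRequestError(Exception):
    """Placeholder for sr.RequestError in this self-contained sketch."""

# Stand-in recognizer call that fails twice, then succeeds.
outcomes = [FakeRequestError("timeout"), FakeRequestError("timeout"), "hello world"]
delays = []  # records the requested delays instead of actually sleeping

def flaky_recognize():
    outcome = outcomes.pop(0)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome

print(retry(flaky_recognize, (FakeRequestError,), sleep=delays.append))  # hello world
print(delays)  # [0.5, 1.0]
```

With the real library, the call would look like retry(lambda: r.recognize_google(audio), (sr.RequestError,)), still wrapped in the try/except shown above for the case where every attempt fails.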

Next Steps

Now that you’ve built your first speech recognition application, explore more advanced features:

  • API Reference: Complete API documentation with all available methods
  • Recognition Engines: Learn about different recognition engines and when to use them
  • Microphone Input: Working with microphones and real-time audio input
  • Background Listening: Continuous speech recognition in the background
  • Examples: More code examples in the GitHub repository
  • Core Concepts: Deep dive into audio sources and recognizers