Overview
Performs offline speech recognition using CMU Sphinx (PocketSphinx). Works completely offline, without requiring an internet connection or API key.

Method Signature
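For reference, this is the signature of `Recognizer.recognize_sphinx` in the `speech_recognition` library (parameter names match the descriptions below; check your installed version, as defaults can change between releases):

```python
class Recognizer:
    def recognize_sphinx(self, audio_data, language="en-US",
                         keyword_entries=None, grammar=None,
                         show_all=False): ...
```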
Parameters

- `audio_data`: The audio data to recognize. Must be an `AudioData` instance.
- `language`: Recognition language or custom model paths.
  - Option 1: A language string (e.g., `"en-US"`, `"en-GB"`)
    - Out of the box, only `"en-US"` is supported
    - See setup instructions for installing other languages
  - Option 2: A tuple of filesystem paths, of the form `(acoustic_parameters_directory, language_model_file, phoneme_dictionary_file)`
- `keyword_entries`: List of keywords to search for, with sensitivity levels.
  - Format: `[(keyword, sensitivity), ...]`
  - `keyword`: Phrase to recognize (`str`)
  - `sensitivity`: Float between 0 (insensitive) and 1 (very sensitive)
- `grammar`: Path to an FSG or JSGF grammar file for constrained recognition. Grammars define the valid phrases that can be recognized, improving accuracy for specific use cases. If a JSGF grammar is provided, an FSG grammar will be automatically generated for faster subsequent runs.
- `show_all`: If `True`, returns the PocketSphinx `Decoder` object for advanced usage. If `False`, returns only the transcription text.

Returns
- The recognized text (`str`) when `show_all=False`
- The PocketSphinx `Decoder` object when `show_all=True`

Exceptions
- `UnknownValueError`: Raised when the speech is unintelligible
- `RequestError`: Raised when:
  - PocketSphinx is not installed
  - Language data files are missing
  - Model paths are invalid
Example Usage
Basic Offline Recognition
Keyword Spotting
Using Grammar File
From Audio File
With Custom Model Paths
Voice-Activated Assistant
Installation
Install PocketSphinx
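On a standard Python setup, PocketSphinx can be installed with pip (on some platforms this builds from source and may need a C compiler and system audio headers):

```shell
pip install pocketsphinx
```

If you do not already have the wrapper library, its pip package name is `SpeechRecognition`.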
System Requirements
- Python: 3.6 or later
- Platform: Linux, macOS, Windows
- Dependencies: PocketSphinx library and language models
Language Support
Out of the Box
Only English (US) is supported by default with the `speech_recognition` library.
Installing Additional Languages
To use other languages:

1. Download language models from CMU Sphinx models
2. Extract files to get:
   - Acoustic model directory (e.g., `en-us` or `es-es`)
   - Language model file (`.lm` or `.lm.bin`)
   - Pronunciation dictionary (`.dict`)
3. Use custom paths:
Available Language Models
- English (US, UK, Indian)
- Spanish
- French
- German
- Russian
- Chinese (Mandarin)
- And more…
Keyword Sensitivity Guidelines
When using `keyword_entries`, the sensitivity parameter affects recognition:
- 0.0 - 0.3: Very insensitive (few false positives, more false negatives)
- 0.4 - 0.6: Balanced (recommended for most use cases)
- 0.7 - 0.9: Sensitive (catches more, may have false positives)
- 0.9 - 1.0: Very sensitive (many false positives)
Grammar Files
JSGF Format
Create a `.jsgf` file for grammar-based recognition:
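A minimal sketch of such a grammar; the grammar name and phrases here are illustrative:

```jsgf
#JSGF V1.0;

grammar commands;

public <command> = <action> <object>;
<action> = turn on | turn off;
<object> = the light | the fan | the radio;
```

Pass the file's path as the `grammar` argument; an FSG version is generated automatically for faster subsequent runs.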
Advantages
- Works completely offline (no internet required)
- Free - no API keys or costs
- Privacy - audio never leaves your device
- Good for keyword spotting and voice commands
- Lightweight and fast
Limitations
- Lower accuracy compared to cloud-based services
- Limited language support out of the box
- Requires language model files
- Best for constrained vocabulary (keywords, commands)
- May struggle with continuous speech or noisy environments
Best Use Cases
- Voice-activated devices (wake word detection)
- Offline applications (no internet available)
- Privacy-sensitive applications (data must stay local)
- Command recognition (limited vocabulary)
- Embedded systems (Raspberry Pi, IoT devices)
Notes
- Completely offline - no internet required
- Audio is automatically converted to 16 kHz, 16-bit mono
- Keyword spotting is more accurate than general transcription
- Grammar-based recognition improves accuracy for specific use cases
- Lower accuracy than cloud services, but free and private