Overview
Reading Practice Mode (Shadowing Coach) helps you improve pronunciation and fluency by reading text aloud. The app listens in real time via VAD + Whisper, tracks which words you say correctly, and gives you a grade (A+ to F) with detailed feedback.
How to Use
Add text to practice
Paste or type the text you want to read. This can be:
- A paragraph from a book or article
- Vocabulary words or phrases
- Tongue twisters or pronunciation drills
- Any text in your selected language (English or Spanish)
Start reading
Click Start. The app begins listening via your microphone. Read the text aloud at your own pace.
See real-time feedback
As you speak, words change color:
- 🟢 Green: Correctly pronounced
- 🔴 Red: Mispronounced or skipped
- ⚪ Grey: Not yet read
The app uses fuzzy matching (Levenshtein distance) to tolerate minor variations in pronunciation.
Click words to hear pronunciation
Click any word to hear its correct pronunciation via TTS. This is useful for:
- Learning unfamiliar words
- Comparing your pronunciation to the AI’s
- Practicing specific sounds
Real-Time Word Tracking
How It Works
Text tokenization
The practice text is split into individual words (whitespace-separated, punctuation removed).
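The tokenization step can be sketched in a few lines of Python. This is only an illustration: the function name `tokenize`, the lowercasing, and the choice to strip punctuation from word edges only (so that "don't" survives intact) are assumptions, not confirmed details of the app.

```python
import string

# Assumption: besides ASCII punctuation, curly quotes and Spanish
# inverted marks are also stripped from the edges of each word.
EDGE_PUNCTUATION = string.punctuation + "“”‘’¡¿…"


def tokenize(text: str) -> list[str]:
    """Split text on whitespace and strip punctuation from each word's edges."""
    words = []
    for raw in text.split():
        word = raw.strip(EDGE_PUNCTUATION).lower()  # lowercasing is an assumption
        if word:  # drop tokens that were punctuation only
            words.append(word)
    return words


print(tokenize("The quick brown fox. ¿Cómo estás?"))
# ['the', 'quick', 'brown', 'fox', 'cómo', 'estás']
```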
Word matching
Each transcribed word is compared to the practice text using fuzzy matching:
- Exact match → Green (correct)
- Levenshtein distance ≤ 2 → Green (close enough)
- No match → Red (mispronounced or skipped)
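The matching rules above could look roughly like this in Python. It is a minimal sketch, not the app's actual code: `levenshtein` and `word_color` are illustrative names, and comparing words case-insensitively is an assumption.

```python
from typing import Optional

MAX_DISTANCE = 2  # the "Levenshtein distance ≤ 2" rule described above


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or exact match)
            ))
        previous = current
    return previous[-1]


def word_color(expected: str, heard: Optional[str]) -> str:
    """Return 'grey' if nothing was heard yet, else 'green' or 'red'."""
    if heard is None:
        return "grey"
    if levenshtein(expected.lower(), heard.lower()) <= MAX_DISTANCE:
        return "green"
    return "red"


print(word_color("cats", "cat"))    # green (distance 1)
print(word_color("quick", "lazy"))  # red (distance 5)
```

An absolute cutoff is lenient for short words: "fox" and "dog" are exactly two substitutions apart, so this sketch would accept one for the other.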
Color-Coding Logic
| Color | Meaning | Condition |
|---|---|---|
| 🟢 Green | Correct | Word matches expected text (exact or fuzzy) |
| 🔴 Red | Incorrect | Word does not match (mispronounced, skipped, or wrong word) |
| ⚪ Grey | Not read | Word has not been spoken yet |
Fuzzy matching is meant to tolerate:
- Minor pronunciation variations (“the” vs “thuh”)
- Pluralization (“cat” vs “cats”)
- Verb tenses (“run” vs “running”)
Whisper may transcribe phonetically similar words incorrectly (e.g., “there” vs “their”). The fuzzy matcher helps, but isn’t perfect.
Pronunciation Playback
Click to Hear
You can click any word in the practice text to hear its pronunciation via TTS. How it works:
- Click a word → App extracts the word text
- TTS synthesizes the word using your selected voice
- Audio plays via PipeWire (`paplay`)
Use cases:
- Learn new words: Hear how to pronounce unfamiliar vocabulary
- Compare: Say the word, then click it to compare your pronunciation
- Practice: Repeat-click to drill a specific word
Pronunciation playback uses the current language and voice from Settings. Switch to Spanish to hear Spanish words pronounced correctly.
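The click-to-hear flow might be wired up along these lines. This is a sketch under stated assumptions: `synthesize` is a hypothetical callable standing in for the TTS engine and is assumed to return mono 16-bit PCM bytes plus a sample rate; `write_wav` and `play_word` are illustrative names. Only the use of `paplay` for playback comes from the description above.

```python
import os
import subprocess
import tempfile
import wave


def write_wav(path: str, pcm16: bytes, sample_rate: int) -> None:
    """Write mono 16-bit PCM audio to a WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm16)


def play_word(word: str, synthesize, player: str = "paplay") -> None:
    """Synthesize one word and play it through the given command-line player.

    `synthesize(word)` is a hypothetical TTS hook returning
    (pcm16_bytes, sample_rate); it is not a real library call.
    """
    pcm16, sample_rate = synthesize(word)
    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    try:
        write_wav(path, pcm16, sample_rate)
        subprocess.run([player, path], check=True)  # blocks until playback ends
    finally:
        os.unlink(path)  # always remove the temporary file
```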
Grading System
Accuracy Calculation
When you click Stop, the app calculates your accuracy: the percentage of words in the practice text that were marked correct.
Grade Mapping
| Grade | Accuracy Range | Meaning |
|---|---|---|
| A+ | 95-100% | Perfect or near-perfect |
| A | 85-94% | Excellent |
| B | 75-84% | Good |
| C | 65-74% | Adequate |
| D | 50-64% | Needs improvement |
| F | 0-49% | Significant practice needed |
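The table translates directly into a lookup. The sketch below treats each band's lower bound as a floor, so a fractional score such as 94.5% falls into the A band; that handling of fractional values, and the names `accuracy` and `grade`, are assumptions.

```python
# (minimum accuracy in percent, grade), highest band first
GRADE_FLOORS = [(95, "A+"), (85, "A"), (75, "B"), (65, "C"), (50, "D")]


def accuracy(correct: int, total: int) -> float:
    """Percentage of practice-text words marked correct."""
    if total == 0:
        return 0.0  # assumption: an empty practice text scores 0%
    return 100.0 * correct / total


def grade(accuracy_pct: float) -> str:
    """Map an accuracy percentage to a letter grade using the table above."""
    for floor, letter in GRADE_FLOORS:
        if accuracy_pct >= floor:
            return letter
    return "F"


print(grade(accuracy(9, 10)))  # A  (90.0%)
```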
Feedback Dialog
The feedback dialog shows:
1. Grade and accuracy
2. Missed words, with how many times each was missed
Example Feedback
Practice text: “The quick brown fox jumps over the lazy dog. This is a classic pangram used for typing practice.”
Your reading: “The quick brown fox jumps over the lazy dog. This is a classic pang… panagram used for typing practice.”
Feedback:
- Grade: A
- Accuracy: 94.4% (17/18 words correct)
- Missed words:
  - pangram (1 time): you said “pang” and “panagram” (mispronounced)
Advanced Features
VAD Sensitivity
Reading Practice Mode uses Voice Activity Detection to segment your speech. Parameters:
- `silence_threshold = 0.03` (RMS level)
- `silence_duration = 1.5` seconds (pause before stopping)
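To show how these two parameters interact, here is a minimal RMS-based silence detector. It assumes audio arrives as fixed-length frames of float samples in the range -1.0 to 1.0; the function names and the frame layout are illustrative, and the app's real VAD may work differently.

```python
import math
from typing import Optional, Sequence

SILENCE_THRESHOLD = 0.03  # RMS level below which a frame counts as silence
SILENCE_DURATION = 1.5    # seconds of continuous silence that end an utterance


def rms(frame: Sequence[float]) -> float:
    """Root mean square of one audio frame (samples assumed in [-1.0, 1.0])."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def utterance_end(frames: Sequence[Sequence[float]],
                  frame_seconds: float) -> Optional[int]:
    """Return the index of the frame that completes the closing pause, or None."""
    speaking = False
    silent_for = 0.0
    for i, frame in enumerate(frames):
        if rms(frame) >= SILENCE_THRESHOLD:
            speaking = True
            silent_for = 0.0          # any speech resets the pause timer
        elif speaking:                # leading silence is ignored
            silent_for += frame_seconds
            if silent_for >= SILENCE_DURATION:
                return i
    return None
```

With 0.5-second frames, three consecutive quiet frames after speech are enough to close the utterance.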
Multi-Language Support
The mode works in English and Spanish (based on your Settings language). How it adapts:
- STT: Whisper transcribes with `language="en"` or `language="es"`
- TTS: Kokoro uses `lang="en-us"` or `lang="es"` for pronunciation playback
- Fuzzy matching: Works identically in both languages
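One way to keep the two engines in sync is a single lookup keyed by the Settings language. The sketch below is hypothetical: `LanguageConfig`, `LANGUAGES`, `config_for`, and the keys "English" and "Spanish" are made-up names. Only the four parameter values come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageConfig:
    whisper_language: str  # value passed to Whisper as language=...
    kokoro_lang: str       # value passed to Kokoro as lang=...


# Assumption: the Settings language is stored under these display names.
LANGUAGES = {
    "English": LanguageConfig(whisper_language="en", kokoro_lang="en-us"),
    "Spanish": LanguageConfig(whisper_language="es", kokoro_lang="es"),
}


def config_for(setting: str) -> LanguageConfig:
    """Look up the STT and TTS language codes for a Settings language."""
    try:
        return LANGUAGES[setting]
    except KeyError:
        raise ValueError(f"Unsupported language: {setting!r}") from None
```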
Real-Time vs Batch Processing
Reading Practice Mode processes speech in real time as you speak, not in batch at the end. Advantages:
- Instant visual feedback
- Encourages continuous reading flow
- Helps you notice mistakes immediately
Trade-offs:
- Whisper must transcribe each utterance quickly (faster models recommended)
- Network-dependent models (cloud STT) would add latency (not used here)
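The difference from batch processing is that word colors are updated after every transcribed utterance rather than once at the end. The sketch below illustrates that idea with a hypothetical `ReadingSession` class. To stay short it aligns words strictly by position and uses exact, case-insensitive comparison; both are simplifications, and the app's fuzzy matching and handling of skipped words are not modeled.

```python
class ReadingSession:
    """Tracks per-word colors and updates them one utterance at a time."""

    def __init__(self, expected_words: list[str]) -> None:
        self.expected = expected_words
        self.colors = ["grey"] * len(expected_words)  # nothing read yet
        self.cursor = 0                               # next word to be read

    def on_utterance(self, heard_words: list[str]) -> list[str]:
        """Consume one transcribed utterance and return the updated colors."""
        for heard in heard_words:
            if self.cursor >= len(self.expected):
                break  # extra words after the end of the text are ignored
            target = self.expected[self.cursor]
            matched = heard.lower() == target.lower()  # simplified: exact match
            self.colors[self.cursor] = "green" if matched else "red"
            self.cursor += 1
        return list(self.colors)


session = ReadingSession(["the", "quick", "fox"])
print(session.on_utterance(["the", "quack"]))  # ['green', 'red', 'grey']
print(session.on_utterance(["fox"]))           # ['green', 'red', 'green']
```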
Tips for Best Results
Text Selection
Choose appropriate text:
- Start with short paragraphs (50-100 words)
- Use clear, simple sentences for beginners
- Try tongue twisters for advanced pronunciation drills
- Avoid heavily technical jargon (Whisper may struggle)
Use Cases
Language Learning
Scenario: You’re learning Spanish and want to practice reading comprehension passages.
Workflow:
- Paste a Spanish paragraph into Reading Practice Mode
- Read it aloud, seeing which words you pronounce correctly
- Click red words to hear correct pronunciation
- Re-read until you achieve 95%+ accuracy
Accent Reduction
Scenario: You want to reduce your accent in English.
Workflow:
- Use a passage with challenging phonemes (e.g., “th” sounds, “r” vs “l”)
- Read aloud and identify red words (mispronounced)
- Click each red word to hear native pronunciation
- Practice those words separately, then re-read the full passage
Speech Therapy
Scenario: A speech therapist assigns reading exercises.
Workflow:
- Patient reads assigned text aloud
- App tracks which words are difficult (consistently red)
- Therapist reviews missed words list
- Patient practices specific problem words using click-to-hear
Audition Prep
Scenario: An actor preparing for a role needs to nail specific lines.
Workflow:
- Paste script lines into Reading Practice Mode
- Read lines aloud, ensuring 100% accuracy
- Use click-to-hear for unfamiliar words (character names, places)
- Practice until achieving A+ grade consistently
Technical Details
Word Matching Algorithm
The app uses Levenshtein distance (edit distance) to determine if a spoken word matches the expected text. A similarity threshold of 0.7 (70%) allows minor variations:
- Too low (e.g., 0.5) → accepts incorrect words
- Too high (e.g., 0.9) → rejects valid pronunciations
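A threshold like 0.7 only makes sense against a normalized score. A common choice, assumed here rather than confirmed by the app, is `1 - distance / length of the longer word`. The sketch below uses that formula to show how the threshold changes the outcome; `similarity` and `is_match` are illustrative names.

```python
SIMILARITY_THRESHOLD = 0.7


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical (assumed formula)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def is_match(expected: str, heard: str,
             threshold: float = SIMILARITY_THRESHOLD) -> bool:
    """True if the heard word is similar enough to the expected word."""
    return similarity(expected.lower(), heard.lower()) >= threshold


print(is_match("cats", "cat"))                    # True  (similarity 0.75)
print(is_match("there", "their"))                 # False (similarity 0.6)
print(is_match("there", "their", threshold=0.5))  # True: too low accepts a wrong word
print(is_match("cats", "cat", threshold=0.9))     # False: too high rejects a near miss
```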
Missed Words Tracking
The app maintains a dictionary of missed words:
- Skipped words: Words you didn’t say at all
- Mispronounced words: Words Whisper transcribed incorrectly (below fuzzy threshold)
- Repeated mistakes: If you mispronounce “pronunciation” twice, it shows `pronunciation (2 times)`
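A counter keyed by word is enough to produce entries like the one above. This sketch uses a hypothetical `MissedWords` class; listing the most frequently missed words first is an assumption about the display order.

```python
from collections import Counter


class MissedWords:
    """Counts how often each expected word was skipped or mispronounced."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def record(self, word: str) -> None:
        """Register one miss for the given expected word."""
        self.counts[word.lower()] += 1

    def summary(self) -> list[str]:
        """Format entries like 'pronunciation (2 times)', most missed first."""
        lines = []
        for word, n in self.counts.most_common():
            unit = "time" if n == 1 else "times"
            lines.append(f"{word} ({n} {unit})")
        return lines


missed = MissedWords()
missed.record("pronunciation")
missed.record("pangram")
missed.record("pronunciation")
print(missed.summary())  # ['pronunciation (2 times)', 'pangram (1 time)']
```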
Whisper’s transcription accuracy depends on your microphone quality, background noise, and accent. Use a better Whisper model (medium/large) for improved results.