
Overview

Reading Practice Mode (Shadowing Coach) helps you improve pronunciation and fluency by reading text aloud. The app listens in real time using voice activity detection (VAD) and Whisper, tracks which words you say correctly, and gives you a grade (A+ to F) with detailed feedback.

How to Use

1. Enter Practice Mode

Click the 📖 Practice button in the main window. A new dialog opens.
2. Add text to practice

Paste or type the text you want to read. This can be:
  • A paragraph from a book or article
  • Vocabulary words or phrases
  • Tongue twisters or pronunciation drills
  • Any text in your selected language (English or Spanish)
3. Start reading

Click Start. The app begins listening via your microphone. Read the text aloud at your own pace.
4. See real-time feedback

As you speak, words change color:
  • 🟢 Green: Correctly pronounced
  • 🔴 Red: Mispronounced or skipped
  • Grey: Not yet read
The app uses fuzzy string matching (a similarity threshold, detailed under Technical Details) to tolerate minor variations in how Whisper transcribes your speech.
5. Click words to hear pronunciation

Click any word to hear its correct pronunciation via TTS. This is useful for:
  • Learning unfamiliar words
  • Comparing your pronunciation to the AI’s
  • Practicing specific sounds
6. Finish and get graded

Click Stop when you’re done. A feedback dialog appears with:
  • Grade (A+, A, B, C, D, or F)
  • Accuracy percentage and visual progress bar
  • Per-word stats: X words spoken, Y correct, Z missed
  • List of missed words with counts

Real-Time Word Tracking

How It Works

1. Text tokenization

The practice text is split into individual words (whitespace-separated, punctuation removed).
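
The tokenization step could be sketched as follows. This is a minimal illustration, not the app's actual code: the function name `tokenize` and the extra Spanish and typographic punctuation characters are assumptions.

```python
import string

# Punctuation to strip from word edges; the non-ASCII characters are an
# assumption added here so Spanish text and curly quotes are handled too.
EDGE_PUNCTUATION = string.punctuation + "“”‘’¿¡…"

def tokenize(text: str) -> list[str]:
    """Split on whitespace, then strip punctuation from both ends of each token."""
    tokens = []
    for raw in text.split():
        word = raw.strip(EDGE_PUNCTUATION)
        if word:  # drop tokens that consisted only of punctuation
            tokens.append(word)
    return tokens

print(tokenize("The quick, brown fox!"))  # ['The', 'quick', 'brown', 'fox']
```

Only leading and trailing punctuation is removed, so contractions such as "don't" stay intact.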
2. Continuous listening

VAD monitors your microphone. When speech is detected, Whisper transcribes it.
3. Word matching

Each transcribed word is compared to the practice text using fuzzy matching:
  • Exact match → Green (correct)
  • Similarity ratio above 0.7 → Green (close enough)
  • No match → Red (mispronounced or skipped)
4. UI update

The corresponding word in the text turns green or red. Grey words remain unread.
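
The page does not describe how transcribed words are aligned with positions in the practice text. One plausible approach, sketched below purely as an assumption, keeps a cursor into the expected words and looks a few words ahead for a match, marking anything skipped as red. The class name `WordTracker`, the `lookahead` window, and the state strings are all hypothetical.

```python
from difflib import SequenceMatcher

GREY, GREEN, RED = "grey", "green", "red"

def matches(expected: str, spoken: str, threshold: float = 0.7) -> bool:
    """Exact or fuzzy match, mirroring the 0.7 similarity threshold on this page."""
    a, b = expected.lower(), spoken.lower()
    return a == b or SequenceMatcher(None, a, b).ratio() > threshold

class WordTracker:
    """Hypothetical tracker: one state per expected word plus a reading cursor."""

    def __init__(self, words: list[str], lookahead: int = 3):
        self.words = words
        self.states = [GREY] * len(words)
        self.cursor = 0
        self.lookahead = lookahead  # how far ahead to search for a match

    def feed(self, spoken: str) -> None:
        """Consume one transcribed word and update the word states."""
        window_end = min(self.cursor + self.lookahead, len(self.words))
        for i in range(self.cursor, window_end):
            if matches(self.words[i], spoken):
                for j in range(self.cursor, i):
                    self.states[j] = RED  # words skipped before the match
                self.states[i] = GREEN
                self.cursor = i + 1
                return
        # Nothing in the window matched: mark the current word red and move on.
        if self.cursor < len(self.words):
            self.states[self.cursor] = RED
            self.cursor += 1

tracker = WordTracker(["the", "quick", "brown", "fox"])
for word in ["the", "quik", "fox"]:
    tracker.feed(word)
print(tracker.states)  # ['green', 'green', 'red', 'green']
```

In this run "quik" is close enough to "quick" to turn green, and "brown" turns red because the reader jumped straight to "fox".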

Color-Coding Logic

| Color | Meaning | Condition |
|---|---|---|
| 🟢 Green | Correct | Word matches expected text (exact or fuzzy) |
| 🔴 Red | Incorrect | Word does not match (mispronounced, skipped, or wrong word) |
| Grey | Not read | Word has not been spoken yet |
Fuzzy matching algorithm:
from difflib import SequenceMatcher

def similar(a, b):
    return SequenceMatcher(None, a, b).ratio() > 0.7
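This tolerates small differences between the transcript and the expected word, such as pluralization (“cat” vs “cats”, ratio ≈ 0.86). Larger differences fall below the 0.7 threshold:
  • Verb tenses (“run” vs “running”) score 0.60
  • Phonetic respellings (“the” vs “thuh”) score about 0.57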
Whisper may transcribe phonetically similar words incorrectly (e.g., “there” vs “their”). The fuzzy matcher helps, but isn’t perfect.
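
To check which word pairs actually clear a 0.7 threshold, the ratios can be printed directly. This is a small sketch using only `difflib`; the helper name `ratio` and the list of pairs are illustrative choices.

```python
from difflib import SequenceMatcher

def ratio(a: str, b: str) -> float:
    """Similarity in [0, 1]: 2 * matched characters / total characters."""
    return SequenceMatcher(None, a, b).ratio()

pairs = [("cat", "cats"), ("there", "their"), ("run", "running"), ("the", "thuh")]
for expected, heard in pairs:
    score = ratio(expected, heard)
    verdict = "match" if score > 0.7 else "no match"
    print(f"{expected!r} vs {heard!r}: {score:.2f} -> {verdict}")

# 'cat' vs 'cats': 0.86 -> match
# 'there' vs 'their': 0.80 -> match
# 'run' vs 'running': 0.60 -> no match
# 'the' vs 'thuh': 0.57 -> no match
```

Because the ratio is relative to word length, a one-letter difference matters much more for short words than for long ones.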

Pronunciation Playback

Click to Hear

You can click any word in the practice text to hear its pronunciation via TTS.
How it works:
  1. Click a word → App extracts the word text
  2. TTS synthesizes the word using your selected voice
  3. Audio plays through paplay, the PulseAudio playback client, which also works on PipeWire through its PulseAudio compatibility layer
Use cases:
  • Learn new words: Hear how to pronounce unfamiliar vocabulary
  • Compare: Say the word, then click it to compare your pronunciation
  • Practice: Repeat-click to drill a specific word
Pronunciation playback uses the current language and voice from Settings. Switch to Spanish to hear Spanish words pronounced correctly.
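
The playback step might look like the sketch below, assuming the TTS engine has already written the word to a WAV file (that synthesis step is not shown, and the file path is hypothetical). The helper names `build_play_command` and `play_wav` are illustrative.

```python
import shutil
import subprocess

def build_play_command(wav_path: str) -> list[str]:
    """Return the argument list for playing a WAV file with paplay."""
    return ["paplay", wav_path]

def play_wav(wav_path: str) -> bool:
    """Play a WAV file; return False when paplay is not installed."""
    if shutil.which("paplay") is None:
        return False
    # check=True raises CalledProcessError if paplay exits with an error.
    subprocess.run(build_play_command(wav_path), check=True)
    return True

print(build_play_command("word.wav"))  # ['paplay', 'word.wav']
```

Passing the command as a list avoids shell quoting problems with unusual file names.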

Grading System

Accuracy Calculation

When you click Stop, the app calculates:
total_words = len(practice_text.split())
correct_words = count_green_words()
accuracy = (correct_words / total_words) * 100
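
As written, the calculation divides by zero when the practice text is empty. A guarded version could look like this sketch; the function name `accuracy_percent` is an assumption.

```python
def accuracy_percent(correct_words: int, total_words: int) -> float:
    """Percentage of expected words read correctly; 0.0 for empty text."""
    if total_words <= 0:
        return 0.0  # avoid ZeroDivisionError when there is nothing to read
    return correct_words / total_words * 100

print(accuracy_percent(35, 40))  # 87.5
```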

Grade Mapping

| Grade | Accuracy Range | Meaning |
|---|---|---|
| A+ | 95-100% | Perfect or near-perfect |
| A | 85-94% | Excellent |
| B | 75-84% | Good |
| C | 65-74% | Adequate |
| D | 50-64% | Needs improvement |
| F | 0-49% | Significant practice needed |
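
The ranges above are given as whole numbers, so a fractional accuracy such as 94.5% falls between two rows. One way to resolve this, shown here as an assumption rather than the app's confirmed behavior, is to treat each row's lower bound as a threshold. The function name `grade_for` is hypothetical.

```python
def grade_for(accuracy: float) -> str:
    """Map an accuracy percentage to a letter grade using lower-bound thresholds."""
    thresholds = [(95, "A+"), (85, "A"), (75, "B"), (65, "C"), (50, "D")]
    for minimum, grade in thresholds:
        if accuracy >= minimum:
            return grade
    return "F"

print(grade_for(87.5))  # A
print(grade_for(94.5))  # A
```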

Feedback Dialog

The feedback dialog shows:
1. Grade and accuracy
🎯 Grade: A
📊 Accuracy: 87.5%
2. Visual progress bar
████████████████░░░░  87.5%
3. Word statistics
📝 Words spoken: 40
✅ Correct: 35
❌ Missed: 5
4. Missed words list
Words you missed:
- pronunciation (2 times)
- especially (1 time)
- algorithm (2 times)
Practice text: “The quick brown fox jumps over the lazy dog. This is a classic pangram used for typing practice.”
Your reading: “The quick brown fox jumps over the lazy dog. This is a classic pang… panagram used for typing practice.”
Feedback:
  • Grade: A
  • Accuracy: 94.4% (17/18 words correct)
  • Missed words:
    • pangram (1 time): you said “pang” and “panagram” (mispronounced)
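
The text portions of the feedback dialog could be produced along these lines. This is a sketch only: the helper names, the 20-cell default width, and the choice to round the bar down to whole cells are assumptions, so the output may not match the app's bar cell for cell.

```python
def render_bar(accuracy: float, width: int = 20) -> str:
    """Draw a text progress bar, rounding down to whole filled cells."""
    filled = int(accuracy / 100 * width)
    filled = max(0, min(width, filled))  # clamp out-of-range percentages
    return "█" * filled + "░" * (width - filled)

def format_missed(missed: dict[str, int]) -> list[str]:
    """Format a {word: count} mapping as 'word (N times)' lines."""
    lines = []
    for word, count in missed.items():
        unit = "time" if count == 1 else "times"
        lines.append(f"- {word} ({count} {unit})")
    return lines

print(render_bar(50, width=10))  # █████░░░░░
print("\n".join(format_missed({"pronunciation": 2, "especially": 1})))
```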

Advanced Features

VAD Sensitivity

Reading Practice Mode uses Voice Activity Detection to segment your speech.
Parameters:
  • silence_threshold = 0.03 (RMS level)
  • silence_duration = 1.5 seconds (pause before stopping)
If you speak too quickly, VAD may cut off mid-sentence. Speak at a moderate pace with natural pauses.
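
To show how the two parameters interact, here is a simplified energy-based segmenter. It is a sketch under stated assumptions, not the app's VAD: audio arrives as frames of float samples in the range -1.0 to 1.0, each frame lasts `frame_seconds`, and the function names are hypothetical. Silent frames are simply dropped rather than kept as padding.

```python
import math

def rms(frame: list[float]) -> float:
    """Root-mean-square level of one frame of samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def segment_utterances(frames, frame_seconds,
                       silence_threshold=0.03, silence_duration=1.5):
    """Group voiced frames into utterances, closing one after enough silence."""
    utterances, current = [], []
    silent_time = 0.0
    for frame in frames:
        if rms(frame) >= silence_threshold:
            current.append(frame)   # voiced frame: extend the utterance
            silent_time = 0.0
        elif current:
            silent_time += frame_seconds
            if silent_time >= silence_duration:
                utterances.append(current)  # pause long enough: close it
                current, silent_time = [], 0.0
    if current:
        utterances.append(current)  # flush speech still open at the end
    return utterances

loud, quiet = [0.5] * 4, [0.0] * 4
frames = [loud, loud, quiet, quiet, quiet, loud]
print(len(segment_utterances(frames, frame_seconds=0.5)))  # 2
```

With 0.5-second frames, three quiet frames add up to the 1.5-second pause that ends an utterance, while a shorter pause keeps the same utterance open.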

Multi-Language Support

The mode works in English and Spanish (based on your Settings language).
How it adapts:
  • STT: Whisper transcribes with language="en" or language="es"
  • TTS: Kokoro uses lang="en-us" or lang="es" for pronunciation playback
  • Fuzzy matching: Works identically in both languages
Make sure your Settings language matches your practice text. Reading English text with Spanish STT (or vice versa) will produce poor results.
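
Keeping the STT and TTS codes in one lookup makes a language mismatch harder to introduce. The sketch below uses the codes listed above; the dictionary keys `"English"` and `"Spanish"` and the function name `codes_for` are assumptions about how the setting might be stored.

```python
# Language codes as listed on this page; the keys are hypothetical setting values.
LANGUAGE_CODES = {
    "English": {"stt": "en", "tts": "en-us"},
    "Spanish": {"stt": "es", "tts": "es"},
}

def codes_for(settings_language: str) -> dict[str, str]:
    """Return the STT and TTS codes for a Settings language, or raise ValueError."""
    try:
        return LANGUAGE_CODES[settings_language]
    except KeyError:
        raise ValueError(f"Unsupported language: {settings_language!r}") from None

print(codes_for("Spanish"))  # {'stt': 'es', 'tts': 'es'}
```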

Real-Time vs Batch Processing

Reading Practice Mode processes speech in real time as you speak, not in batch at the end.
Advantages:
  • Instant visual feedback
  • Encourages continuous reading flow
  • Helps you notice mistakes immediately
Limitations:
  • Whisper must transcribe each utterance quickly (faster models recommended)
  • Network-dependent models (cloud STT) would add latency (not used here)

Tips for Best Results

Choose appropriate text:
  • Start with short paragraphs (50-100 words)
  • Use clear, simple sentences for beginners
  • Try tongue twisters for advanced pronunciation drills
  • Avoid heavily technical jargon (Whisper may struggle)

Use Cases

Language Learning

Scenario: You’re learning Spanish and want to practice reading comprehension passages.
Workflow:
  1. Paste a Spanish paragraph into Reading Practice Mode
  2. Read it aloud, seeing which words you pronounce correctly
  3. Click red words to hear correct pronunciation
  4. Re-read until you achieve 95%+ accuracy

Accent Reduction

Scenario: You want to reduce your accent in English.
Workflow:
  1. Use a passage with challenging phonemes (e.g., “th” sounds, “r” vs “l”)
  2. Read aloud and identify red words (mispronounced)
  3. Click each red word to hear native pronunciation
  4. Practice those words separately, then re-read the full passage

Speech Therapy

Scenario: A speech therapist assigns reading exercises.
Workflow:
  1. Patient reads assigned text aloud
  2. App tracks which words are difficult (consistently red)
  3. Therapist reviews missed words list
  4. Patient practices specific problem words using click-to-hear

Audition Prep

Scenario: An actor preparing for a role needs to nail specific lines.
Workflow:
  1. Paste script lines into Reading Practice Mode
  2. Read lines aloud, ensuring 100% accuracy
  3. Use click-to-hear for unfamiliar words (character names, places)
  4. Practice until achieving A+ grade consistently

Technical Details

Word Matching Algorithm

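The code below decides whether a spoken word matches the expected text using a similarity ratio from Python's difflib.SequenceMatcher. This ratio is based on matching character blocks and is not the same as Levenshtein (edit) distance: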
from difflib import SequenceMatcher

def word_similarity(word1, word2):
    # Normalize: lowercase, strip punctuation
    w1 = word1.lower().strip('.,!?;:')
    w2 = word2.lower().strip('.,!?;:')
    
    # Exact match
    if w1 == w2:
        return True
    
    # Fuzzy match (70% similarity threshold)
    return SequenceMatcher(None, w1, w2).ratio() > 0.7
Threshold tuning:
  • 0.7 (70%) allows minor variations
  • Too low (e.g., 0.5) → accepts incorrect words
  • Too high (e.g., 0.9) → rejects valid pronunciations
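
The function above relies on difflib's similarity ratio. For readers who want a literal Levenshtein (edit) distance instead, a standard dynamic-programming version is sketched below for comparison; it is not taken from the app, and the function name `levenshtein` is illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

print(levenshtein("cat", "cats"))     # 1
print(levenshtein("run", "running"))  # 4
```

An absolute distance treats a one-letter change the same in a three-letter word and a twelve-letter word, whereas a ratio scales with word length, so the two approaches accept different sets of near-misses.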

Missed Words Tracking

The app maintains a dictionary of missed words:
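missed_words = {}  # {word: count}

# A word counts as missed when no spoken word matches it,
# using word_similarity() from the section above (exact or fuzzy).
for expected_word in practice_text.split():
    if not any(word_similarity(expected_word, spoken) for spoken in spoken_words):
        key = expected_word.lower().strip('.,!?;:')  # normalize the dictionary key
        missed_words[key] = missed_words.get(key, 0) + 1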
This counts:
  • Skipped words: Words you didn’t say at all
  • Mispronounced words: Words Whisper transcribed incorrectly (below fuzzy threshold)
  • Repeated mistakes: If you mispronounce “pronunciation” twice, it shows pronunciation (2 times)
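Whisper’s transcription accuracy depends on your microphone quality, background noise, and accent. A larger Whisper model (medium or large) generally transcribes more accurately, but takes longer per utterance, which works against the real-time feedback described earlier.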
