Reading Practice Mode

Overview

Reading Practice Mode (Shadowing Coach) helps you improve pronunciation and fluency by reading text aloud. The app listens in real time using voice activity detection (VAD) and Whisper speech-to-text, tracks which words you say correctly, and gives you a grade (A+ to F) with detailed feedback.

How to Use

Enter Practice Mode

Click the 📖 Practice button in the main window. A new dialog opens.

Add text to practice

Paste or type the text you want to read. This can be:

A paragraph from a book or article
Vocabulary words or phrases
Tongue twisters or pronunciation drills
Any text in your selected language (English or Spanish)

Start reading

Click Start. The app begins listening via your microphone. Read the text aloud at your own pace.

See real-time feedback

As you speak, words change color:

🟢 Green: Correctly pronounced
🔴 Red: Mispronounced or skipped
⚪ Grey: Not yet read

The app uses fuzzy string matching to tolerate minor differences between the word Whisper transcribes and the word in the practice text.

Click words to hear pronunciation

Click any word to hear its correct pronunciation via TTS. This is useful for:

Learning unfamiliar words
Comparing your pronunciation to the AI’s
Practicing specific sounds

Finish and get graded

Click Stop when you’re done. A feedback dialog appears with:

Grade (A+, A, B, C, D, or F)
Accuracy percentage and visual progress bar
Per-word stats: X words spoken, Y correct, Z missed
List of missed words with counts

Real-Time Word Tracking

How It Works

Text tokenization

The practice text is split into individual words (whitespace-separated, punctuation removed).
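As a minimal sketch of this step (not the app's actual code), the function below splits on whitespace and strips punctuation from the ends of each word. The name `tokenize` and the use of `string.punctuation` are assumptions for illustration; note that `string.punctuation` covers ASCII punctuation only, so curly quotes would need to be added separately.

```python
import string

def tokenize(practice_text: str) -> list[str]:
    """Split on whitespace and strip leading/trailing ASCII punctuation."""
    words = []
    for raw in practice_text.split():
        cleaned = raw.strip(string.punctuation)
        if cleaned:  # drop tokens that consisted only of punctuation
            words.append(cleaned)
    return words

print(tokenize("The quick, brown fox... jumps!"))
```

Stripping only the ends keeps internal apostrophes intact, so a word such as "don't" stays a single token.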

Continuous listening

VAD monitors your microphone. When speech is detected, Whisper transcribes it.

Word matching

Each transcribed word is compared to the practice text using fuzzy matching:

Exact match → Green (correct)
Similarity ratio above 0.7 → Green (close enough)
No match → Red (mispronounced or skipped)

UI update

The corresponding word in the text turns green or red. Grey words remain unread.
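The steps above could be wired together roughly as follows. This is a simplified sketch, not the app's implementation: `WordState`, `is_match`, and `update_states` are hypothetical names, and the one-to-one alignment (each transcribed word is compared with the next unread word) is an assumption that does not handle skipped or inserted words.

```python
from difflib import SequenceMatcher
from enum import Enum

class WordState(Enum):
    GREY = "not read"
    GREEN = "correct"
    RED = "incorrect"

def is_match(expected: str, spoken: str, threshold: float = 0.7) -> bool:
    """Case-insensitive fuzzy comparison using difflib's similarity ratio."""
    return SequenceMatcher(None, expected.lower(), spoken.lower()).ratio() > threshold

def update_states(expected, states, position, transcribed):
    """Mark one expected word per transcribed word; return the new position."""
    for spoken in transcribed:
        if position >= len(expected):
            break  # the whole text has already been read
        matched = is_match(expected[position], spoken)
        states[position] = WordState.GREEN if matched else WordState.RED
        position += 1
    return position

expected = ["the", "quick", "brown", "fox"]
states = [WordState.GREY] * len(expected)
position = update_states(expected, states, 0, ["the", "slow"])
print(position, [s.name for s in states])
```

Words after the returned position stay grey until a later utterance reaches them.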

Color-Coding Logic

Color	Meaning	Condition
🟢 Green	Correct	Word matches expected text (exact or fuzzy)
🔴 Red	Incorrect	Word does not match (mispronounced, skipped, or wrong word)
⚪ Grey	Not read	Word has not been spoken yet

Fuzzy matching algorithm:

from difflib import SequenceMatcher

def similar(a, b):
    # Compare case-insensitively; ratio() returns a similarity score from 0.0 to 1.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() > 0.7

This tolerates:

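Minor spelling variations in the transcript (“color” vs “colour”, ratio 0.91)
Pluralization (“cat” vs “cats”, ratio 0.86)
Short suffixes (“walk” vs “walked”, ratio 0.80)

Larger differences fall below the 0.7 threshold: “run” vs “running” scores 0.60 and “the” vs “thuh” scores 0.57, so both would be marked red.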

Whisper may transcribe phonetically similar words incorrectly (e.g., “there” vs “their”). The fuzzy matcher helps, but isn’t perfect.

Pronunciation Playback

Click to Hear

You can click any word in the practice text to hear its pronunciation via TTS.

How it works:

Click a word → App extracts the word text
TTS synthesizes the word using your selected voice
Audio plays through paplay, the PulseAudio playback client, which also works on PipeWire through its PulseAudio compatibility layer
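A rough sketch of this click-to-hear flow is shown below. It assumes a hypothetical `synthesize` callable that writes the word to a WAV file and returns its path; the real TTS call is not shown in this document. The only external command used is `paplay <file>`, and the `run` parameter exists so the flow can be exercised without audio hardware.

```python
import subprocess

def build_playback_command(wav_path: str) -> list[str]:
    """Return the command line that plays a WAV file through paplay."""
    return ["paplay", wav_path]

def play_word(word: str, synthesize, run=subprocess.run) -> str:
    """Synthesize one word and play it.

    synthesize: hypothetical callable, word -> path of a WAV file.
    run: injectable runner, defaults to subprocess.run.
    """
    wav_path = synthesize(word)
    run(build_playback_command(wav_path), check=True)
    return wav_path
```

Passing `check=True` makes a failed playback raise an exception instead of failing silently.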

Use cases:

Learn new words: Hear how to pronounce unfamiliar vocabulary
Compare: Say the word, then click it to compare your pronunciation
Practice: Repeat-click to drill a specific word

Pronunciation playback uses the current language and voice from Settings. Switch to Spanish to hear Spanish words pronounced correctly.

Grading System

Accuracy Calculation

When you click Stop, the app calculates:

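total_words = len(practice_text.split())
correct_words = count_green_words()  # number of words currently marked green
# Guard against empty practice text to avoid dividing by zero
accuracy = (correct_words / total_words) * 100 if total_words else 0.0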

Grade Mapping

Grade	Accuracy Range	Meaning
A+	95-100%	Perfect or near-perfect
A	85-94%	Excellent
B	75-84%	Good
C	65-74%	Adequate
D	50-64%	Needs improvement
F	0-49%	Significant practice needed
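The table can be expressed as a small lookup. This is a sketch under one assumption: each band's lower bound is treated as an inclusive cutoff, so a fractional value such as 94.5% falls into the A band. The function names are illustrative.

```python
def accuracy_percent(correct_words: int, total_words: int) -> float:
    """Accuracy as a percentage; returns 0.0 for empty practice text."""
    return 100.0 * correct_words / total_words if total_words else 0.0

def grade_for(accuracy: float) -> str:
    """Map an accuracy percentage (0-100) to a letter grade."""
    thresholds = [(95, "A+"), (85, "A"), (75, "B"), (65, "C"), (50, "D")]
    for lower_bound, grade in thresholds:
        if accuracy >= lower_bound:
            return grade
    return "F"

print(grade_for(accuracy_percent(35, 40)))
```

For example, 35 correct words out of 40 gives 87.5%, which maps to an A.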

Feedback Dialog

The feedback dialog shows:

1. Grade and accuracy

🎯 Grade: A
📊 Accuracy: 87.5%

2. Visual progress bar

████████████████░░░░  87.5%

3. Word statistics

📝 Words spoken: 40
✅ Correct: 35
❌ Missed: 5

4. Missed words list

Words you missed:
- pronunciation (2 times)
- especially (1 time)
- algorithm (2 times)
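The text parts of this dialog could be produced along these lines. The sketch assumes a 20-character bar with the filled portion rounded to the nearest block, and a missed-words list sorted by count; neither rule is confirmed by the app, and both function names are hypothetical.

```python
def progress_bar(accuracy: float, width: int = 20) -> str:
    """Render accuracy (0-100) as a fixed-width bar followed by the percentage."""
    filled = max(0, min(width, round(width * accuracy / 100)))
    return "█" * filled + "░" * (width - filled) + f"  {accuracy:.1f}%"

def missed_words_lines(missed: dict[str, int]) -> list[str]:
    """Format missed-word counts, most frequent first, then alphabetically."""
    ordered = sorted(missed.items(), key=lambda item: (-item[1], item[0]))
    return [
        f"- {word} ({count} {'time' if count == 1 else 'times'})"
        for word, count in ordered
    ]

print(progress_bar(50.0))
print("\n".join(missed_words_lines({"pronunciation": 2, "especially": 1})))
```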

Example Feedback

Practice text: “The quick brown fox jumps over the lazy dog. This is a classic pangram used for typing practice.”

Your reading: “The quick brown fox jumps over the lazy dog. This is a classic pang… panagram used for typing practice.”

Feedback:

Grade: A
Accuracy: 94.4% (17/18 words correct)
Missed words:
- pangram (1 time): you said “pang” and “panagram” (mispronounced)

Advanced Features

VAD Sensitivity

Reading Practice Mode uses Voice Activity Detection to segment your speech.

Parameters:

silence_threshold = 0.03 (RMS level)
silence_duration = 1.5 seconds (pause before stopping)

If you speak too quickly, VAD may cut off mid-sentence. Speak at a moderate pace with natural pauses.
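To illustrate how these two parameters interact, here is a simplified energy-based segmenter. It is a sketch under assumptions, not the app's VAD: audio is assumed to arrive as frames of float samples in the range -1 to 1, each frame lasting `frame_seconds`, and `segment_speech` is a hypothetical name.

```python
import math

SILENCE_THRESHOLD = 0.03  # RMS level below which a frame counts as silence
SILENCE_DURATION = 1.5    # seconds of continuous silence that end a segment

def rms(frame):
    """Root-mean-square level of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0

def segment_speech(frames, frame_seconds):
    """Return (start, end) frame index pairs, end exclusive, for each speech segment."""
    segments = []
    start = None        # index of the first voiced frame in the open segment
    last_voiced = None  # index of the most recent voiced frame
    silent_time = 0.0
    for i, frame in enumerate(frames):
        if rms(frame) >= SILENCE_THRESHOLD:
            if start is None:
                start = i
            last_voiced = i
            silent_time = 0.0
        elif start is not None:
            silent_time += frame_seconds
            if silent_time >= SILENCE_DURATION:
                segments.append((start, last_voiced + 1))
                start, silent_time = None, 0.0
    if start is not None:  # close a segment still open at the end of input
        segments.append((start, last_voiced + 1))
    return segments
```

In this sketch a pause shorter than 1.5 seconds stays inside the current segment, while a longer pause closes it so the segment can be sent for transcription.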

Multi-Language Support

The mode works in English and Spanish (based on your Settings language).

How it adapts:

STT: Whisper transcribes with language="en" or language="es"
TTS: Kokoro uses lang="en-us" or lang="es" for pronunciation playback
Fuzzy matching: Works identically in both languages

Make sure your Settings language matches your practice text. Reading English text with Spanish STT (or vice versa) will produce poor results.
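One way to keep the STT and TTS language codes in step is a single lookup keyed by the Settings language. The codes below come from the list above; the `LanguageConfig` class, the `"English"`/`"Spanish"` keys, and `config_for` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LanguageConfig:
    whisper_language: str  # value passed to Whisper as language=...
    kokoro_lang: str       # value passed to Kokoro as lang=...

LANGUAGES = {
    "English": LanguageConfig(whisper_language="en", kokoro_lang="en-us"),
    "Spanish": LanguageConfig(whisper_language="es", kokoro_lang="es"),
}

def config_for(settings_language: str) -> LanguageConfig:
    """Look up both codes at once so STT and TTS cannot drift apart."""
    try:
        return LANGUAGES[settings_language]
    except KeyError:
        raise ValueError(f"Unsupported language: {settings_language!r}") from None

print(config_for("Spanish"))
```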

Real-Time vs Batch Processing

Reading Practice Mode processes speech in real time as you speak, not in batch at the end.

Advantages:

Instant visual feedback
Encourages continuous reading flow
Helps you notice mistakes immediately

Limitations:

Whisper must transcribe each utterance quickly (faster models recommended)
Cloud-based STT would add network latency; this mode does not use it

Tips for Best Results

Text Selection

Choose appropriate text:

Start with short paragraphs (50-100 words)
Use clear, simple sentences for beginners
Try tongue twisters for advanced pronunciation drills
Avoid heavily technical jargon (Whisper may struggle)

Use Cases

Language Learning

Scenario: You’re learning Spanish and want to practice reading comprehension passages.

Workflow:

Paste a Spanish paragraph into Reading Practice Mode
Read it aloud, seeing which words you pronounce correctly
Click red words to hear correct pronunciation
Re-read until you achieve 95%+ accuracy

Accent Reduction

Scenario: You want to reduce your accent in English.

Workflow:

Use a passage with challenging phonemes (e.g., “th” sounds, “r” vs “l”)
Read aloud and identify red words (mispronounced)
Click each red word to hear native pronunciation
Practice those words separately, then re-read the full passage

Speech Therapy

Scenario: A speech therapist assigns reading exercises.

Workflow:

Patient reads assigned text aloud
App tracks which words are difficult (consistently red)
Therapist reviews missed words list
Patient practices specific problem words using click-to-hear

Audition Prep

Scenario: An actor preparing for a role needs to nail specific lines.

Workflow:

Paste script lines into Reading Practice Mode
Read lines aloud, ensuring 100% accuracy
Use click-to-hear for unfamiliar words (character names, places)
Practice until achieving A+ grade consistently

Technical Details

Word Matching Algorithm

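The app uses fuzzy string matching to determine whether a spoken word matches the expected text. The code below relies on the similarity ratio from Python’s difflib.SequenceMatcher, which behaves much like an edit-distance measure but is not a true Levenshtein distance: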

from difflib import SequenceMatcher

def word_similarity(word1, word2):
    # Normalize: lowercase, strip punctuation
    w1 = word1.lower().strip('.,!?;:')
    w2 = word2.lower().strip('.,!?;:')
    
    # Exact match
    if w1 == w2:
        return True
    
    # Fuzzy match (70% similarity threshold)
    return SequenceMatcher(None, w1, w2).ratio() > 0.7

Threshold tuning:

0.7 (70%) allows minor variations
Too low (e.g., 0.5) → accepts incorrect words
Too high (e.g., 0.9) → rejects valid pronunciations
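The effect of the threshold can be checked directly against a few word pairs. This sketch only uses `difflib.SequenceMatcher` from the standard library; the word pairs and the helper name `ratio` are chosen for illustration.

```python
from difflib import SequenceMatcher

def ratio(expected: str, spoken: str) -> float:
    """Case-insensitive similarity score between 0.0 and 1.0."""
    return SequenceMatcher(None, expected.lower(), spoken.lower()).ratio()

pairs = [("cat", "cats"), ("there", "their"), ("run", "running"), ("fox", "dog")]
for expected, spoken in pairs:
    score = ratio(expected, spoken)
    accepted = [t for t in (0.5, 0.7, 0.9) if score > t]
    print(f"{expected!r} vs {spoken!r}: {score:.2f}, accepted at thresholds {accepted}")
```

With these pairs, 0.5 accepts “run” vs “running” (0.60), 0.7 accepts “cat” vs “cats” (0.86) and “there” vs “their” (0.80), and 0.9 rejects all four.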

Missed Words Tracking

The app maintains a dictionary of missed words:

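missed_words = {}  # {word: count}

# spoken_words: list of the words Whisper transcribed during the session
for expected_word in practice_text.split():
    # A word is missed when no transcribed word passes the fuzzy match
    if not any(word_similarity(expected_word, w) for w in spoken_words):
        missed_words[expected_word] = missed_words.get(expected_word, 0) + 1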

This counts:

Skipped words: Words you didn’t say at all
Mispronounced words: Words Whisper transcribed incorrectly (below fuzzy threshold)
Repeated mistakes: If you mispronounce “pronunciation” twice, it shows pronunciation (2 times)
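A self-contained version of this counting step might look like the sketch below. It assumes the same 0.7 ratio threshold and punctuation stripping shown earlier; `normalize`, `is_match`, and `count_missed` are illustrative names. It also ignores word order, so a word spoken once is treated as covering every occurrence of that word in the text.

```python
from collections import Counter
from difflib import SequenceMatcher

def normalize(word: str) -> str:
    """Lowercase and strip common punctuation from the ends of a word."""
    return word.lower().strip(".,!?;:")

def is_match(expected: str, spoken: str, threshold: float = 0.7) -> bool:
    e, s = normalize(expected), normalize(spoken)
    return e == s or SequenceMatcher(None, e, s).ratio() > threshold

def count_missed(practice_text: str, spoken_words: list[str]) -> Counter:
    """Count expected words that no transcribed word matches."""
    return Counter(
        normalize(expected)
        for expected in practice_text.split()
        if not any(is_match(expected, spoken) for spoken in spoken_words)
    )

print(count_missed("The algorithm is an algorithm.", ["the", "is", "an"]))
```

Using a `Counter` keeps the per-word tally and makes it easy to list the most frequently missed words first with `most_common()`.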

Whisper’s transcription accuracy depends on your microphone quality, background noise, and accent. A larger Whisper model (medium or large) usually improves accuracy, at the cost of slower transcription and therefore slower real-time feedback.
