Quick start

Launch the application

With your virtual environment activated, start ChatbotAI-Free:

cd ChatbotAI-Free
source venv/bin/activate  # On Windows: venv\Scripts\activate
python main.py

On first launch, Whisper downloads the base model (~140 MB). This happens automatically and only needs to be done once.

The voice scanner will check your voices/ folder and detect available voices. If Kokoro files are present, you’ll see 54 English and Spanish voices available immediately.

Your first conversation

Select your model and voice

At the top of the window, you’ll see two dropdowns:

Model selector (left): Choose your Ollama LLM (e.g., llama3.1:8b)
Voice selector (right): Choose a voice (e.g., af_bella for English female)

The app automatically detects all available Ollama models. If you don’t see your model, make sure you’ve pulled it with ollama pull <model-name>.

Start with text or voice

You can interact in two ways:Text input:

Type your message in the text box at the bottom
Press Enter to send (or Shift+Enter for a new line)

Voice input:

Click the 🎤 microphone button to start recording
Speak your message
Click the button again to stop and send

You can enable auto-send in Settings to automatically send when you stop speaking (using Voice Activity Detection).

The app will:

Transcribe your speech (if using voice)
Show your message in a user bubble (right-aligned)
Stream the AI’s response in real-time
Generate and play TTS audio simultaneously

Understand the UI elements

Chat area:

Your messages appear in gray bubbles on the right
AI responses appear on the left with a ✨ avatar and markdown rendering
Code blocks, tables, and formatting are rendered automatically

Status indicator (bottom-left):

“Ready” - Waiting for input
“Transcribing…” - Converting speech to text
“Thinking…” - LLM is generating response
“Speaking…” - TTS is playing audio

Context donut (bottom-right):

Shows conversation token usage as a colored ring
Green < 50%, Yellow < 80%, Red > 80%
Click to see detailed context window stats

When the context window fills up (100%), older messages are dropped. Start a new chat to preserve full context.

Try interrupting the AI

While the AI is speaking, click the ⏹ Stop button to interrupt mid-response. This is useful when:

The AI goes off-topic
You want to ask a follow-up immediately
The response is taking too long

In Live Mode (hands-free), you can naturally interrupt the AI by speaking—just like a real conversation. The app detects your voice and stops playback immediately.

Save and manage conversations

Conversations are auto-saved as Markdown files in the chat_history/ folder.Sidebar navigation:

Click the ☰ hamburger button to toggle the chat history sidebar
Click ➕ New Chat to start a fresh conversation
Click any saved chat to resume it
Right-click a chat to rename or delete it

Chat titles:

Generated automatically by the lightest available Ollama model
Based on the first user message
Can be renamed manually by right-clicking

Each chat maintains its own conversation history and context. Switching chats loads the full history and updates the context donut.

Explore conversation modes

Classic chat mode (default)

Turn-by-turn conversations with full control:

Type or speak your message
See markdown-rendered responses in real-time
Interrupt with the Stop button
Perfect for detailed discussions, coding help, or document analysis

Live mode (hands-free)

Click the ✨ Live button for continuous conversation:

Completely hands-free interaction
Voice Activity Detection (VAD) automatically detects when you stop speaking
Natural barge-in: interrupt the AI mid-sentence by speaking
The app mutes itself when you click the Live button again

Barge-in detection monitors your voice in real-time. When you start speaking while the AI is talking, playback stops immediately and the app starts listening to you.

Live Mode requires a clear audio environment. Background noise may trigger false interruptions. Adjust your microphone sensitivity in Settings if needed.

Advanced features

Attach PDF documents

Click the 📎 Attach button
Select a PDF file from your computer
Review the confirmation dialog showing:
- Total tokens in the document
- Current context usage
- Whether it fits in the model’s context window
Click Inject to add it to the conversation

The AI can now answer questions about the document:

You: What are the main findings in section 3?

PDF text is extracted with PyMuPDF and injected directly into the conversation history. No vector database or RAG pipeline required.

Reading practice mode

Click the 📖 Practice button to enter shadowing coach mode:

Paste or type text you want to practice reading
Click Start
Read aloud while the app listens
Watch each word change color:
- Green = pronounced correctly
- Red = mispronounced
- Gray = not yet read
Click any word to hear its pronunciation via TTS
When finished, see your grade (A+ → F) and accuracy stats

Adjust settings

Click the ⚙️ Settings button to customize:

Language: Switch between English and Spanish (affects STT and TTS)
Voice speed: 0.5× to 2.0× playback speed
Font size: Adjust chat text size
Audio devices: Select input/output devices
Recording mode: Auto-send or manual control
Whisper model: Choose between base, small, medium, or large-v3

Changing the Whisper model requires a restart. The app will offer to restart immediately when you change this setting.

Keyboard shortcuts

Key	Action
Enter	Send message
Shift+Enter	New line in text input
Esc	(In Live Mode) Mute/unmute

Understanding the reasoning panel

When using thinking-capable models (like those with extended reasoning), you’ll see a ✨ Show Reasoning button below AI responses. Click to expand and see the model’s internal thought process:

Appears automatically when the model uses <think> tags
Shows reasoning in italic gray text
Collapsed by default to keep the UI clean

The reasoning panel only appears if your model supports structured thinking. Models without this capability will work normally without showing the panel.

Tips for better conversations

Start specific: Clear, specific prompts get better responses
- Good: “Explain quicksort in Python with examples”
- Poor: “Tell me about sorting”
Watch context usage: Keep an eye on the context donut
- Start a new chat when approaching 80-90%
- Long documents consume significant context
Use the right model: Smaller models are faster but less capable
- llama3.1:8b - Good all-rounder
- mistral:7b - Faster for simple tasks
- Larger models for complex reasoning
Optimize voice settings:
- Use base Whisper for speed, medium or large-v3 for accuracy
- Adjust voice speed if TTS is too fast/slow
- Test different voices to find your preference
Leverage markdown: The AI can format responses with:
- Code blocks with syntax highlighting
- Tables for structured data
- Headers, lists, and emphasis

Get Started

Core Features

Configuration

Advanced

Launch the application

Your first conversation

Explore conversation modes

Classic chat mode (default)

Live mode (hands-free)

Advanced features

Attach PDF documents

Reading practice mode

Adjust settings

Keyboard shortcuts

Understanding the reasoning panel

Tips for better conversations

Next steps

Configuration

Architecture

Build docs developers (and LLMs) love

Get Started

Core Features

Configuration

Advanced

​Launch the application

​Your first conversation

​Explore conversation modes

​Classic chat mode (default)

​Live mode (hands-free)

​Advanced features

​Attach PDF documents

​Reading practice mode

​Adjust settings

​Keyboard shortcuts

​Understanding the reasoning panel

​Tips for better conversations

​Next steps

Configuration

Architecture

Build docs developers (and LLMs) love

Launch the application

Your first conversation

Explore conversation modes

Classic chat mode (default)

Live mode (hands-free)

Advanced features

Attach PDF documents

Reading practice mode

Adjust settings

Keyboard shortcuts

Understanding the reasoning panel

Tips for better conversations

Next steps