Prerequisites
Before installing ChatbotAI-Free, ensure you have the following:

- Python 3.10 or 3.11 (Python 3.12+ not yet tested)
- Ollama installed and running — ollama.ai
- Git for cloning the repository
- (Optional) NVIDIA GPU with CUDA for faster inference
ChatbotAI-Free runs on CPU, but GPU acceleration significantly improves response times for both LLM inference and voice synthesis.
Installation steps
Create a virtual environment
Create and activate a Python virtual environment:
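The standard `venv` commands work here (the environment name `venv` is just a convention):

```shell
# Create the environment
python3 -m venv venv

# Activate it
source venv/bin/activate      # Linux/macOS
# venv\Scripts\activate       # Windows
```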
Using a virtual environment keeps your dependencies isolated and prevents conflicts with other Python projects.
Install Python dependencies
Install the required packages from `requirements.txt`. The key dependencies include:

- PyQt6 (6.6.0+) - Modern GUI framework
- faster-whisper (0.10.0+) - Real-time speech recognition
- ollama (0.1.0+) - LLM inference client
- kokoro-onnx (0.4.0+) - Neural TTS engine
- PyMuPDF (1.23.0+) - PDF text extraction
- tiktoken (0.5.0+) - Token counting
- sounddevice & numpy - Audio I/O
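With the virtual environment active, the packages above are installed from the repository root in the usual way:

```shell
pip install -r requirements.txt
```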
(Optional) Enable GPU acceleration
If you have an NVIDIA GPU with CUDA, install the GPU-accelerated ONNX runtime. This will significantly speed up:
- Kokoro TTS synthesis (300MB model runs much faster)
- Whisper transcription (especially medium/large models)
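The exact command isn't shown here, but the GPU build of ONNX Runtime is normally installed by swapping out the CPU package; treat this as a sketch rather than the project's verified instructions:

```shell
# Remove the CPU-only build, then install the CUDA-enabled one
pip uninstall -y onnxruntime
pip install onnxruntime-gpu
```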
The app automatically detects CUDA availability and uses GPU acceleration if available. If CUDA is not found, it falls back to CPU with no code changes needed.
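To see which execution providers ONNX Runtime can actually use on your machine, you can query it directly (`get_available_providers()` is part of the real `onnxruntime` API):

```shell
python3 -c "import onnxruntime as ort; print(ort.get_available_providers())"
```

If `CUDAExecutionProvider` appears in the printed list, GPU acceleration is available; otherwise the app will stay on CPU.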
Download Kokoro voice models
Kokoro v1.0 powers all built-in English and Spanish voices (54 voices total). The model files are too large for GitHub, so download them manually:
- Go to the kokoro-onnx releases page
- Download `kokoro-v1.0.onnx` (~300 MB)
- Download `voices-v1.0.bin` (~27 MB)
- Place both files in `voices/kokoro-v1.0/`
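Assuming both files were downloaded to your current directory, placing them looks like:

```shell
# Create the target folder and move the downloaded model files into it
mkdir -p voices/kokoro-v1.0
mv kokoro-v1.0.onnx voices-v1.0.bin voices/kokoro-v1.0/
```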
On first launch, the voice scanner checks the `voices/` folder. If the Kokoro files are in place, you're ready to go immediately.

Pull an Ollama model
Download at least one LLM model to power conversations. Recommended models:

- `llama3.1:8b` - Good balance of speed and quality (8B parameters)
- `mistral:7b` - Fast and efficient (7B parameters)
- `gemma2:9b` - Google's Gemma model (9B parameters)
- `qwen2.5:7b` - Excellent multilingual support
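For example, to download the first model above (the Ollama daemon must be running):

```shell
ollama pull llama3.1:8b
```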
ChatbotAI-Free automatically detects all available Ollama models. You can switch between them in the UI dropdown.
The lightest available model is automatically used to generate chat titles. Keeping a small model installed (`llama3.1:8b` or smaller) ensures fast title generation.

Optional: Add more voices
Want voices in other languages beyond English and Spanish? You can add any Piper-compatible Sherpa-ONNX voice pack.

Download a voice pack
Browse available voices at huggingface.co/csukuangfj. Download these files from the repo:
- The `.onnx` model file
- `tokens.txt`
- The `espeak-ng-data/` directory
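The target location isn't spelled out here, but based on the Kokoro step each pack most likely gets its own subfolder under `voices/`. The pack and file names below are hypothetical placeholders:

```shell
# "vits-piper-en_US-amy-low" stands in for whatever pack you downloaded
mkdir -p voices/vits-piper-en_US-amy-low
cp en_US-amy-low.onnx tokens.txt voices/vits-piper-en_US-amy-low/
cp -r espeak-ng-data voices/vits-piper-en_US-amy-low/
```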
Restart and classify
On the next launch, the voice scanner detects the new folder and shows a dialog asking which language to assign. After confirmation, the voice appears in the voice selector dropdown.
The app identifies Sherpa packs by the presence of a `.onnx` file and an `espeak-ng-data/` subdirectory.

Troubleshooting
Whisper model fails to load
If you see errors about missing CUDA or compute type:

- Ensure you have PyTorch installed: `pip install torch`
- The app automatically falls back to CPU if CUDA is unavailable
Ollama connection errors
If the app can't connect to Ollama:

- Verify Ollama is running and has models available: `ollama list`
- If the service isn't active, start it: `ollama serve`
No audio output
If TTS generates silence:

- Verify the Kokoro model files are in the correct location
- Check audio device settings in the app’s Settings panel
- On Linux, ensure PipeWire or PulseAudio is running
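Since `sounddevice` is already installed as a dependency, listing the audio devices Python can see helps rule out a device-selection problem (`query_devices()` is part of the real `sounddevice` API):

```shell
python3 -c "import sounddevice as sd; print(sd.query_devices())"
```

If your output device is missing or marked with 0 output channels, the problem is at the OS audio layer rather than in the app.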
Next steps
- Quick start: Learn how to start your first conversation