assistant_pipeline
Runs the complete assistant pipeline: transcribes audio, generates an AI response, and synthesizes speech.
Request Parameters
OpenAI-compatible API key for online STT/AI services
Base64-encoded audio data (recorded user speech)
MIME type of the audio (e.g., audio/wav, audio/webm)
API base URL (defaults to OpenAI)
Speech-to-text model (e.g., whisper-1)
AI chat model (e.g., gpt-4o-mini)
Use local mode for both STT and AI
Use local STT only
Use local AI only (Ollama)
Ollama base URL (defaults to http://localhost:11434)
Ollama model name (e.g., llama3.2:3b)
Local STT model (e.g., nvidia/parakeet-tdt_ctc-110m)
Custom path to Piper TTS executable
Target language for transcription (e.g., en, es, fr)
List of allowed languages for multi-language detection
Custom system prompt for AI assistant
AI response temperature (0.0-2.0)
Maximum tokens for AI response
Custom dictionary replacements
Text expansion snippets
Enable backtracking correction for transcription
Remove filler words (um, uh, etc.)
Enable automatic punctuation
Enable automatic numbered list detection
Enable command mode (assistant conversation)
Enable wake word detection (e.g., “Hey Slasshy”)
Custom assistant name for wake word
Currently selected text for context-aware processing
TTS engine: piper or coqui
Piper TTS configuration
Coqui TTS configuration
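
As an illustration of how a caller might assemble these parameters, here is a minimal sketch in Python. The parameter names are not shown in this section, so every key below (api_key, audio_base64, mime_type, temperature, language) is a hypothetical placeholder, as is the build_request helper. Only the Base64 encoding of the audio and the 0.0-2.0 temperature range come from the descriptions above.

```python
import base64
from typing import Any, Optional


def build_request(audio_bytes: bytes, mime_type: str, api_key: str,
                  temperature: Optional[float] = None,
                  **options: Any) -> dict:
    """Assemble a request payload. All key names are hypothetical."""
    # The documented temperature range is 0.0-2.0; reject anything outside it.
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")

    payload = {
        "api_key": api_key,  # hypothetical key name
        # Audio is sent as Base64 text rather than raw bytes.
        "audio_base64": base64.b64encode(audio_bytes).decode("ascii"),
        "mime_type": mime_type,
    }
    if temperature is not None:
        payload["temperature"] = temperature
    # Include only the optional settings the caller actually supplied.
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload


request = build_request(b"RIFF-demo", "audio/wav", "sk-placeholder",
                        temperature=0.7, language="en")
```

Omitting unset options keeps the payload small and lets the receiving side apply its own defaults, such as the OpenAI base URL or the default Ollama address.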
Response
Pipeline mode: assistant or dictation
Whether selection text was rewritten
Whether a selection rewrite is pending confirmation
Whether selection context was cleared
Whether selection context was used in processing
Transcribed text from audio
AI-generated response text
Base64-encoded TTS audio of the response
Speech-to-text processing time (milliseconds)
AI response generation time (milliseconds)
Text-to-speech synthesis time (milliseconds)
Total pipeline processing time (milliseconds)
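
A caller will typically decode the returned audio and inspect the timings. The sketch below shows one way to do that in Python. The response field names are not shown in this section, so the keys used here (audio_base64, stt_ms, ai_ms, tts_ms, total_ms) and the summarize_response helper are hypothetical, and the sample values are made up purely for illustration.

```python
import base64


def summarize_response(response: dict) -> dict:
    """Decode the TTS audio and report per-stage timings (hypothetical keys)."""
    # The synthesized speech arrives as Base64 text; decode it to raw bytes.
    audio = base64.b64decode(response["audio_base64"])

    stage_ms = {name: response[name] for name in ("stt_ms", "ai_ms", "tts_ms")}
    slowest = max(stage_ms, key=stage_ms.get)

    return {
        "audio_byte_count": len(audio),
        "slowest_stage": slowest,
        # Time in the total that is not attributed to any single stage.
        "unattributed_ms": response["total_ms"] - sum(stage_ms.values()),
    }


# Illustrative values only, not real measurements.
sample = {
    "audio_base64": base64.b64encode(b"\x00\x01\x02").decode("ascii"),
    "stt_ms": 420,
    "ai_ms": 910,
    "tts_ms": 350,
    "total_ms": 1700,
}
summary = summarize_response(sample)
```

Comparing the total against the sum of the three stages is a quick way to spot time spent outside STT, AI generation, and TTS.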
Pipeline Flow
The assistant pipeline executes in three stages:
Speech-to-Text (STT)
Audio is transcribed using either:
- Online: OpenAI-compatible API (e.g., Whisper)
- Local: Parakeet, Moonshine, or other local models
AI Processing
Transcript is processed by:
- Online: OpenAI-compatible chat API (e.g., GPT-4)
- Local: Ollama models (e.g., llama3.2:3b)
- System prompt customization
- Dictionary replacements
- Snippet expansions
- Context awareness (selected text)
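
The online/local choice for the first two stages can be sketched as follows. This is an assumption about how the three local-mode request flags combine, not the documented behavior. The names ModeFlags, use_local, use_local_stt, use_local_ai, and select_backends are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModeFlags:
    use_local: bool = False      # local mode for both STT and AI
    use_local_stt: bool = False  # local STT only
    use_local_ai: bool = False   # local AI only (Ollama)


def select_backends(flags: ModeFlags) -> dict:
    """Pick a backend per stage. Assumes the combined flag implies both."""
    stt_local = flags.use_local or flags.use_local_stt
    ai_local = flags.use_local or flags.use_local_ai
    return {
        "stt": "local" if stt_local else "online",
        "ai": "local" if ai_local else "online",
    }


mixed = select_backends(ModeFlags(use_local_stt=True))
fully_local = select_backends(ModeFlags(use_local=True))
```

Under this assumption, a mixed setup such as local transcription with an online chat model falls out of setting a single flag.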