
assistant_pipeline

Runs the complete assistant pipeline: transcribes audio, generates AI response, and synthesizes speech.
await invoke('run_assistant_pipeline', {
  apiKey: string,
  audioBase64: string,
  audioMimeType: string,
  // ... additional options
});

Request Parameters

apiKey (string, required): OpenAI-compatible API key for online STT/AI services
audioBase64 (string, required): Base64-encoded audio data (recorded user speech)
audioMimeType (string, required): MIME type of the audio (e.g., audio/wav, audio/webm)
apiBaseUrl (string): API base URL (defaults to OpenAI)
sttModel (string): Speech-to-text model (e.g., whisper-1)
aiModel (string): AI chat model (e.g., gpt-4o-mini)
localMode (boolean): Use local mode for both STT and AI
sttLocalMode (boolean): Use local STT only
aiLocalMode (boolean): Use local AI only (Ollama)
localOllamaBaseUrl (string): Ollama base URL (defaults to http://localhost:11434)
localOllamaModel (string): Ollama model name (e.g., llama3.2:3b)
localSttModel (string): Local STT model (e.g., nvidia/parakeet-tdt_ctc-110m)
piperPath (string): Custom path to the Piper TTS executable
language (string): Target language for transcription (e.g., en, es, fr)
allowedLanguages (string[]): Allowed languages for multi-language detection
systemPrompt (string): Custom system prompt for the AI assistant
temperature (number): AI response temperature (0.0-2.0)
maxTokens (number): Maximum number of tokens in the AI response
dictionaryEntries (DictionaryEntry[]): Custom dictionary replacements
{
  source: string,  // Text to replace
  target: string   // Replacement text
}
snippetEntries (SnippetEntry[]): Text expansion snippets
{
  trigger: string,   // Trigger phrase
  expansion: string  // Expanded text
}
applyBacktrack (boolean): Enable backtracking correction for transcription
removeFillers (boolean): Remove filler words (um, uh, etc.)
autoPunctuation (boolean): Enable automatic punctuation
autoNumberedLists (boolean): Enable automatic numbered-list detection
commandMode (boolean): Enable command mode (assistant conversation)
wakeWordEnabled (boolean): Enable wake word detection (e.g., “Hey Slasshy”)
assistantName (string): Custom assistant name for the wake word
selectedText (string): Currently selected text for context-aware processing
ttsEngine (string): TTS engine, piper or coqui
piper (PiperPipelineRequest): Piper TTS configuration
{
  speed?: number,      // 0.5-2.0
  quality?: string,    // fast | balanced | high
  emotion?: string     // neutral | calm | happy | excited | serious | sad
}
coqui (CoquiPipelineRequest): Coqui TTS configuration
{
  pythonPath?: string,
  modelName?: string,
  language?: string,
  speakerId?: string,
  speed?: number,
  quality?: string,
  emotion?: string,
  useGpu?: boolean,
  splitSentences?: boolean
}
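Dictionary replacements and snippet expansions are applied to the transcript before it reaches the AI model. The actual logic lives in the backend; as an illustration only (plain substring substitution is an assumption here), the entries behave roughly like this:

```typescript
interface DictionaryEntry { source: string; target: string; }
interface SnippetEntry { trigger: string; expansion: string; }

// Illustration only: the backend's real matching rules (word boundaries,
// casing, ordering) may differ from this plain substring substitution.
function applyEntries(
  transcript: string,
  dictionary: DictionaryEntry[],
  snippets: SnippetEntry[],
): string {
  let text = transcript;
  for (const { source, target } of dictionary) {
    text = text.split(source).join(target); // replace every occurrence
  }
  for (const { trigger, expansion } of snippets) {
    text = text.split(trigger).join(expansion);
  }
  return text;
}

applyEntries(
  'send my addr to slashy',
  [{ source: 'slashy', target: 'Slasshy' }],
  [{ trigger: 'my addr', expansion: '123 Main St' }],
);
// → 'send 123 Main St to Slasshy'
```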

Response

mode (string): Pipeline mode, assistant or dictation
selectionRewrite (boolean): Whether selection text was rewritten
selectionPending (boolean): Whether a selection rewrite is pending confirmation
selectionContextCleared (boolean): Whether selection context was cleared
selectionContextUsed (boolean): Whether selection context was used in processing
transcript (string): Transcribed text from the audio
assistantResponse (string): AI-generated response text
audioBase64 (string): Base64-encoded TTS audio of the response
sttLatencyMs (number): Speech-to-text processing time (milliseconds)
aiLatencyMs (number): AI response generation time (milliseconds)
ttsLatencyMs (number): Text-to-speech synthesis time (milliseconds)
totalLatencyMs (number): Total pipeline processing time (milliseconds)
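Put together as a TypeScript type for the `invoke` call, the response can be sketched as follows (the exact type name and which fields are optional are assumptions; match them to your own bindings):

```typescript
// Sketch of the response shape from the field list above; the type name
// and strict non-optionality of every field are assumptions.
interface AssistantPipelineResponse {
  mode: 'assistant' | 'dictation';
  selectionRewrite: boolean;
  selectionPending: boolean;
  selectionContextCleared: boolean;
  selectionContextUsed: boolean;
  transcript: string;
  assistantResponse: string;
  audioBase64: string;
  sttLatencyMs: number;
  aiLatencyMs: number;
  ttsLatencyMs: number;
  totalLatencyMs: number;
}
```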

Pipeline Flow

The assistant pipeline executes in three stages:

1. Speech-to-Text (STT)
Audio is transcribed using either:
  • Online: OpenAI-compatible API (e.g., Whisper)
  • Local: Parakeet, Moonshine, or other local models

2. AI Processing
The transcript is processed by:
  • Online: OpenAI-compatible chat API (e.g., GPT-4)
  • Local: Ollama models (e.g., llama3.2:3b)
This stage also applies:
  • System prompt customization
  • Dictionary replacements
  • Snippet expansions
  • Context awareness (selected text)

3. Text-to-Speech (TTS)
The response is synthesized using:
  • Piper: fast, lightweight neural TTS
  • Coqui: advanced multi-voice TTS
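The control flow above can be sketched as three chained stages. This is a synchronous toy version; the real stages are asynchronous and implemented in the backend, so the stage functions here are placeholders:

```typescript
// Toy sketch of the three-stage flow; each stage parameter stands in for
// the real STT / AI / TTS implementation in the backend.
function runPipeline(
  audioBase64: string,
  stt: (audio: string) => string,
  ai: (transcript: string) => string,
  tts: (response: string) => string,
) {
  const transcript = stt(audioBase64);        // 1. Speech-to-Text
  const assistantResponse = ai(transcript);   // 2. AI Processing
  const ttsAudio = tts(assistantResponse);    // 3. Text-to-Speech
  return { transcript, assistantResponse, audioBase64: ttsAudio };
}
```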

Hybrid Mode

You can mix online and local services:
// Local STT + Online AI
{
  sttLocalMode: true,
  aiLocalMode: false,
  localSttModel: 'nvidia/parakeet-tdt_ctc-110m',
  apiKey: 'sk-...',
  aiModel: 'gpt-4o-mini'
}

// Online STT + Local AI
{
  sttLocalMode: false,
  aiLocalMode: true,
  apiKey: 'sk-...',
  localOllamaModel: 'llama3.2:3b'
}
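Note that a hybrid request still needs the required audio fields alongside the mode flags. For instance (the audio placeholder is illustrative):

```typescript
// Hybrid: local STT (Parakeet) + online AI. Field names come from the
// parameter list above; the audio placeholder value is illustrative.
const hybridOptions = {
  apiKey: 'sk-...',
  audioBase64: '<base64-encoded audio>',
  audioMimeType: 'audio/wav',
  sttLocalMode: true,
  aiLocalMode: false,
  localSttModel: 'nvidia/parakeet-tdt_ctc-110m',
  aiModel: 'gpt-4o-mini',
};

// await invoke('run_assistant_pipeline', hybridOptions);
```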

Example

const response = await invoke<AssistantPipelineResponse>('run_assistant_pipeline', {
  apiKey: 'sk-...',
  audioBase64: audioData,
  audioMimeType: 'audio/wav',
  commandMode: true,
  wakeWordEnabled: true,
  assistantName: 'Slasshy',
  aiModel: 'gpt-4o-mini',
  ttsEngine: 'piper',
  piper: {
    speed: 1.0,
    quality: 'balanced',
    emotion: 'neutral'
  }
});

console.log('Transcript:', response.transcript);
console.log('Response:', response.assistantResponse);
console.log('Total time:', response.totalLatencyMs + 'ms');
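`invoke` returns a promise that rejects when the command fails, so production code typically wraps the call. A minimal sketch: the `invoke` function is injected so the snippet runs outside a Tauri window; in the app you would pass the real `invoke` from the Tauri API package, and the error shape surfaced on rejection is backend-defined.

```typescript
// Wraps the pipeline call with basic error handling. invokeFn stands in
// for Tauri's `invoke` so this sketch is runnable on its own.
async function runAssistant(
  options: Record<string, unknown>,
  invokeFn: (cmd: string, args?: Record<string, unknown>) => Promise<unknown>,
): Promise<unknown | null> {
  try {
    return await invokeFn('run_assistant_pipeline', options);
  } catch (err) {
    console.error('Assistant pipeline failed:', err);
    return null; // or surface the error to the UI
  }
}
```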
