
assistant_pipeline

Runs the complete assistant pipeline: transcribes audio, generates AI response, and synthesizes speech.
await invoke('run_assistant_pipeline', {
  apiKey: string,
  audioBase64: string,
  audioMimeType: string,
  // ... additional options
});

Request Parameters

apiKey (string, required): OpenAI-compatible API key for online STT/AI services
audioBase64 (string, required): Base64-encoded audio data (recorded user speech)
audioMimeType (string, required): MIME type of the audio (e.g., audio/wav, audio/webm)
apiBaseUrl (string): API base URL (defaults to OpenAI)
sttModel (string): Speech-to-text model (e.g., whisper-1)
aiModel (string): AI chat model (e.g., gpt-4o-mini)
localMode (boolean): Use local mode for both STT and AI
sttLocalMode (boolean): Use local STT only
aiLocalMode (boolean): Use local AI only (Ollama)
localOllamaBaseUrl (string): Ollama base URL (defaults to http://localhost:11434)
localOllamaModel (string): Ollama model name (e.g., llama3.2:3b)
localSttModel (string): Local STT model (e.g., nvidia/parakeet-tdt_ctc-110m)
piperPath (string): Custom path to the Piper TTS executable
language (string): Target language for transcription (e.g., en, es, fr)
allowedLanguages (string[]): Allowed languages for multi-language detection
systemPrompt (string): Custom system prompt for the AI assistant
temperature (number): AI response temperature (0.0-2.0)
maxTokens (number): Maximum number of tokens in the AI response
dictionaryEntries (DictionaryEntry[]): Custom dictionary replacements
{
  source: string,  // Text to replace
  target: string   // Replacement text
}
snippetEntries (SnippetEntry[]): Text expansion snippets
{
  trigger: string,   // Trigger phrase
  expansion: string  // Expanded text
}
applyBacktrack (boolean): Enable backtracking correction for transcription
removeFillers (boolean): Remove filler words (um, uh, etc.)
autoPunctuation (boolean): Enable automatic punctuation
autoNumberedLists (boolean): Enable automatic numbered-list detection
commandMode (boolean): Enable command mode (assistant conversation)
wakeWordEnabled (boolean): Enable wake word detection (e.g., “Hey Slasshy”)
assistantName (string): Custom assistant name for the wake word
selectedText (string): Currently selected text for context-aware processing
ttsEngine (string): TTS engine, piper or coqui
piper (PiperPipelineRequest): Piper TTS configuration
{
  speed?: number,      // 0.5-2.0
  quality?: string,    // fast | balanced | high
  emotion?: string     // neutral | calm | happy | excited | serious | sad
}
coqui (CoquiPipelineRequest): Coqui TTS configuration
{
  pythonPath?: string,
  modelName?: string,
  language?: string,
  speakerId?: string,
  speed?: number,
  quality?: string,
  emotion?: string,
  useGpu?: boolean,
  splitSentences?: boolean
}
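Dictionary replacements and snippet expansions are applied to the transcript before it reaches the AI model. The actual logic lives in the backend; as an illustration only (plain substring substitution is an assumption here), the entries behave roughly like this:

```typescript
interface DictionaryEntry { source: string; target: string; }
interface SnippetEntry { trigger: string; expansion: string; }

// Illustration only: the backend's real matching rules (word boundaries,
// casing, ordering) may differ from this plain substring substitution.
function applyEntries(
  transcript: string,
  dictionary: DictionaryEntry[],
  snippets: SnippetEntry[],
): string {
  let text = transcript;
  for (const { source, target } of dictionary) {
    text = text.split(source).join(target); // replace every occurrence
  }
  for (const { trigger, expansion } of snippets) {
    text = text.split(trigger).join(expansion);
  }
  return text;
}

applyEntries(
  'send my addr to slashy',
  [{ source: 'slashy', target: 'Slasshy' }],
  [{ trigger: 'my addr', expansion: '123 Main St' }],
);
// → 'send 123 Main St to Slasshy'
```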

Response

mode (string): Pipeline mode, assistant or dictation
selectionRewrite (boolean): Whether selection text was rewritten
selectionPending (boolean): Whether a selection rewrite is pending confirmation
selectionContextCleared (boolean): Whether selection context was cleared
selectionContextUsed (boolean): Whether selection context was used in processing
transcript (string): Transcribed text from the audio
assistantResponse (string): AI-generated response text
audioBase64 (string): Base64-encoded TTS audio of the response
sttLatencyMs (number): Speech-to-text processing time (milliseconds)
aiLatencyMs (number): AI response generation time (milliseconds)
ttsLatencyMs (number): Text-to-speech synthesis time (milliseconds)
totalLatencyMs (number): Total pipeline processing time (milliseconds)
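Put together as a TypeScript type for the `invoke` call, the response can be sketched as follows (the exact type name and which fields are optional are assumptions; match them to your own bindings):

```typescript
// Sketch of the response shape from the field list above; the type name
// and strict non-optionality of every field are assumptions.
interface AssistantPipelineResponse {
  mode: 'assistant' | 'dictation';
  selectionRewrite: boolean;
  selectionPending: boolean;
  selectionContextCleared: boolean;
  selectionContextUsed: boolean;
  transcript: string;
  assistantResponse: string;
  audioBase64: string;
  sttLatencyMs: number;
  aiLatencyMs: number;
  ttsLatencyMs: number;
  totalLatencyMs: number;
}
```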

Pipeline Flow

The assistant pipeline executes in three stages:

1. Speech-to-Text (STT)
Audio is transcribed using either:
  • Online: OpenAI-compatible API (e.g., Whisper)
  • Local: Parakeet, Moonshine, or other local models

2. AI Processing
The transcript is processed by:
  • Online: OpenAI-compatible chat API (e.g., GPT-4)
  • Local: Ollama models (e.g., llama3.2:3b)
This stage also applies:
  • System prompt customization
  • Dictionary replacements
  • Snippet expansions
  • Context awareness (selected text)

3. Text-to-Speech (TTS)
The response is synthesized using:
  • Piper: fast, lightweight neural TTS
  • Coqui: advanced multi-voice TTS
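The control flow above can be sketched as three chained stages. This is a synchronous toy version; the real stages are asynchronous and implemented in the backend, so the stage functions here are placeholders:

```typescript
// Toy sketch of the three-stage flow; each stage parameter stands in for
// the real STT / AI / TTS implementation in the backend.
function runPipeline(
  audioBase64: string,
  stt: (audio: string) => string,
  ai: (transcript: string) => string,
  tts: (response: string) => string,
) {
  const transcript = stt(audioBase64);        // 1. Speech-to-Text
  const assistantResponse = ai(transcript);   // 2. AI Processing
  const ttsAudio = tts(assistantResponse);    // 3. Text-to-Speech
  return { transcript, assistantResponse, audioBase64: ttsAudio };
}
```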

Hybrid Mode

You can mix online and local services:
// Local STT + Online AI
{
  sttLocalMode: true,
  aiLocalMode: false,
  localSttModel: 'nvidia/parakeet-tdt_ctc-110m',
  apiKey: 'sk-...',
  aiModel: 'gpt-4o-mini'
}

// Online STT + Local AI
{
  sttLocalMode: false,
  aiLocalMode: true,
  apiKey: 'sk-...',
  localOllamaModel: 'llama3.2:3b'
}
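Note that a hybrid request still needs the required audio fields alongside the mode flags. For instance (the audio placeholder is illustrative):

```typescript
// Hybrid: local STT (Parakeet) + online AI. Field names come from the
// parameter list above; the audio placeholder value is illustrative.
const hybridOptions = {
  apiKey: 'sk-...',
  audioBase64: '<base64-encoded audio>',
  audioMimeType: 'audio/wav',
  sttLocalMode: true,
  aiLocalMode: false,
  localSttModel: 'nvidia/parakeet-tdt_ctc-110m',
  aiModel: 'gpt-4o-mini',
};

// await invoke('run_assistant_pipeline', hybridOptions);
```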

Example

const response = await invoke<AssistantPipelineResponse>('run_assistant_pipeline', {
  apiKey: 'sk-...',
  audioBase64: audioData,
  audioMimeType: 'audio/wav',
  commandMode: true,
  wakeWordEnabled: true,
  assistantName: 'Slasshy',
  aiModel: 'gpt-4o-mini',
  ttsEngine: 'piper',
  piper: {
    speed: 1.0,
    quality: 'balanced',
    emotion: 'neutral'
  }
});

console.log('Transcript:', response.transcript);
console.log('Response:', response.assistantResponse);
console.log('Total time:', response.totalLatencyMs + 'ms');
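`invoke` returns a promise that rejects when the command fails, so production code typically wraps the call. A minimal sketch: the `invoke` function is injected so the snippet runs outside a Tauri window; in the app you would pass the real `invoke` from the Tauri API package, and the error shape surfaced on rejection is backend-defined.

```typescript
// Wraps the pipeline call with basic error handling. invokeFn stands in
// for Tauri's `invoke` so this sketch is runnable on its own.
async function runAssistant(
  options: Record<string, unknown>,
  invokeFn: (cmd: string, args?: Record<string, unknown>) => Promise<unknown>,
): Promise<unknown | null> {
  try {
    return await invokeFn('run_assistant_pipeline', options);
  } catch (err) {
    console.error('Assistant pipeline failed:', err);
    return null; // or surface the error to the UI
  }
}
```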
