Overview
The voice pipeline provides three modes:
- Live mode: Continuous mic → STT → LLM → TTS → speaker loop
- Push-to-talk: Capture on demand, transcribe with offline Whisper
- Text mode: Direct LLM processing without STT
Live Voice Pipeline (macOS)
rcli_start_listening
Start the live voice pipeline: microphone → STT → LLM → TTS → speaker.
int rcli_start_listening(RCLIHandle handle);
Parameters:
handle: Engine handle (must be initialized)
Returns:
0: Successfully started
-1: Failed (engine not ready or already listening)
Example
if (rcli_start_listening(handle) == 0) {
    printf("Listening... Speak now.\n");
} else {
    fprintf(stderr, "Failed to start listening\n");
}
The pipeline runs asynchronously on background threads. Use callbacks to receive transcript updates and state changes.
rcli_stop_listening
Stop the live voice pipeline.
int rcli_stop_listening(RCLIHandle handle);
Example
// Start listening
rcli_start_listening(handle);
// ... user speaks ...
// Stop after 30 seconds
sleep(30);
rcli_stop_listening(handle);
Live Mode Pipeline Flow
1. Mic input captured at 16 kHz
2. VAD detects speech start/end
3. Zipformer streaming STT transcribes in real time
4. On speech endpoint:
- Transcript sent to LLM
- LLM generates response (with optional tool calls)
- Response sent to TTS
5. TTS audio plays through speaker
6. Loop continues until rcli_stop_listening()
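The flow above can be read as a small state machine. The following is a minimal sketch, not the engine's implementation: the state names are borrowed from the labels printed in the state callback example, while the numeric values, the event names, and the transition rules are assumptions made for illustration. The INTERRUPTED state is declared but not modeled.

```c
/* State names mirror the labels used in the state callback example;
   the numeric values are an assumption. */
typedef enum {
    ST_IDLE, ST_LISTENING, ST_PROCESSING, ST_SPEAKING, ST_INTERRUPTED
} PipelineState;

/* Hypothetical events, one per step of the flow described above. */
typedef enum {
    EV_START, EV_SPEECH_END, EV_RESPONSE_READY, EV_PLAYBACK_DONE, EV_STOP
} PipelineEvent;

/* Return the next state; unhandled combinations leave the state unchanged. */
PipelineState next_state(PipelineState s, PipelineEvent e) {
    if (e == EV_STOP) return ST_IDLE;  /* stopping ends the loop from any state */
    switch (s) {
    case ST_IDLE:       return e == EV_START ? ST_LISTENING : s;
    case ST_LISTENING:  return e == EV_SPEECH_END ? ST_PROCESSING : s;
    case ST_PROCESSING: return e == EV_RESPONSE_READY ? ST_SPEAKING : s;
    case ST_SPEAKING:   return e == EV_PLAYBACK_DONE ? ST_LISTENING : s;
    default:            return s;
    }
}
```

Modeling the loop this way makes it explicit that playback completion returns the pipeline to listening, and that only a stop request returns it to idle.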
Push-to-Talk Mode
Capture audio on demand, then transcribe with high-accuracy offline Whisper.
rcli_start_capture
Start microphone capture without STT streaming.
int rcli_start_capture(RCLIHandle handle);
Parameters:
handle: Engine handle (must be initialized)
Returns:
0: Capture started
-1: Failed
Call this before the user starts speaking to avoid clipping the start of their audio.
rcli_stop_capture_and_transcribe
Stop capture and transcribe the recorded audio.
const char* rcli_stop_capture_and_transcribe(RCLIHandle handle);
Returns: Transcript text. Empty string if transcription failed. Do not free; the string is owned by the engine.
Example: Push-to-Talk
// User presses button
rcli_start_capture(handle);
printf("Recording... Release to transcribe.\n");
// User speaks while holding button
// ...
// User releases button
const char* transcript = rcli_stop_capture_and_transcribe(handle);
printf("You said: %s\n", transcript);
// Now process the command
const char* response = rcli_process_command(handle, transcript);
rcli_speak(handle, response);
Transcription uses offline Whisper and can take 100-500 ms depending on audio length.
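Because rcli_stop_capture_and_transcribe() returns an empty string when transcription fails, it can be worth checking the transcript before sending it to the LLM. The sketch below uses a hypothetical helper, transcript_has_text, which is not part of the RCLI API. Treating a NULL pointer or a whitespace-only transcript as "no text" is an assumption.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not part of the RCLI API): returns 1 if the
   transcript contains at least one non-whitespace character, else 0. */
int transcript_has_text(const char *transcript) {
    if (transcript == NULL) return 0;
    for (const char *p = transcript; *p != '\0'; ++p) {
        if (!isspace((unsigned char)*p)) return 1;
    }
    return 0;
}
```

In the push-to-talk example, the call to rcli_process_command() would then run only when transcript_has_text(transcript) returns 1.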
rcli_get_transcript
Get the last transcript from STT.
const char* rcli_get_transcript(RCLIHandle handle);
Returns: Last transcript text. Empty if none available. Do not free.
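The strings returned by these functions are owned by the engine and must not be freed. This page does not say how long the returned pointer stays valid, so a cautious approach is to copy the text before making further API calls. The sketch below assumes the buffer may be reused by a later call; copy_engine_string is a hypothetical helper, not part of the RCLI API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of the RCLI API): duplicate an
   engine-owned string so the caller controls its lifetime.
   Returns NULL if the input is NULL or allocation fails.
   The caller must free() the result. */
char *copy_engine_string(const char *engine_text) {
    if (engine_text == NULL) return NULL;
    size_t len = strlen(engine_text);
    char *copy = malloc(len + 1);
    if (copy == NULL) return NULL;
    memcpy(copy, engine_text, len + 1);  /* includes the terminating NUL */
    return copy;
}
```

The copy belongs to the caller, so unlike the engine's own string it should be released with free() once it is no longer needed.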
Text Command Processing
rcli_process_command
Process a text command: skip STT, go directly to LLM → tool execution → response.
const char* rcli_process_command(RCLIHandle handle, const char* text);
Parameters:
handle: Engine handle (must be initialized)
text: Command text to process
Returns: LLM response text. Empty string on error. Do not free; the string is owned by the engine.
Example: Text-Only Mode
const char* response = rcli_process_command(handle, "What's the weather?");
printf("Assistant: %s\n", response);
// LLM detects tool call and executes it
const char* result = rcli_process_command(handle, "open Safari");
printf("%s\n", result); // "Opened Safari"
This function blocks until the LLM completes generation. Use rcli_stop_processing() from another thread to cancel.
Conversation History
The engine automatically maintains conversation context:
rcli_process_command(handle, "My name is Alice");
// "Nice to meet you, Alice!"
rcli_process_command(handle, "What's my name?");
// "Your name is Alice."
// Clear history to start fresh
rcli_clear_history(handle);
rcli_process_command(handle, "What's my name?");
// "I don't know your name."
rcli_clear_history
Clear conversation history (start a fresh conversation).
void rcli_clear_history(RCLIHandle handle);
Text-to-Speech
rcli_speak
Synthesize text to speech and play through speaker.
int rcli_speak(RCLIHandle handle, const char* text);
Parameters:
handle: Engine handle (must be initialized)
text: Text to speak. Markdown and special characters are automatically sanitized.
Returns:
0: Speech started successfully
-1: Failed (TTS error or invalid handle)
Example
rcli_speak(handle, "Hello, world!");
// Speak LLM response
const char* response = rcli_process_command(handle, "Tell me a joke");
rcli_speak(handle, response);
rcli_speak() is non-blocking: it starts playback and returns immediately. Use rcli_is_speaking() to check whether playback is still in progress.
rcli_stop_speaking
Stop TTS playback immediately (interrupt current speech).
void rcli_stop_speaking(RCLIHandle handle);
Example
rcli_speak(handle, "This is a very long sentence that...");
// User interrupts
rcli_stop_speaking(handle);
rcli_is_speaking
Check if TTS audio is currently playing.
int rcli_is_speaking(RCLIHandle handle);
Returns:
1: TTS is playing
0: No active TTS playback
Example: Wait for Speech to Complete
rcli_speak(handle, "Processing your request...");
while (rcli_is_speaking(handle)) {
    usleep(100000); // Poll every 100ms
}
printf("Speech finished\n");
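The polling loop above waits for as long as playback lasts. Where an upper bound is wanted, a timeout can be added. The sketch below is one way to do that under stated assumptions: wait_while_busy and BusyFn are hypothetical names, the status check is passed in as a function pointer so the snippet compiles on its own, and countdown_busy is only a stand-in for a real status function.

```c
#include <time.h>

/* Hypothetical predicate type: returns nonzero while still busy. */
typedef int (*BusyFn)(void *ctx);

/* Poll is_busy every poll_ms milliseconds until it returns 0 or
   timeout_ms has been spent sleeping.
   Returns 0 once idle, -1 on timeout. */
int wait_while_busy(BusyFn is_busy, void *ctx, long timeout_ms, long poll_ms) {
    long waited_ms = 0;
    if (poll_ms < 1) poll_ms = 1;  /* avoid a zero-length sleep loop */
    while (is_busy(ctx)) {
        if (waited_ms >= timeout_ms) return -1;
        struct timespec ts = { poll_ms / 1000, (poll_ms % 1000) * 1000000L };
        nanosleep(&ts, NULL);
        waited_ms += poll_ms;
    }
    return 0;
}

/* Stand-in predicate for demonstration: busy until the counter reaches 0. */
int countdown_busy(void *ctx) {
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return 1;
    }
    return 0;
}
```

With the real engine, a thin wrapper around rcli_is_speaking() would take the place of countdown_busy. On timeout, the caller could fall back to rcli_stop_speaking().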
Emergency Stop
rcli_stop_processing
Stop all ongoing processing: cancel LLM generation, stop TTS, stop STT.
void rcli_stop_processing(RCLIHandle handle);
Safe to call from any thread. Non-blocking. Use this for “panic button” functionality.
Example: Interrupt Long Generation
// Thread 1: Start long operation
const char* result = rcli_process_command(handle, "Write a 1000 word essay");
// Thread 2: User cancels
rcli_stop_processing(handle);
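The two-thread example above is schematic. The sketch below spells out the threading with POSIX threads. To keep it self-contained, the engine calls are replaced by stand-ins (fake_process_command and fake_stop_processing). Their behavior, blocking until a cancel flag is set and then returning an empty string, is an assumption for illustration and not a description of how RCLI is implemented.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Stand-ins for the engine so the sketch compiles on its own. */
static atomic_int cancel_requested;

/* Simulates a long blocking generation that checks for cancellation. */
static const char *fake_process_command(const char *text) {
    (void)text;
    for (int step = 0; step < 1000; ++step) {
        if (atomic_load(&cancel_requested)) return "";  /* cancelled */
        struct timespec ts = { 0, 1000000L };           /* 1 ms of "work" */
        nanosleep(&ts, NULL);
    }
    return "essay text";
}

/* Simulates the non-blocking cancel request. */
static void fake_stop_processing(void) {
    atomic_store(&cancel_requested, 1);
}

/* Thread 1: runs the blocking call and stores its result. */
static void *worker(void *arg) {
    const char **result = arg;
    *result = fake_process_command("Write a 1000 word essay");
    return NULL;
}

/* Starts the command on a worker thread, cancels it from the calling
   thread, and returns whatever the worker produced. */
const char *run_and_cancel(void) {
    const char *result = NULL;
    pthread_t tid;
    atomic_store(&cancel_requested, 0);
    if (pthread_create(&tid, NULL, worker, &result) != 0) return NULL;
    fake_stop_processing();    /* Thread 2: the "user cancels" step */
    pthread_join(tid, NULL);   /* wait for the blocked call to return */
    return result;
}
```

The point of the sketch is the division of labor: the blocking call lives on its own thread, the cancel request comes from another, and the result is read only after the join.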
Complete Example: Voice Assistant
#include "api/rcli_api.h"
#include <stdio.h>
#include <unistd.h>

void on_transcript(const char* text, int is_final, void* user_data) {
    (void)user_data; // Unused
    if (is_final) {
        printf("User: %s\n", text);
    } else {
        printf("Partial: %s\r", text); // Real-time updates
        fflush(stdout);
    }
}

void on_state(int old_state, int new_state, void* user_data) {
    (void)user_data; // Unused
    static const char* states[] = {"IDLE", "LISTENING", "PROCESSING", "SPEAKING", "INTERRUPTED"};
    const int count = (int)(sizeof(states) / sizeof(states[0]));
    // Guard against state values outside the known range
    const char* from = (old_state >= 0 && old_state < count) ? states[old_state] : "UNKNOWN";
    const char* to = (new_state >= 0 && new_state < count) ? states[new_state] : "UNKNOWN";
    printf("State: %s -> %s\n", from, to);
}

int main(void) {
    RCLIHandle handle = rcli_create(NULL);
    rcli_init(handle, "/path/to/models", 99);

    // Register callbacks
    rcli_set_transcript_callback(handle, on_transcript, NULL);
    rcli_set_state_callback(handle, on_state, NULL);

    // Start voice pipeline
    printf("Starting voice assistant...\n");
    if (rcli_start_listening(handle) != 0) {
        fprintf(stderr, "Failed to start listening\n");
        rcli_destroy(handle);
        return 1;
    }

    // Run for 60 seconds
    sleep(60);

    // Stop
    rcli_stop_listening(handle);
    rcli_destroy(handle);
    return 0;
}
See Also